Add REPL and expressions.t
[clump.git] / report.tex
CommitLineData
e7b86bf0
MG
1\documentclass[a4paper,12pt]{report}
2\usepackage{color}
3\usepackage[cm]{fullpage}
4\usepackage{graphicx}
5\usepackage{hyperref}
6\usepackage{listings}
7\usepackage{pdfpages}
8
9\title{YULE: Yet Unnamed Lisp Evaluator}
10\author{Marius Gavrilescu}
11\date{May 2018}
12
13\begin{document}
14% title page does not need anchors
15\hypersetup{pageanchor=false}
16\maketitle
17\hypersetup{pageanchor=true}
18\lstset{language=Perl, frame=leftline, framexleftmargin=5pt,
19 showstringspaces=false, showspaces=false, breaklines=true,
20 postbreak=\mbox{\textcolor{red}{$\hookrightarrow$}\space}}
21
22\chapter*{Abstract}
23Lisp machines are computers designed to run Lisp programs efficiently.
24They were first developed in the 1970s, and they have special hardware
25support for Lisp. This usually comprises a tagged architecture (each
26pointer has a ``type'' field), garbage collection implemented in
27hardware, and an instruction set that lends itself well to evaluating
28Lisp expressions.
29
30Between 1975 and 1980, Guy L. Steele and Gerald Jay Sussman wrote a
31series of memos on lambda calculus, the Lisp language and its
32implementation known collectively as the ``Lambda Papers''. One of
33these papers, ``Design of LISP-Based Processors (1979)'' describes how
34to implement a Lisp machine processor. If this processor is connected
35to a RAM that contains a program, it executes that program and places
36the result back into the RAM.
37
38The aim of YULE is to take the design from the 1979, implement it
39today using modern technology and techniques, and bundle it up in a
40form that is both easy to use and extend. By going through this
41modernisation process, future developers can easily integrate the Lisp
42processor into their own design, modifying it as they please.
43
44The end result of YULE is a small device that can be plugged into a
45USB port, and an associated computer program. Running the program
46presents a simplified Lisp listener. The user can type a Lisp
47expression into the listener, and the listener will respond with the
48value of the expression. Under the hood, the listener evaluates Lisp
49by sending the expression over USB to the small device and getting
50back the result.
51
52YULE comes with the design files for the hardware, and source code for
53the software. All of these are released under a free software license
54and published online, which makes it very easy (from both a practical
55and a legal standpoint) for future developers to use in their own
56projects and modify to fit their purposes better.
57\tableofcontents
58
59\chapter{Introduction}
60\section{Motivation}
61Lisp machine processors are quite unlike modern processors. Modern
62instruction sets are designed for imperative programming. Each
63instruction makes the processor execute some operation, such as
64loading a register or storing a value in memory. In contrast, a Lisp
65machine processor reads an expression from memory, determines its
66type, and recursively evaluates it. This operation lends itself to
67declarative programming, where the programmer declares an arbitrarly
68complicated expression and asks the computer to evaluate it.
69
70Another common feature of Lisp machine processors is the use of tagged
71pointers. Each memory location is a tagged union of two elements: the
72value itself, and a tag representing the type of that data. Tagged
73architectures are not common today, although some programs still
74use tagged pointers by exploiting pointer alignment (if every pointer
75is a multiple of 8, then the bottom 3 bits can be used to store a
76tag). These tags can be used to store an object's reference count (as
77in Objective-C on certain versions of iOS), or information about a
78data structure (as in certain implementations of kd-trees).
79
80The ``Design of LISP-Based Processors (1979)'' paper by Guy L. Steele
81and Gerald Jay Sussman \cite{lambda} presents the design of a
82relatively simple Lisp machine processor, which is made from two
83read-only memories, two buses, and several registers. If this
84processor is connected to a RAM containing a Lisp expression in a
85certain binary format, the processor will evaluate the expression and
86place the result back in memory.
87
88This processor features both a tagged architecture (the top 3 bits of
89each word in memory represent the value's type) and the declarative
90nature mentioned above. It is therefore interesting to study and to
91bring this processor into today's world by using technology such as
92hardware description languages (that allow us to describe a circuit
93formally) and FPGAs (that allow us to synthesise any circuit quickly
94and at low cost). By building a HDL implementation of the processor
95not only can it be used easily on its own (by synthesising it to a
96FPGA and connecting it to RAM), but a future circuit designer can take
97this processor and include it in their more complicated design.
98
99The processor presented in the original paper is a somewhat barebones
100implementation of Lisp. Arithmetic primitives are not available, nor
101are there any input or output routines. Due to time constraints the
102paper's writers have not included a equality primitive. Additionally
103the ATOM primitive has a bug that makes it consider NIL not atomic.
104All these deficiencies can be rectified given time and effort, but it
105is difficult for one to fix them given only a paper describing the
106processor. Having a HDL implementation of the processor with a proper
107test suite allows a future circuit designer to more easily make
108changes to the processor while using the test suite to prevent
109regressions and using a FPGA to quickly obtain a hardware version of
110the changed processor that can be manually tested and used in any
111desired application.
112
113\section{Challenges}
114While \cite{lambda} is part of the well-known Lambda Papers series and
115has been read by many over the years, there are no published
116implementations of the processor described in the paper. Therefore
117\cite{lambda} was the only source used in the building of this
118project, and several of the limitations of the processor were not easy
119to see before the implementation phase was well underway. Furthermore,
120the original paper has several shortcomings.
121
122For one, while the microcode (the values of the two read-only
123memories) is provided in both source and compiled form, the microcode
124compiler is not provided. This makes it difficult to alter the
125microcode to fix bugs or introduce new instructions. One needs to
126either work on the already-compiled code (hard and error-prone) or
127reimplement the microcode compiler, which takes a reasonable amount of
128effort.
129
130Moreover, the paper does not concern itself at all with how the user
131gets a program into memory or how the user gets the result back. This
132makes the processor as-described very difficult to use.
133
134To implement the processor in Verilog, no source other than the paper
135was used. There were no previous implementations of the processor that
136could be consulted to resolve unclarities in the original paper.
137
138In addition, there was no software available for interacting with the
139processor. A compiler and assembler had to be written from scratch,
140and so the processor could not be easily debugged until after the
141software was functional.
142
143\section{Contribution}
144YULE is a modern-day implementation of a Lisp machine processor using
145the ideas and techniques presented in \cite{lambda}.
146
147The main contribution of YULE is a Verilog implementation of the
148processor which allows future circuit designers to use it in their own
149design and modify it as they please.
150
151As the processor as described has no way to input a program or extract
152a result from memory, YULE includes more circuitry around the
153processor that lets it communicate with the external word. This
154consists of a serial UART and two modules, a reader and a writer. When
155the processor is idle, the reader waits for a program to be sent over
156the UART. Once a program has been received, it is placed into memory
157and the processor starts to evaluate it. Once evaluation is done, the
158writer transmits the new contents of the memory over the UART.
159
160Besides the hardware chip, YULE also includes three pieces of
161software:
162
163\begin{itemize}
164\item a compiler, which takes Lisp source code and produces an
165 intermediate representation of the program.
166\item an assembler, which takes a program in the intermediate
167 representation and produces the binary representation of the program
168 that the processor can understand.
169\item a REPL, in which users can type expressions. When an expression
170 is submitted, the REPL compiles it, assembles it, and sends it to
171 the processor. The processor evaluates the expression and sends back
172 a memory dump. The REPL then reads the memory dump, interprets it,
173 and prints the value of the expression in a human-readable format.
174\end{itemize}
175
176The REPL is therefore the entry point of YULE as a whole. An end-user
177who does not intend to extend YULE can plug in an iCEstick with the
178processor, and run the REPL. Then the user can write expressions in
179the REPL and get their values back. Besides being the entry point, the
180REPL serves as a convenient way to test that the entire system is
181working. An expression typed in the REPL goes through the compiler and
182assembler, and then is sent to the processor. It goes through the
183three stages of the processor (reading the expression into memory,
184evaluating the contents of the memory, writing a memory dump).
185
186The hardware and software components of YULE have been tested to
187ensure they function correctly. There is an automated test suite
188containing 48 unit tests for the compiler and assembler, and 21
189functional tests that engage all parts of the system (both the
190software and the hardware). The unit tests ensure that the various
191small parts of the compiler and assembler work properly, while the
192functional tests consist of pairs of short Lisp expressions and their
193correct values. The expressions are written to the REPL, and the
194REPL's output is compared to the correct values.
195
196To make it easier for future circuit designers and programmers to use
197this project, YULE has been released under an open source license.
198Prospective users can find all the source files (for both hardware and
199software) online, and are free to use, modify, and redistribute them
200under the same terms as Perl itself. The files can be accessed at this
201URL: \url{https://git.ieval.ro/?p=yule.git}.
202
203\section{Layout of this report}
204This report begins by describing the background of YULE, the
205technologies that made this project possible and a more in-depth
206description of the 1979 paper. Then the components that YULE comprises
207are examined in turn, followed by a chapter on how everything was
208tested, which includes a worked example of the system in action. The
209report ends with a list of ideas for future work based on YULE, and a
210section on reflections.
211
212\chapter{Background}
213
214\section{Field-programmable gate array}
215An FPGA is a configurable integrated circuit. A designer can define a
216circuit using a hardware description language (explained in the next
217section) and compile it to some binary format called a configuration.
218This configuration describes how the different components of the FPGA
219should be connected to each other. The configuration can then be
220uploaded to the FPGA, and the FPGA will connect its internal
221components as requested.
222
223An FPGA can therefore replace any digital custom circuit (ASIC). While
224an FPGA consumes more energy, uses more space, and runs at slower
225clock rates than an ASIC, the FPGA permits rapid prototyping: the time
226between finishing to write a design and having a working part is mere
227seconds (compilation time + uploading the configuration to the FPGA),
228and if a bug is discovered the buggy part need not be thrown away. The
229design can be modified and the fixed configuration can be uploaded to
230the FPGA.
231
232The hardware aspect of YULE is designed to run on the Lattice
233iCE40HX-1k FPGA, but with small modifications it can run on other
234FPGAs as well.
235
236\subsection*{iCEstick}
237\begin{figure}
238 \centering
239 \includegraphics[height=4cm]{icestick.png}
240 \caption{Lattice iCEstick Evaluation Kit}
241\end{figure}
242
243The Lattice iCEstick Evaluation Kit is a development board for the
244iCE40HX-1k FPGA. It contains the FPGA, a USB connector, a FTDI chip
245and 5 LEDs. The form factor is very convenient. It is similar to a USB
246memory stick (albeit bigger) that can be plugged into a computer,
247avoiding the need for jumper wires and GPIO that traditional
248development boards require.
249
250The FTDI chip implements the USB protocol, allowing the FPGA to be
251programmed by the computer and letting the FPGA talk to the computer
252using a very simple serial UART interface on the FPGA side, and a
253standard USB serial port on the computer side. This makes
254communication very easy. The FPGA can use any off-the-shelf UART
255implementation (this project uses \texttt{osdvu} from
256Opencores \cite{osdvu}), and the computer can use standard serial
257console tools such as \texttt{minicom}.
258
259\section{Verilog}
260Verilog is a hardware description language. It is used to model
261digital circuits at register-transfer level of abstraction. In Verilog
262one defines several modules, with one of them (the ``toplevel''
263module) representing the entire design. Each module has several inputs
264and outputs, can contain internal wires, can contain instances of
265other modules, and can contain combinational and sequential logic
266blocks that drive internal wires and outputs.
267
268Verilog has many more complicated features that are not used in YULE.
269These include tristate buffers, delays, functions, tasks, loops, and
270strength declarations.
271
272A subset of Verilog can be \textit{synthesised}; that is converted to
273a netlist (a list of transistors or FPGA components used and
274connections between them) that can then be used to build an ASIC
275(custom integrated circuit), or can be further converted to the
276internal representation of the FPGA and uploaded to the FPGA. The FPGA
277workflow and the programs that are part of it are described in the
278next section.
279
280A verilog project can also be run through a simulator, which simulates
281the design clock edge by clock edge and records the values on each
282wire in the design. This record can later be read in and visualised
283with a wave viewer, which allows a circuit designer to better
284understand why the circuit is not working correctly. One such program,
285GTKWave \cite{gtkwave}, was used to debug YULE.
286
287\section{icestorm}
288Project IceStorm \cite{icestorm} is a set of open source reverse
289engineered tools for programming Lattice iCE40 FPGAs. There exists a
290fully open source Verilog-to-Bitstream flow which comprises:
291
292\begin{enumerate}
293\item Yosys, a Verilog synthetizer.
294\item Arachne-pnr, a tool that implements placing and routing for
295 iCE40 FPGAs.
296\item icepack, which converts Arachne-pnr's output to iCE40 bitstream.
297\item iceprog, which programs the iCEstick with the output of icepack.
298\end{enumerate}
299
300\section{Design of LISP-Based Processors (1979)}
301Design of LISP-Based Processors (1979) \cite{lambda} is one of the
302``Lambda papers''; a series of memos by Sussman and Steele about the
303Lisp language and its implementation. The paper describes SIMPLE, a
304microprocessor designed to run Lisp.
305
306This processor is made of two read-only memories called EVAL and GC.
307Each read-only memory is paired with state registers to form a state
308machine. The contents of the state registers are fed into the
309read-only memory, and the output of the read-only memory determines
310the next state and other signals.
311
312Each state machine also has an internal bus (called the E and G buses
313respectively) and several registers connected to that bus. Each of
314these registers stores a tagged pointer, which is an 11-bit value (the
315top 3 bits represent the type, and the bottom 8 bytes represent the
316value).
317
318The signals coming out of each state machine can cause a register to
319be read onto the machine's internal bus, a register to be loaded from
320the internal bus, or write a constant value onto the internal bus.
321
322The GC state machine represents the processor's memory system. It is
323connected to an external RAM, which it controls through several
324signals. It knows how to execute certain ``GC operations'', which
325represent common Lisp memory management actions. Examples of these are
326\texttt{CAR} and \texttt{CDR} (``given a pointer to a cons cell,
327return either the CAR or the CDR of that cons cell'') or \texttt{CONS}
328(``given two pointers, allocate a new cons cell, copy them into it,
329and return a pointer to the new cons cell''). The existence of the GC
330isolates the rest of the processor from the particularities of the
331memory used.
332
333The EVAL state machine evaluates Lisp expressions. Signals coming out
334of it control the GC by selecting what operation the GC should perform
335and connecting the E and G buses to supply arguments to the GC and
336read results back. The EVAL reads expressions from memory via the GC
337and evaluates them, putting results back into memory.
338
339The overall operation of the processor is to read an expression from
340the magic location ``5'' in memory, evaluate it, and place the result
341into the magic location ``6'' in memory. Other memory locations are
342used to store subexpressions of the input expression, and to store any
343intermediate results. Despite its name, GC does not perform any
344garbage collection, so at the end of the processor's operation the
345memory will still contain all intermediate results produced during
346evaluation.
347
348The processor is intended as a proof of concept and not a
349production-ready hardware implementation of Lisp. Therefore it has
350several important limitations, for example there is no primitive
351operation that implements equality of objects (making it impossible to
352distinguish between symbols), there are no primitive arithmetic
353operations (so one needs to use Church numerals if numbers are
354needed), and the processor includes a known bug (``the ATOM bug'').
355The bug is that the ATOM primitive considers NIL to be non-atomic,
356which is incorrect.
357
358The Lisp expressions mentioned here are in textual form (Lisp source
359code), but instead in a special tagged pointer format. Each memory
360value is a union of two values: a 8-bit pointer and a 3-bit type. We
361can use pairs of consecutive memory locations to represent cons cells
362and build linked lists out of them. If a list is at location X, that
363means that (X, X+1) is the first cons cell of the list, with X+1 being
364the CAR (the value of the first element) and X being the CDR (a
365pointer to the next cons cell). A discussion of the different ways to
366represent Lisp expressions (source code, this tagged pointer format,
367and an intermediate representation) can be found in \ref{sec:formats}.
368
369The next page shows a diagram of how the processor in the paper is
370structured. The two read-only memories can be seen, with their
371registers, state registers, and the interconnection between the E and
372G buses.
373
374\begin{figure}
375\centering\includegraphics[width=18.5cm]{AIM-514-design.pdf}
376\caption{Design diagram of processor from \cite{lambda}}
377\end{figure}
378
379\chapter{A Lisp Machine processor on a USB drive}
380The project comprises a hardware part written in Verilog and a
381software part written in Perl.
382
383The hardware part is a Lisp machine processor with associated
384circuitry that can read a program from the iCEstick UART and that can
385write the result of running a program to the UART. The normal
386operation of this part is to:
387
388\begin{enumerate}
389\item Wait for a program to be supplied via UART
390\item Run the program on the processor
391\item Write a memory dump to the UART
392\item Go back to step 1
393\end{enumerate}
394
395The software part includes:
396
397\begin{itemize}
398\item A compiler, which takes Lisp source code and puts it in the
399 linked list format understood by the SIMPLE processor
400\item An assembler, which takes the result of the compiler and
401 converts it to the SIMPLE in-memory binary format.
402\item A REPL, which repeatedly reads a program from the user, compiles
403 it, assembles it, sends it to YULE, reads the memory dump from YULE,
404 and prints the result for the user to see.
405\end{itemize}
406
407\clearpage
408\section{Hardware}
409The hardware part is built from 8 modules:
410
411\begin{enumerate}
412\item The \textbf{READER}, which implements step 1 of the normal operation described above.
413\item The \textbf{WRITER}, which implements step 3 of the normal operation described above.
414\item The \textbf{GCRAM}, a single-port RAM.
415\item The \textbf{GC}, corresponding to the GC PLA in the original implementation.
416\item The \textbf{EVAL}, corresponding to the EVAL PLA in the original implementation.
417\item The \textbf{CTRL}, which advances the operation from one step to the next one.
418\item The \textbf{UART}, which implements transmission and reception of bytes via UART.
419\item The \textbf{CPU}, which instantiates every module above and wires them together.
420\end{enumerate}
421
422\subsection{Clocks}
423To avoid timing problems, a single 48 MHz clock is used. The iCEstick
424has a 12 MHz oscillator, and the phase-locked loop circuitry on the
425FPGA multiplies this signal by 4.
426
427Every flip-flop updates on the positive edge of the clock, except for
428the RAM which updates on the negative edge. This way the rest of the
429circuit perceives the RAM to be combinational (it ``updates
430immediately''): if on a positive edge the RAM's address input changes,
431the RAM will update its output on the following negative edge and the
432changed output will be visible on the next positive edge. If the RAM
433updated on a positive edge, then if the input changed on positive edge
434$i$ the changed output will only be visible on positive edge $i+2$.
435
436The design is intended to be run on a FPGA, which has a fixed clock
437tree. This typically consists of a small number of special nets called
438global buffers that can distribute signals with minimal skew to every
439logic cell. The FPGA on the iCEstick has 8 such global buffers.
440
441
442There are 4 components in the design (GC, EVAL, READER, WRITER) that
443need to be enabled/disabled at different times. One way to achieve
444this is to AND the master 48 MHz clock with several signals, thus
445creating a separate clock for each component. This is known as
446\textit{clock gating}. However, on a FPGA clock gating involves
447routing the clock from a global buffer to a logic cell and then back
448to another global buffer. This can introduce glitches in the clock and
449would delay each of the clocks by some amount (dependent on the
450signals it is ANDed with). It is therefore possible that the 4 output
451clocks would not be aligned, causing problems when data is
452communicated by one component to another. As such, clock gating is not
453recommended on FPGAs.
454
455\begin{figure}
456 \centering\includegraphics[height=3cm]{gating.eps}
457 \caption{Gated clock on the left, clock enable on the right}
458\end{figure}
459
460To avoid this problem, the design used here does not use clock gating
461and instead uses 4 clock enable signals supplied by the controller to
462each of the GC, EVAL, READER and WRITER. At any point the clock is
463enabled for exactly one of the READER, GC, and WRITER. The EVAL clock
464is enabled if and only if the GC clock is enabled and the GC's
465\texttt{step\_eval} output is high.
466
467\clearpage
468\subsection{GC}
469\begin{figure}
470 \centering\includegraphics[height=4.88cm]{gc.eps}
471 \caption{GC module}
472\end{figure}
473
474The GC module is connected to the GCRAM (via the \texttt{ram\_do,
475 ram\_we, ram\_addr, ram\_di} signals), to the EVAL (via the
476\texttt{Ein, Eout, gcop, step\_eval, conn\_et, conn\_ea} signals) and
477to the controller. It also provides the value of its P register (the
478free storage pointer) to the WRITER, so the writer can know when to
479stop dumping the memory.
480
481It is implemented as a combinational ROM, some combinational logic to
482drive the internal G bus, sequential logic to advance to the next
483state respecting dispatch rules, and sequential logic to update
484registers. The clock enable signal acts as an active low synchronous
485reset: when the clock is disabled, the state register is set to the
486initial state.
487
488This module represents one half of the original design presented in
489\cite{lambda}, and the ROM's contents are those from the paper with a
490bugfi (the original design sets the ADR line low in two states when it
491should be high).
492
493\subsection{EVAL}
494\begin{figure}
495 \centering\includegraphics[height=3.29cm]{eval.eps}
496 \caption{EVAL module}
497\end{figure}
498
499The EVAL module is connected to the GC (via the \texttt{conn\_et,
500 conn\_ea, Ein, Eout, gcop} signals) and to the controller.
501
502The implementation of the EVAL is very similar to that of the GC.
503There is a combinational ROM, combinational logic to drive the E bus,
504sequential logic to advance to the next state respecting dispatch
505rules, and sequential logic to update registers. A synchronous reset
506is available. When the reset line is held high for one clock cycle,
507the state register is set to the initial state. It is necessary to
508separate the reset from the clock enable because the GC often pauses
509the EVAL to do work and the EVAL should not be reset every time this
510happens.
511
512This module represents the other half of the original design presented
513in \cite{lambda}, and the ROM's contents are the same as those in the
514paper except that the ROM is widened by one bit to indicate whether
515the EVAL is finished evaluating the expression.
516
517\clearpage
518\subsection{Writer}
519\begin{figure}
520\centering\includegraphics[height=2.26cm]{writer.eps}
521\caption{WRITER module}
522\end{figure}
523
524The WRITER module is connected to the UART (via the \texttt{tx\_busy,
525 tx\_signal, tx\_byte} signals), the GC (via the \texttt{P} signal),
526the RAM (via the \texttt{ram\_addr, ram\_do} signals), and to the
527controller.
528
529Its purpose is to dump the memory to the UART starting with position 4
530(since positions 0, 1, 2, 3 are constant) and finishing with position
531$P - 1$, the last used cell in memory. It is implemented as a state
532machine with 7 states and an extra register (\texttt{current\_index})
533that stores the current index in the memory.
534
535The memory is 16-bit wide, but the UART sends and receives bytes. The
536writer dumps memory word in network byte order (big-endian), meaning
537the significant byte is sent before the least significant byte.
538
539The states are, in order:
540\begin{description}
541\item[START] which initializes \texttt{current\_index} to 4
542\item[WRITE1\_WAIT] which waits for \texttt{tx\_busy} to become low
543\item[WRITE1] in which transmission of the upper byte starts
544\item[WRITE2\_WAIT] which waits for \texttt{tx\_busy} to become low
545\item[WRITE2] in which transmission of the lower byte starts
546\item[INCREMENT] in which \texttt{current\_index} is incremented,
547 jumping back to WRITE1\_WAIT if it is still lower than
548 \texttt{freeptr}.
549\item[FINISHED] in which \texttt{finished} is driven high
550\end{description}
551
552This module is not part of the original design in \cite{lambda}, as
553that paper did not concern itself with input and output.
554
555\clearpage
556\subsection{Reader}
557\begin{figure}
558\centering\includegraphics[height=2.22cm]{reader.eps}
559\caption{READER module}
560\end{figure}
561
562The READER module is connected to the UART (via the
563\texttt{rx\_signal, rx\_byte} signals), the RAM (via the
564\texttt{ram\_we, ram\_addr, ram\_di} signals), and to the controller.
565
566Its purpose is to read a program from the UART and overwrite the GCRAM
567with this program. It is implemented as a state machine with two extra
568registers (\texttt{total\_words} and \texttt{current\_index}) to store
569the size of the incoming program and the current position within it.
570
571The reader expects to receive a 13-bit \texttt{length} value, followed
572by \texttt{length} 16-bit values that represent the desired memory
573contents. Each of these values is sent in network byte order
574(big-endian), meaning the most significant byte is sent before the
575least significant byte.
576
577The states are, in order:
578\begin{description}
579\item[IDLE] which waits for \texttt{rx\_signal}, puts the read byte at the top of \texttt{total\_words}.
580\item[LENGTH] which waits for \texttt{rx\_signal}, puts the read byte at the bottom of \texttt{total\_words}
581\item[READ1] which waits for the \texttt{rx\_signal}, puts the read byte at the top of \texttt{ram\_di} and increments \texttt{current\_index}
582\item[READ2] which waits for the \texttt{rx\_signal}, puts the read byte at the bottom of \texttt{ram\_di}
583\item[WRITE] in which \texttt{ram\_we} is high, jumping back to READ1 if \texttt{current\_index + 1 < total\_words}
584\item[FINISHED] in which \texttt{finished} is high
585\end{description}
586
587This module is not part of the original design in \cite{lambda}, as
588that paper did not concern itself with input and output.
589
590\clearpage
591\subsection{Putting the hardware together}
592The components described above are managed by the controller, which
593provides them with clock enable signals. The controller is a state
594machine with three states:
595\begin{description}
596\item[STATE\_READ] in which only the READER clock is enabled
597\item[STATE\_RUN] in which the GC and EVAL clocks are enabled
598\item[STATE\_WRITE] in which only the WRITER clock is enabled
599\end{description}
600
601In STATE\_READ and STATE\_WRITE, the EVAL's reset line is high.
602
603Whenver the component corresponding to the current state indicates it
604is finished, the controller advances to the next state.
605
606The controller also includes three multiplexers that control which
607lines are connected to the GCRAM. If the READER is active, all three
608GCRAM lines are connected to the READER. If the WRITER is active, the
609address line is connected to the writer (and the other lines are
610connected to the GC). If the GC is active, all lines are connected to
611the GC.
612
613The controller also includes logic to drive the 5 LEDs on the
614iCEstick. Three LEDs indicate the current state, while the other two
615indicate if the UART is currently transmitting or receiving.
616
617\section{Representing a Lisp program}
618\label{sec:formats}
619The processor described above evaluates a Lisp program received via
620UART. This section talks about three representations of such a
621program: the Lisp source code (a S-expression), the intermediate form
622(also a S-expression), and the in-memory form (a binary string).
623
624The first representation is obtained by parsing the user input, the
625compiler converts this into the second, and the assembler converts the
626second into the third.
627
628\subsection{Lisp source code}
629Recall that a S-expression is either an atom or a cons cell.
630
631An atom is a sequence of letters, digits, and certain symbols. There
632are two special atoms named \texttt{T} and \texttt{NIL} representing
633the boolean true and false respectively.
634
635A cons cell is a pair of two S-expressions called the ``car'' and the
636``cdr''. We write them as \texttt{(car . cdr)}, for example \texttt{(5
637 . 7)} is a cons cell with car \texttt{5} and cdr \texttt{7}.
638
639We can use cons cells to represent linked lists. A linked list is a
640sequence of cons cells where the cdr of one cell is the next cell, and
641the cdr of the last cell is \texttt{NIL}. For example the list
642\texttt{1,2,3} can be written as \texttt{(1 . (2 . (3 . NIL)))}.
643We can also write this list more concisely as \texttt{(1 2 3)}, and we
644write \texttt{()} to mean the atom \texttt{NIL}.
645
646Using these syntax elements we can encode several types of logical
647expressions:
648
649\begin{itemize}
650\item Numbers are encoded as numeric atoms, e.g. \texttt{5}
651\item Variables are encoded as non-numeric atoms, e.g. \texttt{my-var}
652\item Conditional statements are encoded as 4-element lists, e.g.
653 \texttt{(if is-odd 5 6)} which has value 5 if the variable
654 \texttt{is-odd} is not \texttt{NIL} and value 6 otherwise
655\item Symbols are encoded as non-numeric atoms preceded by a single
656 quote, e.g. \texttt{'my-symbol}. This can also be written
657 \texttt{(quote my-symbol)}
658\item Constant lists are encoded as lists preceded by a single quote,
659 e.g. \texttt{'(1 2 (3 4 5))}. This can also be written
660 \texttt{(quote (1 2 (3 4 5)))}
661\item Functions (lambda abstractions) are encoded as 4-element lists,
662 e.g. \texttt{(lambda first (a b) a)} in which the second element is
663 the function name, the third element is a list of function
664 arguments, and the fourth element is the function body.
665\item Function calls are encoded as lists, with the function as the
666 first element and the arguments as the other elements, e.g.
667 \texttt{(atom 5)} calls the function \texttt{atom} with argument
668 \texttt{5}.
669\end{itemize}
670
671\subsection{In-memory representation}
672The memory is an array of 16-bit integer values. We interpret these as
673tagged pointers: the top 3 bits indicate the ``type'' of this pointer,
674while the other 13 bits represent another memory location. A pointer
675therefore has a type and a value.
676
677We represent each Lisp expression as a tagged pointer. To evaluate it,
678the processor looks at the type of the pointer. Here is what a pointer
679of value $v$ means based on its type. We use the syntax $mem[a]$ to
680mean ``the contents of the memory location $a$'' (that is, a tagged
681pointer) and $memval[a]$ to mean ``the value at the memory location
682$a$'' (that is, the value of the tagged pointer $mem[a]$).
683
684\begin{enumerate}
685\setcounter{enumi}{-1}
686\item is a constant list, whose cdr is $mem[v]$ and car is $mem[v+1]$.
687\item is a constant symbol.
688\item is a variable reference, where $v$ indicates the position in the
689 environment of the referenced variable.
690\item is a constant closure, that is the result of evaluating a lambda
691 abstraction. Values of this type do not normally appear in the input
692 code, but instead are created during evaluation of other forms.
693\item is a procedure (lambda abstraction), whose body is $mem[v]$.
694\item is a conditional, where $mem[v+1]$ is the condition,
695 $mem[memval[v]+1]$ is the ``then'' branch, and $mem[memval[v]]$ is the
696 ``else'' branch.
697\item is a procedure call, which points to a special kind of linked
698 list. The CDR of each cell (other than the last one) points to the
699 next cell and has type 0, while the CDR of the last cell has value 0
700 and its type indicates the primitive to be called. The CARs of these
701 cells indicate the arguments for the procedure, in reverse order.
702
703 To call a user-defined procedure, we use the primitive
704 \texttt{funcall} and put the user-defined procedure as the CAR of
705 the last cell (in other words, the user-defined procedure is the
706 first argument to \texttt{funcall}).
707\item is a quoted constant, representing the object at $mem[v]$.
708\end{enumerate}
709
710The booleans \texttt{NIL} and \texttt{T} are compiled to a list
711pointing to 0, and a symbol with value 2 respectively.
712
713The memory has a peculiar fixed layout. Locations 0 and 1 are the CDR
714and CAR of NIL, locations 2 and 3 are the CDR and CAR of T, location 4
715holds the first unused location in memory (this tells the processor
716where it can start to place intermediate expressions), location 5
717holds the expression to be evaluated, and location 6 is where the
718processor will place the value of the expression.
719
720\subsection{The intermediate form}
721To facilitate conversion from the former format to the latter, an
722intermediate form is used. Here we represent tagged pointers as lists
723of one of three types:
724
725\begin{itemize}
726\item \texttt{(type value)}, where ``type'' is the pointer's type and
727 ``value'' is the pointer's (numeric) value.
728\item \texttt{(type ptr)}, where the pointer's type is ``type'' and
729 the pointer's value is $x$, where $x$ is the memory location where
730 ``ptr'' (another tagged pointer) lies.
731\item \texttt{(type ptr1 ptr2)}, where the pointer's type is ``type''
732 and the pointer's value is $x$, where $x$ is the memory location
733 where ``ptr1'' lies and $x+1$ is the memory location where ``ptr2''
734 lies.
735\end{itemize}
736
737These three representations map nicely to the tagged pointers
738described in the previous section. The first representation is used
739for constant symbols and variable references, the second for
740procedures, and the third for constant lists, conditionals, and
741procedure calls.
742
743A visual representation of the intermediate representation (from the
744AIM-514 paper) can be seen on the next page. The expression in
745question is
746
747\begin{lstlisting}
748((lambda append (x y) (if (atom x) y (cons (car x) (append (cdr x) y)))) '(a b c) '(d e f))
749\end{lstlisting}
750
751which we would represent as this sexp:
752
753\begin{lstlisting}
754(CALL (MORE (MORE (FUNCALL 0) (PROC (IF (LIST (CALL (MORE (CONS 0) (CALL (MORE (MORE (FUNCALL 0) (VAR -1)) (VAR -2)) (CALL (CDR 0) (VAR -3)))) (CALL (CAR 0) (VAR -3))) (VAR -2)) (CALL (ATOM 0) (VAR -3))))) (LIST (LIST (LIST (LIST 0) (SYMBOL 6)) (SYMBOL 7)) (SYMBOL 8))) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5)))
755\end{lstlisting}
756
757As mentioned in the Background chapter, the processor has a bug (the
758ATOM bug) that causes \texttt{(ATOM NIL)} to be true. As a result,
759\texttt{(atom x)} in this expression is always true, even when
760\texttt{x} is the empty list, and the expression loops forever (or at
761least until memory is exhausted). We can fix this by changing the
762condition from \texttt{(atom x)} to \texttt{x}, which is false when
763\texttt{x} is NIL and true otherwise, and swapping the two branches of
764the conditional. The fixed program is:
765
766\begin{lstlisting}
767((lambda append (x y) (if x (cons (car x) (append (cdr x) y)) y)) '(a b c) '(d e f))
768\end{lstlisting}
769
770which is compiled to the sligthly shorter sexp:
771
772\begin{lstlisting}
773(CALL (MORE (MORE (FUNCALL 0) (PROC (IF (LIST (VAR -2) (CALL (MORE (CONS 0) (CALL (MORE (MORE (FUNCALL 0) (VAR -1)) (VAR -2)) (CALL (CDR 0) (VAR -3)))) (CALL (CAR 0) (VAR -3)))) (VAR -3)))) (LIST (LIST (LIST (LIST 0) (SYMBOL 6)) (SYMBOL 7)) (SYMBOL 8))) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5)))
774\end{lstlisting}
775
776
777\begin{figure}
778\centering\includegraphics[width=18.5cm]{AIM-514-ir.pdf}
779\caption{Example of intermediate representation}
780\end{figure}
781
782\section{Software}
783\subsection{Compiler}
784The compiler takes in Lisp code and translates it to a sexp-based
785format understood by the assembler.
786
787The compiler has several functions for compiling certain kinds of
788expressions, and a function (\texttt{process\_toplevel}) that uses the
789former to compile arbitrary expressions.
790
791The output from any of this functions is a sexp in the intermediate
792form described in the previous section.
793
794Symbols are identified by sequences of letters on the source code side
795and by numbers on the processor side. The compiler therefore keeps a
796mapping between symbol names and their numbers. Whenever a new symbol
797is encountered, a new entry is added to this map.
798
799The \texttt{process\_quoted} function handles constants. For example
800if the source code contains \texttt{'(1 2 3)} then
801\texttt{process\_quoted} is called with \texttt{(1 2 3)}. This
802function handles the symbol interning described in the previous
803paragraph, and converts its input to intermediate form.
804
805The \texttt{process\_proc} function handles lambda abstractions. For
806example if the source code contains \texttt{(lambda first (a b) a)}
807then \texttt{process\_proc} is called with \texttt{first},
808\texttt{(a b)}, and \texttt{a}. This function appends the function
809name and arguments to the environment, then processes the body (using
810\texttt{process\_toplevel}) with the new environment.
811
812The \texttt{process\_funcall} function handles function calls. For
813example if the source code contains \texttt{(reverse-list 2 1)} then
814\texttt{process\_funcall} is called with \texttt{reverse-list}, and
815\texttt{(2 1)}. The arguments are first processed (using
816\texttt{process\_toplevel}), then (if the function is not a primitive)
817the function is processed as well.
818
819Finally the \texttt{process\_toplevel} handles any expression by
820figuring out what sort of expression it is and then processing it
821appropriately. Empty lists and \texttt{NIL} become \texttt{(list 0)},
822numbers and \texttt{T} are processed with \texttt{process\_quoted}
823
824Examples of compiler operation can be found in the compiler tests.
825Here are a few of them:
826
827\begin{itemize}
828\item The Lisp expression \texttt{'()} is compiled to \texttt{(LIST 0)}
829\item The Lisp expression \texttt{'(5 foo)} is compiled to
830 \texttt{(LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4))}. Observe that
831 the CDR comes before the CAR, and symbols are interned:
832 \texttt{'foo} is compiled to \texttt{(SYMBOL 3)} and \texttt{'5} is
833 compiled to \texttt{(SYMBOL 4)}
834\item The Lisp expression \texttt{(if t '(2 3) 'x)} is compiled to
835 \texttt{(IF (LIST (SYMBOL 5) (LIST (LIST (LIST 0) (SYMBOL 3))
836 (SYMBOL 4))) (SYMBOL 2))}. Observe that \texttt{t} is always
837 compiled to \texttt{(SYMBOL 2)}, as mentioned in the section about
838 the memory format.
839\end{itemize}
840
841\subsection{Assembler}
842The assembler takes in an S-expression in the intermediate form, lays
843out the words in memory, and produces a memory dump that can be
844supplied to the processor.
845
846The core of the assembler is a function called \texttt{process} that
847takes an expression in intermediate form and an optional location in
848memory. This function lays out the expression's subexpressions in
849memory and then lays out the expression itself at the given location
850(or at the end of the memory if there is no location given).
851
852The assembler begins by initializing the first 7 elements of the
853memory with the values needed by the processor. After initialization,
854the \texttt{process} function is called on the input expression with
855no location.
856
857Here is how the \texttt{process} function lays out the expression
858$expr$ at location $l$ (where $l$ can be a given memory location or
859simply ``the first free location in memory''):
860
861\begin{itemize}
862\item If $expr$ is \texttt{(type value)}, then it is directly laid out
863 at $l$
864\item If $expr$ is \texttt{(type ptr)}, then \texttt{process} is
865 called recursively to lay \texttt{ptr} out at the first free
866 location in memory (say this is location $loc$). Then \texttt{(type
867 $loc$)} is laid out at $l$.
868\item If $expr$ is \texttt{(type ptr1 ptr2)}, then two consecutive
869 free locations in memory $loc$ and $loc+1$ are reserved,
870 \texttt{process} is recursively called twice to lay \texttt{ptr1} at
871 $loc$ and \texttt{ptr2} at $loc+1$, and finally
872 \texttt{(type $loc$)} is laid out at $l$.
873\end{itemize}
874
875Once again, examples of assembler operations can be found in the
876assembler tests. Two of them:
877
878\begin{itemize}
879\item \texttt{(quoted (symbol 5))} is assembled to \texttt{00 08 00 00
880 00 00 20 00 20 00 00 08 E0 07 00 00 20 05}. Observe that each word
881 is made of two bytes, the first word (\texttt{00 08}) indicating the
882 length (in words) of the program. The first 4 words are always
883 fixed: they represent in order the CDR of NIL, the CAR of NIL, the
884 CDR of T, and the CAR of T. The 7th word is also a constant, 0,
885 because that is where the processor puts the result. The 5th word is
886 the first free word in memory (which is equal to the length in words
887 of the program), and the 6th word (\texttt{E0 07}) is the expression
888 to be evaluated. In this case, it has type quote and points to
889 memory location 7 (the 8th word, because pointers are 0-indexed). At
890 memory location 7 we have the value \texttt{20 05} which has type
891 symbol and value 5.
892\item \texttt{(call (more (funcall 0) (proc (var -2))) (number 5))} is
893 assembled to \texttt{00 0C 00 00 00 00 20 00 20 00 00 0C C0 07 00 00
894 00 09 20 05 E0 00 80 0B 5F FE}. Here we want to evaulate the
895 expression \texttt{C0 07} (type ``function call'', address 7). This
896 points to a cons-cell, whose car (at position 8) is \texttt{20 05}
897 (that is, \texttt{(number 5)}) and whose cdr (at position 7) is
898 \texttt{00 09} (type ``more'' and address 9). The ``more'' points to
899 another cons-cell, whose cdr (at position 9) is \texttt{E0 00} (that
900 is, \texttt{(funcall 0)}), and whose car is \texttt{80 0B}: a
901 procedure whose body is at position 11, namely \texttt{5F FE} (that
902 is, \texttt{(var -2)}).
903
904 So the expression is a call to the primitive \texttt{funcall} with
905 arguments \texttt{(lambda l (x) x)} and \texttt{5}. The original
906 program was probably \texttt{((lambda l (x) x) 5)}. Note that this
907 program was compiled with an old version of the compiler that did
908 not intern symbols, which explains why \texttt{5} was compiled to
909 \texttt{(number 5)} and not \texttt{(symbol 3)}.
910\end{itemize}
911
912\subsection{REPL}
913The REPL ties everything together: it reads a program from the user,
914compiles it, assembles it, sends the assembled program to the
915processor, reads the memory dump from the processor, and finally
916prints the result to the screen.
917
918The REPL includes an S-expression pretty-printer that operates on
919memory dumps. As symbols are interned by the compiler, when the
920pretty-printer hits a symbol it only knows its index, and not the
921original name. To print the original name the pretty-printer searches
922the symbol map built by the compiler and pulls the name from there.
923
924The pretty-printer takes a chunk of memory and index, and prints the
925expression at that index. For example, if the memory chunk is
926\texttt{00 00 00 00 20 00 20 00 00 09 00 07 00 07 20 02 00 00 60 00}
927and the index is 6, then the pretty printer will try to print
928\texttt{00 07}, which is a list with car \texttt{00 00} and cdr
929\texttt{20 02}, and the pretty printer prints \texttt{(nil . t)}.
930
931\clearpage
932An example REPL session is:
933
934\begin{lstlisting}
935YULE REPL
936
937* (cons 1 2)
938(1 . 2)
939* (car '(4 5 6))
9404
941* (cdr '(t t t))
942(T T)
943* ((lambda id (x) x) 5)
9445
945* (lambda first (a b) a)
946<CLOSURE>
947* ((lambda first (a b) a) 7 8)
9487
949* (atom 6)
950T
951\end{lstlisting}
952
953\chapter{Evaluation and Testing}
954To make development easier, ensure everything works correctly, and
955prevent regressions there are unit tests for the compiler and
956assembler, and functional tests for the entire project.
957
958Besides these automated tests, manual evaluation (in the beginning,
959precompiling simple expressions and supplying them to the processor as
960the initial value of the memory; later on, manually typing expressions
961in the REPL and looking at the output). When the processor was not
962operating as expected, the GTKWave wave simulator was used to assist
963with debugging. This piece of software takes in a recorded simulation
964of the circuits and displays the values of all signals inside the
965circuit against time. This allows the designer to look at every clock
966cycle in turn and discover where the error occurs.
967
968The process of debugging the processor had the side effect of finding
969inefficiencies in the original design. While \cite{lambda} mentions
970some limitations of the processors, one important limitation that was
971only discovered during debugging is that for every operation that the
972EVAL performs, the result of that operation will be written to new
973locations in memory. This includes ``read-only'' operations such as
974CAR and CDR. If the programmer requests the CAR of a list, a copy of
975the first element will be taken and a pointer to it will be returned.
976Since YULE only has 4096 words of memory, this severely restricts the
977complexity of programs that can be evaluated on it.
978
979Another avenue to ensure the processor works as it should is formal
980verification using model checking tools. Formal verification is
981commonly employed as part of circuit design evaluation because
982mistakes are more difficult to identify, and if found after the chip
983has been built they are very costly to fix (often the chips that were
984built with the bug need to be thrown away). However, building models
985is a very involved process, and devoting effort to formal verification
986of this project would not have allowed the implementation of several
987features. Due to time constraints and knowing that no physical YULE
988chips are built (the project is so far only synthesised to a FPGA),
989testing and manual evaluation was considered sufficient for YULE.
990
991\section{Unit testing}
992The software was tested with 48 unit tests written in Perl using the
993\texttt{Test::More} framework.
994
995The assembler takes in compiled Lisp code and outputs a memory dump in
996one of two formats (binary or Verilog code). Its tests take a
997configuration for the assembler, an input, two expected outputs, and
998(for documentation purposes) the source Lisp code. An example test:
999
1000\begin{lstlisting}
1001$expbin =
1002 '00 0C 00 00 00 00 20 00 20 00 00 0C C0 07'
1003 . ' 00 00 00 09 20 05 E0 00 80 0B 5F FE';
1004run_test {addr_bits => 13},
1005 '(call (more (funcall 0) (proc (var -2))) (number 5))',
1006 <<'EOF', $expbin, '((LAMBDA ID (X) X) 5)';
1007mem[ 0] <= 0; // (cdr part of NIL)
1008mem[ 1] <= 0; // (car part of NIL)
1009mem[ 2] <= 16'b0010000000000000; // (cdr part of T)
1010mem[ 3] <= 16'b0010000000000000; // (car part of T)
1011mem[ 4] <= 16'd12; // (free storage pointer)
1012mem[ 5] <= 16'b1100000000000111; // CALL 7
1013mem[ 6] <= 0; // (result of computation)
1014mem[ 7] <= 16'b0000000000001001; // MORE 9
1015mem[ 8] <= 16'b0010000000000101; // NUMBER 5
1016mem[ 9] <= 16'b1110000000000000; // FUNCALL 0
1017mem[10] <= 16'b1000000000001011; // PROC 11
1018mem[11] <= 16'b0101111111111110; // VAR -2
1019EOF
1020\end{lstlisting}
1021
1022Here the compiled code \texttt{(call (more (funcall 0) (proc (var -2))) (number 5))}
1023is given to the assembler, and its outputs are compared to the binary
1024string on the first line and the block of Verilog code.
1025
1026The tests do not all contain valid assembler inputs. There are a
1027series of tests that give invalid inputs to the assembler and expect
1028the assembler to raise an exception matching a pattern. Example tests:
1029
1030\begin{lstlisting}
1031expect_error_like { run_test {}, '((type is a list) 5)'}
1032 'Type of toplevel is not atom';
1033expect_error_like { run_test {}, '(badtype 5)'}
1034 'No such type';
1035expect_error_like { run_test {}, '(5 700000)'}
1036 'Addr too large';
1037\end{lstlisting}
1038
1039On the compiler side, there are 4 tests for the quoting function,
1040but the brunt of the tests are pairs of a compiler input (i.e. Lisp
1041code) and a compiler output (i.e. a sexp that the assembler
1042understands). For each pair the compiler is run on the first element
1043and its output is compared with the second. Example such tests:
1044
1045\begin{lstlisting}
1046is_toplevel '(quote 5)', '(SYMBOL 3)';
1047is_toplevel '(reverse-list \'a \'a \'b)',
1048 '(CALL (MORE (MORE (REVERSE-LIST 0) (SYMBOL 4)) (SYMBOL 3)) (SYMBOL 3))';
1049is_toplevel '(if t \'(2 3) \'x)',
1050 '(IF (LIST (SYMBOL 5) (LIST (LIST (LIST 0)'
1051 . ' (SYMBOL 3)) (SYMBOL 4))) (SYMBOL 2))';
1052\end{lstlisting}
1053
1054As the compiler code includes a function to pretty-print
1055S-expressions, there are a few round-trip tests on this function.
1056Given a string, it is parsed using the \texttt{Data::SExpression}
1057parsing library then it is pretty-printed. The result should match the
1058original string.
1059
1060There are also tests for handling of invalid input, like in the assembler. These include:
1061
1062\begin{lstlisting}
1063expect_error_like { is_toplevel 'x' } 'Variable x not in environment';
1064expect_error_like { is_toplevel '(car)' } 'Cannot call primitive car with no arguments';
1065\end{lstlisting}
1066
1067A test coverage tool, \texttt{Devel::Cover}, was ran on the compiler
1068and assembler to find statements and branches that were not covered by
1069existing unit tests, and new tests were written to cover them. Right
1070now \texttt{Devel::Cover} reports 100\% test coverage of the compiler
1071and assembler.
1072
1073\begin{figure}
1074 \centering\includegraphics[height=8cm]{devel-cover.png}
1075 \caption{Screenshot of \texttt{Devel::Cover} results, showing 100\%
1076 test coverage}
1077\end{figure}
1078
1079\section{Functional testing}
1080Besides unit tests, the entire system (hardware and software) has
1081undergone functional testing. This means that a set of 21 Lisp
1082expressions has been prepared, alongside their correct values computed
1083using a software Lisp implementation. These values are given to the
1084YULE REPL and the test succeeds if the output equals the known correct
1085value of the expression.
1086
1087Here are some example tests:
1088\begin{lstlisting}
1089'(1 2 3)
10905
1091((lambda id (x) x) 5)
1092((lambda id (x) x) '(1 2 3))
1093(car '(2 3))
1094(car (cdr '(4 5 6)))
1095((lambda snd (x y) y) 'first 'second)
1096
1097(cons '(1 2) '(7))
1098(car (cons '(1 2) '(7)))
1099(cdr (cons '(1 2) '(7)))
1100
1101((lambda rev (xs) ((lambda reverse-acc (xs acc) (if xs (reverse-acc (cdr xs) (cons (car xs) acc)) acc)) xs '())) '(4 5 6 7))
1102\end{lstlisting}
1103
1104This test suite tries to cover most kinds of valid expressions. It
1105includes constants of all kinds, conditionals, lambda abstractions,
1106calls to all primitives, calls to user-defined functions, recursive
1107calls.
1108
1109\section{Timing and performance}
1110There are three parts to the performance of YULE:
1111
1112\begin{enumerate}
1113\item The efficiency of the software. This refers to the time taken to
1114 compile and assemble input programs, which is negligible.
1115
1116\item The I/O performance, which refers to the time needed to transfer
1117 the program from the computer to the processor's memory, plus the
1118 time needed to transfer the memory dump from the memory back to the
1119 computer.
1120
1121\item The speed of the processor itself, that is the time/number of
1122 cycles that the processor needs to evaluate a given expression.
1123\end{enumerate}
1124
1125The GC runs at 48MHz, while the EVAL only runs when the GC allows it
1126to which is at most half of the time.
1127
1128The UART runs at 4000000 baud, so a bit is sent every 12 cycles of the
112948MHz clock. A byte is made of 11 bits: a start bit, 8 data bits, and
1130one stop bit. A word is two bytes, so 22 bits. Therefore 264 cycles
1131are needed to transmit one word.
1132
1133Since the WRITER just sends a full memory dump, the I/O performance is
1134proportional not to \texttt{program\_len + result\_len} but to
1135\texttt{program\_len + result\_len + intermediate\_results\_len}. As
1136evaluation of any expression involves the creation of at least one
1137value (the result), for every expression or subexpression evaluated at
1138least 264 cycles will be needed to output that result. The EVAL takes
1139significantly less than 264 cycles to evaluate a expression (not
1140counting its subexpressions), in one test each expression took between
114130 and 90 cycles to evaluate. The processor performance is therefore
1142dominated by I/O.
1143
1144As an example, the program \texttt{((lambda id (x) x) 5)} is 12 bytes
1145long, and takes 258 cycles to evaluate. The memory dump is 54 bytes
1146long. Therefore of the roughly 17700 cycles needed from beginning to
1147end, about 98.5\% are used to do I/O.
1148
1149\section{A worked example}
1150\label{sec:example}
1151Here is an example of what happens when a expression is typed into the
1152REPL:
1153
1154\begin{enumerate}
1155\item A user types into the REPL the string
1156
1157\begin{lstlisting}
1158(car '(2 3 4))
1159\end{lstlisting}
1160
1161 This is a call to the CAR primitive, with the list \texttt{(2 3 4)}
1162 as an argument. We expect the result to be \texttt{2}, the first
1163 element of the list.
1164
1165\item This string is given to the compiler, producing the sexp
1166
1167\begin{lstlisting}
1168(CALL (CAR 0) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5)))
1169\end{lstlisting}
1170
1171 Observe that in this representation the CDR comes before the CAR. So
1172 for example ``(LIST (LIST 0) (SYMBOL 3))'' represents a list whose
1173 CAR is ``(SYMBOL 3)'' and CDR is ``(LIST 0)'' (that is, nil). This
1174 is a list with one element, ``(SYMBOL 3)''. The sexp shown here
1175 represents a call to the CAR primitive with one argument. The list
1176 has three elements, which are in order \texttt{(SYMBOL 5), (SYMBOL
1177 4), (SYMBOL 3)}.
1178
1179 In this stage the compiler maps each number to a symbol, in this
1180 case \texttt{(SYMBOL 5)} is 2, \texttt{(SYMBOL 4)} is 3, and
1181 \texttt{(SYMBOL 3)} is 2. This mapping is remembered so it can be
1182 used later by the pretty printer.
1183
1184\item This sexp is given to the assembler, producing the 16 words long
1185 binary string (here displayed in hex, with a space after each 16-bit
1186 word)
1187
1188\begin{lstlisting}
1189000F 0000 0000 2000 2000 000F C007 0000 2000 0009 000B 2005 000D 2004 0000 2003
1190\end{lstlisting}
1191
1192 The first word, \texttt{000F}, indicates the length of the program
1193 in
1194 words (15 words, this does not include this first word).\\
1195 The next four words, \texttt{0000 0000 2000 2000}, represent the CDR
1196 and CAR of NIL and T respectively. These words are always
1197 constant.\\
1198 The next word, \texttt{000F}, represents the first unused position
1199 in memory. This is always equal to the first word transmitted.\\
1200 The next word, \texttt{C007}, represents the expression to be
1201 evaluated.\\
1202 The next word, \texttt{0000}, is ignored as this is where the
1203 processor will place the result of the evaluation.\\
1204 The rest of the words are subexpressions or parts of subexpressions
1205 of the expression being evaluated.\\
1206
1207 The expression that is evaluated is \texttt{C007}, which is a CALL
1208 whose CDR is at position 7 (\texttt{2000}, meaning \texttt{(CAR 0)})
1209 and whose CAR is at
1210 position 8 (\texttt{0009}).\\
1211 \texttt{0009} is a list with CAR at 10 (\texttt{2005}, meaning
1212 \texttt{(SYMBOL 5)}) and CDR at 9 (\texttt{000B}).\\
1213 \texttt{000B} is a list with CAR at 12 (\texttt{2004}, meaning
1214 \texttt{(SYMBOL 4)}) and CDR at 11 (\texttt{000D}).\\
1215 \texttt{000D} is a list with CAR at 14 (\texttt{2003}, meaning
1216 \texttt{(SYMBOL 3)}) and CDR at 13 (\texttt{0000}, meaning NIL).
1217
1218\item This binary string is sent via UART to the FPGA's READER. The
1219 READER writes the string (except the first word) into memory,
1220 overwriting the previous contents of the memory. It then reports to
1221 the controller that it is done, and the controller advances to
1222 \texttt{STATE\_RUN}.
1223
1224\item The EVAL and GC evaluate the program in memory, and then place
1225 the result back into memory. The EVAL reports to the controller that
1226 it is done, and the controller advances to \texttt{STATE\_WRITE}.
1227
1228\item The WRITER sends this memory dump via UART back to the REPL
1229
1230\begin{lstlisting}
1231000F C007 2005 2000 0009 000B 2005 000D 2004 0000 2003 0000 4004 2000 0010 0000 0012 0000 0000 0009
1232\end{lstlisting}
1233
1234 Then the WRITER reports to the controller that it is done, and the
1235 controller advances to \texttt{STATE\_READ}.
1236
1237 Observe that this memory dump is similar to the assembler output.
1238 The first two words are the 6th and 7th words in the assembler
1239 output, the next word is the result of the evaluation, then the next
1240 8 words are words 9th through 16th in the assembler output. The
1241 following words are intermediate results computed by the processor.
1242
1243\item The REPL's pretty printer interpets this string starting with
1244 \texttt{2005}, which means \texttt{(SYMBOL 5)}. The pretty printer
1245 looks in the compiler's mapping and finds that the 5th symbol was
1246 3, and so the REPL prints
1247
1248\begin{lstlisting}
12493
1250*
1251\end{lstlisting}
1252
1253 The evaluation is complete, and a new prompt is printed so that the
1254 user can input another expression.
1255\end{enumerate}
1256
1257\chapter{Conclusions}
1258A Lisp machine processor has been built using the original design and
1259techniques presented in \cite{lambda}. It has been extended with input
1260and output capabilities, and using a modern FPGA it has been
1261synthesised on a physical device slightly larger than a USB flash
1262drive. This device can be used as a standalone Lisp evaluator, or the
1263processor inside it can be included in a more complicated circuit.
1264
1265Software has been written to help interact with the processor. This
1266includes a compiler, an assembler, and a REPL that together make it
1267possible for an end-user unfamiliar with hardware design or the
1268processor's internals to submit Lisp expressions to the processor and
1269get back their values. The software also allows programmers to convert
1270Lisp source code to the binary format understood by the processor, so
1271that they may later send the binaries to the processor for evaluation.
1272
1273The processor and software have been tested to ensure they work
1274correctly. This has uncovered bugs and inefficiencies in the original
1275paper, some of which could not be gleaned from simply reading the
1276paper.
1277
1278The source files for the hardware and software, and also the automated
1279tests have been released under an open-source license to make it easy
1280for future circuit designers and programmers to use and extend this
1281project. They can be found at \url{https://git.ieval.ro/?p=yule.git}.
1282
1283The original paper \cite{lambda} is one of the influential Lambda
1284Papers that defined the Scheme programming language and is an
1285important historical artifact for understading how Lisp machine
1286processors can be implemented and more generally how tagged
1287architectures work. But all details of the processor cannot be
1288understood from simply reading the paper, and most readers would not
1289follow by attempting to implement what is described there. This
1290projects brings a tangible implementation into the open, which shows
1291the limitations of the design to people interested in this design and
1292makes it possible to solve this limitations and extend the processor
1293beyond its original scope.
1294
1295\section{Reflections}
1296Now that the project is finished, it is worth revisiting ideas I had
1297in the beginning that were not ultimately useful.
1298
1299The original concept had the processor run on a typical development
1300board, with the software running on a Raspberry PI. The board and the
1301Raspberry PI would be connected through jumper wires, and the UART
1302circuitry on the Raspberry PI would be used by the software to talk to
1303the processor.
1304
1305This idea would have produced a much harder to use YULE. Not only
1306would the user have to ssh into a Raspberry PI to do anything useful,
1307but also the equipment would take more space (both boards are larger
1308than the iCEstick). Once I tried the iCEstick and found that
1309everything can be done just as well with it, it replaced the two
1310boards and made the project easier to use (albeit slightly harder to
1311debug, as the iCEstick only has 5 LEDs).
1312
1313Some code had to be rewritten, sometimes multiple times. The compiler
1314was originally written in Common Lisp, as it is the best suited
1315language for manipulation of S-expressions, especially Lisp code. This
1316meant a working project was obtained more quickly, but then the
1317compiler had to be rewritten in another language; if the user has a
1318software Lisp implementation already, why would they need YULE? I
1319chose Perl as the reimplementation language, as the assembler was
1320already written in Perl and it is widely-available, most systems
1321coming with a Perl interpreter out of the box.
1322
1323Throughout development of the hardware part, timing problems were a
1324mainstay. The first ones related to the interactions between the
1325processor and the memory (the memory was clocked, but the processor
1326expected a combinational RAM). What fixed this problem was to run the
1327memory at twice the speed of the rest of the processor. At some point,
1328a decision was taken to run the GC and EVAL on different clocks (both
1329ran at one quarter the speed of the master clock, but one of the
1330clocks was delayed by half a cycle. The positive edge on one clock
1331would happen midway between a positive edge and a negative edge on the
1332other one). This proved to be unwise, as sometimes values sent by one
1333module to the other would be cleared before the destination module
1334managed to read them.
1335
1336GTKWave proved to be very useful in debugging these timing issues, as
1337the exact point where the issue began was easy to identify and I could
1338see why the issue was happening. A workaround for this was to add a
1339latch on each side of the common E bus that remembered the value sent
1340long enough for the receiving module to see it.
1341
1342All timing problems were fixed more cleanly in the end when an attempt
1343to simplify the clocks was made. First GC and EVAL were changed to use
1344the same clock, which removed the need for the latches, and then the
1345memory was changed to use the negative edge of the clock instead of
1346the positive edge, which meant no more clock division code had to be
1347used.
1348
1349\section{Future work}
1350There are several ways to extend this project.
1351
1352One idea is to fix the noted limitations: the ATOM primitive considers
1353NIL to not be an atom, and there is no EQ primitive that compares
1354symbols/other objects. The latter problem means that all symbols are
1355indistinguishable (note: NIL can still be distinguished from T and
1356other symbols, but NIL is not a symbol). To introduce equality a new
1357signal needs to be added to the EVAL that compares (for example) the
1358value on the E bus with the value of a fixed register, possibly a
1359newly added one. Then (EQ A B) would store A in the register, load B
1360on the E bus and compare them.
1361
1362Once equality is implemented, it becomes possible to implement the
1363EVAL function that evaluates an arbitrary Lisp expression.
1364
1365Two more involved instruction set improvements are adding arithmetic
1366operations, and input/output. Without arithmetic primitives one has to
1367rely on Church numerals, which encodes numbers as lambda abstractions
1368and is very inefficient. Input/output primitives would allow programs
1369to receive data at runtime over the UART and send data back before
1370evaluation is finished. This enables long-running useful programs, for
1371example with input/output and equality a REPL can be written (which
1372relies on the EVAl function mentioned above).
1373
1374On the software side, the assembler can support optimizations and then
1375the REPL can be extended with a core image. The main optimization
1376needed in the assembler is common subexpression elimination. The
1377processor never overwrites a value in memory with another value (other
1378than overwriting \texttt{mem[6]} with the result of the evaluation),
1379so it is safe to share any identical subexpressions. For example, if
1380the user inputs the program \texttt{(cons (car '(1 2 3 4)) (cdr '(0 2
1381 3 4)))}, then when the assembler should share the tails of the two
1382lists, meaning the second list should compile to \texttt{(LIST (LIST
1383 X) (SYMBOL Y))} where X is a pointer to where the second element of
1384the first list is stored. This makes long programs significantly
1385shorter. Note that no compiler changes are required to implement this,
1386as the assembler can just remember all expressions and subexpressions
1387it has assembler so far including their locations, and whenever a new
1388expression needs to be assembler it can be replaced with a pointer to
1389where the same expression has been placed before.
1390
1391Once this optimization is implemented, the REPL can gain core image
1392support. Suppose the user enters the expression \texttt{expr}. Instead
1393of sending \texttt{assemble(compile(expr))} to the processor, the REPL
1394can remember the last expression sent (or the last memory dump
1395received), then append \texttt{assemble(compile(expr))} to that
1396previous expression. This way \texttt{expr} can refer to previously
1397defined expressions, as long as the REPL or compiler has a way to name
1398previously supplied instructions, for example via a \texttt{define}
1399special form, via the standard REPL variables (\texttt{+, ++, +++}),
1400or by sequentially numbering all expressions submitted by the user.
1401
1402Returning to hardware, a potential future project is to add actual
1403hardware garbage collection to the GC component. At the moment the
1404design does no garbage collection, the processor only has 4096 words
1405of RAM available, and evaluation is quite memory-inefficient. Adding
1406garbage collection would make more complicated programs possible
1407despite the limited RAM. One way to implement this is to add a small
1408microprocessor to the design which is preprogrammed with a garbage
1409collection algorithm (simple choices are a stop-and-copy collector or
1410Cheney's algorithm). When the free space pointer gets too high, the
1411EVAL and GC are stopped, and the new microprocessor is started up so
1412it can perform garbage collection, compact the memory, and update the
1413free space pointer. The EVAL/GC registers would have to be updated as
1414well. Then the microprocessor can be stopped, EVAL and GC be started
1415up again and evaluation can continue. Note that a non-moving collector
1416is not sufficient: the GC allocates new words only at the very end of
1417the memory, so it cannot benefit from free ``gaps'' in the middle of
1418the memory.
1419
1420\begin{thebibliography}{99}
1421\bibitem{lambda} Design of LISP-Based Processors (1979), available at
1422 \url{http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-514.pdf}
1423\bibitem{osdvu} \url{https://opencores.org/project/osdvu}
1424\bibitem{gtkwave} \url{http://gtkwave.sourceforge.net/}
1425\bibitem{icestorm} \url{http://www.clifford.at/icestorm/}
1426\end{thebibliography}
1427\end{document}
This page took 0.078441 seconds and 4 git commands to generate.