Add diagrams and pictures
[clump.git] / report.tex
CommitLineData
e7b86bf0
MG
1\documentclass[a4paper,12pt]{report}
2\usepackage{color}
3\usepackage[cm]{fullpage}
4\usepackage{graphicx}
5\usepackage{hyperref}
6\usepackage{listings}
7\usepackage{pdfpages}
8
9\title{YULE: Yet Unnamed Lisp Evaluator}
10\author{Marius Gavrilescu}
11\date{May 2018}
12
13\begin{document}
14% title page does not need anchors
15\hypersetup{pageanchor=false}
16\maketitle
17\hypersetup{pageanchor=true}
18\lstset{language=Perl, frame=leftline, framexleftmargin=5pt,
19 showstringspaces=false, showspaces=false, breaklines=true,
20 postbreak=\mbox{\textcolor{red}{$\hookrightarrow$}\space}}
21
22\chapter*{Abstract}
23Lisp machines are computers designed to run Lisp programs efficiently.
24They were first developed in the 1970s, and they have special hardware
25support for Lisp. This usually comprises a tagged architecture (each
26pointer has a ``type'' field), garbage collection implemented in
27hardware, and an instruction set that lends itself well to evaluating
28Lisp expressions.
29
30Between 1975 and 1980, Guy L. Steele and Gerald Jay Sussman wrote a
31series of memos on lambda calculus, the Lisp language and its
32implementation known collectively as the ``Lambda Papers''. One of
33these papers, ``Design of LISP-Based Processors (1979)'' describes how
34to implement a Lisp machine processor. If this processor is connected
35to a RAM that contains a program, it executes that program and places
36the result back into the RAM.
37
38The aim of YULE is to take the design from the 1979, implement it
39today using modern technology and techniques, and bundle it up in a
40form that is both easy to use and extend. By going through this
41modernisation process, future developers can easily integrate the Lisp
42processor into their own design, modifying it as they please.
43
44The end result of YULE is a small device that can be plugged into a
45USB port, and an associated computer program. Running the program
46presents a simplified Lisp listener. The user can type a Lisp
47expression into the listener, and the listener will respond with the
48value of the expression. Under the hood, the listener evaluates Lisp
49by sending the expression over USB to the small device and getting
50back the result.
51
52YULE comes with the design files for the hardware, and source code for
53the software. All of these are released under a free software license
54and published online, which makes it very easy (from both a practical
55and a legal standpoint) for future developers to use in their own
56projects and modify to fit their purposes better.
57\tableofcontents
58
59\chapter{Introduction}
60\section{Motivation}
61Lisp machine processors are quite unlike modern processors. Modern
62instruction sets are designed for imperative programming. Each
63instruction makes the processor execute some operation, such as
64loading a register or storing a value in memory. In contrast, a Lisp
65machine processor reads an expression from memory, determines its
66type, and recursively evaluates it. This operation lends itself to
2dbe6ed8 67declarative programming, where the programmer declares an arbitrarily
e7b86bf0
MG
68complicated expression and asks the computer to evaluate it.
69
70Another common feature of Lisp machine processors is the use of tagged
71pointers. Each memory location is a tagged union of two elements: the
72value itself, and a tag representing the type of that data. Tagged
73architectures are not common today, although some programs still
74use tagged pointers by exploiting pointer alignment (if every pointer
75is a multiple of 8, then the bottom 3 bits can be used to store a
76tag). These tags can be used to store an object's reference count (as
77in Objective-C on certain versions of iOS), or information about a
78data structure (as in certain implementations of kd-trees).
79
80The ``Design of LISP-Based Processors (1979)'' paper by Guy L. Steele
81and Gerald Jay Sussman \cite{lambda} presents the design of a
82relatively simple Lisp machine processor, which is made from two
83read-only memories, two buses, and several registers. If this
84processor is connected to a RAM containing a Lisp expression in a
85certain binary format, the processor will evaluate the expression and
86place the result back in memory.
87
88This processor features both a tagged architecture (the top 3 bits of
89each word in memory represent the value's type) and the declarative
90nature mentioned above. It is therefore interesting to study and to
91bring this processor into today's world by using technology such as
92hardware description languages (that allow us to describe a circuit
93formally) and FPGAs (that allow us to synthesise any circuit quickly
94and at low cost). By building a HDL implementation of the processor
95not only can it be used easily on its own (by synthesising it to a
96FPGA and connecting it to RAM), but a future circuit designer can take
97this processor and include it in their more complicated design.
98
99The processor presented in the original paper is a somewhat barebones
100implementation of Lisp. Arithmetic primitives are not available, nor
101are there any input or output routines. Due to time constraints the
102paper's writers have not included a equality primitive. Additionally
103the ATOM primitive has a bug that makes it consider NIL not atomic.
104All these deficiencies can be rectified given time and effort, but it
105is difficult for one to fix them given only a paper describing the
106processor. Having a HDL implementation of the processor with a proper
107test suite allows a future circuit designer to more easily make
108changes to the processor while using the test suite to prevent
109regressions and using a FPGA to quickly obtain a hardware version of
110the changed processor that can be manually tested and used in any
111desired application.
112
113\section{Challenges}
114While \cite{lambda} is part of the well-known Lambda Papers series and
115has been read by many over the years, there are no published
116implementations of the processor described in the paper. Therefore
117\cite{lambda} was the only source used in the building of this
118project, and several of the limitations of the processor were not easy
119to see before the implementation phase was well underway. Furthermore,
120the original paper has several shortcomings.
121
122For one, while the microcode (the values of the two read-only
123memories) is provided in both source and compiled form, the microcode
124compiler is not provided. This makes it difficult to alter the
125microcode to fix bugs or introduce new instructions. One needs to
126either work on the already-compiled code (hard and error-prone) or
127reimplement the microcode compiler, which takes a reasonable amount of
128effort.
129
130Moreover, the paper does not concern itself at all with how the user
131gets a program into memory or how the user gets the result back. This
132makes the processor as-described very difficult to use.
133
134To implement the processor in Verilog, no source other than the paper
135was used. There were no previous implementations of the processor that
136could be consulted to resolve unclarities in the original paper.
137
138In addition, there was no software available for interacting with the
139processor. A compiler and assembler had to be written from scratch,
140and so the processor could not be easily debugged until after the
141software was functional.
142
143\section{Contribution}
144YULE is a modern-day implementation of a Lisp machine processor using
145the ideas and techniques presented in \cite{lambda}.
146
147The main contribution of YULE is a Verilog implementation of the
148processor which allows future circuit designers to use it in their own
149design and modify it as they please.
150
151As the processor as described has no way to input a program or extract
152a result from memory, YULE includes more circuitry around the
153processor that lets it communicate with the external word. This
154consists of a serial UART and two modules, a reader and a writer. When
155the processor is idle, the reader waits for a program to be sent over
156the UART. Once a program has been received, it is placed into memory
157and the processor starts to evaluate it. Once evaluation is done, the
158writer transmits the new contents of the memory over the UART.
159
160Besides the hardware chip, YULE also includes three pieces of
161software:
162
163\begin{itemize}
164\item a compiler, which takes Lisp source code and produces an
165 intermediate representation of the program.
166\item an assembler, which takes a program in the intermediate
167 representation and produces the binary representation of the program
168 that the processor can understand.
169\item a REPL, in which users can type expressions. When an expression
170 is submitted, the REPL compiles it, assembles it, and sends it to
171 the processor. The processor evaluates the expression and sends back
172 a memory dump. The REPL then reads the memory dump, interprets it,
173 and prints the value of the expression in a human-readable format.
174\end{itemize}
175
176The REPL is therefore the entry point of YULE as a whole. An end-user
67d929ce
MG
177who does not intend to extend YULE can plug in a USB device containing
178the YULE processor, and run the REPL. Then the user can write
179expressions in the REPL and get their values back. Besides being the
180entry point, the REPL serves as a convenient way to test that the
181entire system is working. An expression typed in the REPL goes through
182the compiler and assembler, and then is sent to the processor. It goes
183through the three stages of the processor (reading the expression into
184memory, evaluating the contents of the memory, writing a memory dump).
e7b86bf0
MG
185
186The hardware and software components of YULE have been tested to
187ensure they function correctly. There is an automated test suite
188containing 48 unit tests for the compiler and assembler, and 21
189functional tests that engage all parts of the system (both the
190software and the hardware). The unit tests ensure that the various
191small parts of the compiler and assembler work properly, while the
192functional tests consist of pairs of short Lisp expressions and their
193correct values. The expressions are written to the REPL, and the
194REPL's output is compared to the correct values.
195
196To make it easier for future circuit designers and programmers to use
197this project, YULE has been released under an open source license.
198Prospective users can find all the source files (for both hardware and
199software) online, and are free to use, modify, and redistribute them
200under the same terms as Perl itself. The files can be accessed at this
201URL: \url{https://git.ieval.ro/?p=yule.git}.
202
203\section{Layout of this report}
204This report begins by describing the background of YULE, the
205technologies that made this project possible and a more in-depth
206description of the 1979 paper. Then the components that YULE comprises
207are examined in turn, followed by a chapter on how everything was
208tested, which includes a worked example of the system in action. The
209report ends with a list of ideas for future work based on YULE, and a
210section on reflections.
211
212\chapter{Background}
213
214\section{Field-programmable gate array}
215An FPGA is a configurable integrated circuit. A designer can define a
216circuit using a hardware description language (explained in the next
217section) and compile it to some binary format called a configuration.
218This configuration describes how the different components of the FPGA
219should be connected to each other. The configuration can then be
220uploaded to the FPGA, and the FPGA will connect its internal
221components as requested.
222
223An FPGA can therefore replace any digital custom circuit (ASIC). While
224an FPGA consumes more energy, uses more space, and runs at slower
225clock rates than an ASIC, the FPGA permits rapid prototyping: the time
226between finishing to write a design and having a working part is mere
227seconds (compilation time + uploading the configuration to the FPGA),
228and if a bug is discovered the buggy part need not be thrown away. The
229design can be modified and the fixed configuration can be uploaded to
230the FPGA.
231
232The hardware aspect of YULE is designed to run on the Lattice
233iCE40HX-1k FPGA, but with small modifications it can run on other
234FPGAs as well.
235
236\subsection*{iCEstick}
237\begin{figure}
238 \centering
239 \includegraphics[height=4cm]{icestick.png}
240 \caption{Lattice iCEstick Evaluation Kit}
241\end{figure}
242
243The Lattice iCEstick Evaluation Kit is a development board for the
244iCE40HX-1k FPGA. It contains the FPGA, a USB connector, a FTDI chip
245and 5 LEDs. The form factor is very convenient. It is similar to a USB
246memory stick (albeit bigger) that can be plugged into a computer,
247avoiding the need for jumper wires and GPIO that traditional
248development boards require.
249
250The FTDI chip implements the USB protocol, allowing the FPGA to be
251programmed by the computer and letting the FPGA talk to the computer
252using a very simple serial UART interface on the FPGA side, and a
253standard USB serial port on the computer side. This makes
254communication very easy. The FPGA can use any off-the-shelf UART
255implementation (this project uses \texttt{osdvu} from
256Opencores \cite{osdvu}), and the computer can use standard serial
257console tools such as \texttt{minicom}.
258
259\section{Verilog}
260Verilog is a hardware description language. It is used to model
261digital circuits at register-transfer level of abstraction. In Verilog
262one defines several modules, with one of them (the ``toplevel''
263module) representing the entire design. Each module has several inputs
264and outputs, can contain internal wires, can contain instances of
265other modules, and can contain combinational and sequential logic
266blocks that drive internal wires and outputs.
267
268Verilog has many more complicated features that are not used in YULE.
269These include tristate buffers, delays, functions, tasks, loops, and
270strength declarations.
271
272A subset of Verilog can be \textit{synthesised}; that is converted to
273a netlist (a list of transistors or FPGA components used and
274connections between them) that can then be used to build an ASIC
275(custom integrated circuit), or can be further converted to the
276internal representation of the FPGA and uploaded to the FPGA. The FPGA
277workflow and the programs that are part of it are described in the
278next section.
279
280A verilog project can also be run through a simulator, which simulates
281the design clock edge by clock edge and records the values on each
282wire in the design. This record can later be read in and visualised
283with a wave viewer, which allows a circuit designer to better
284understand why the circuit is not working correctly. One such program,
285GTKWave \cite{gtkwave}, was used to debug YULE.
286
287\section{icestorm}
288Project IceStorm \cite{icestorm} is a set of open source reverse
289engineered tools for programming Lattice iCE40 FPGAs. There exists a
290fully open source Verilog-to-Bitstream flow which comprises:
291
292\begin{enumerate}
293\item Yosys, a Verilog synthetizer.
294\item Arachne-pnr, a tool that implements placing and routing for
295 iCE40 FPGAs.
296\item icepack, which converts Arachne-pnr's output to iCE40 bitstream.
297\item iceprog, which programs the iCEstick with the output of icepack.
298\end{enumerate}
299
300\section{Design of LISP-Based Processors (1979)}
301Design of LISP-Based Processors (1979) \cite{lambda} is one of the
302``Lambda papers''; a series of memos by Sussman and Steele about the
303Lisp language and its implementation. The paper describes SIMPLE, a
304microprocessor designed to run Lisp.
305
306This processor is made of two read-only memories called EVAL and GC.
307Each read-only memory is paired with state registers to form a state
308machine. The contents of the state registers are fed into the
309read-only memory, and the output of the read-only memory determines
310the next state and other signals.
311
312Each state machine also has an internal bus (called the E and G buses
313respectively) and several registers connected to that bus. Each of
314these registers stores a tagged pointer, which is an 11-bit value (the
315top 3 bits represent the type, and the bottom 8 bytes represent the
316value).
317
318The signals coming out of each state machine can cause a register to
319be read onto the machine's internal bus, a register to be loaded from
320the internal bus, or write a constant value onto the internal bus.
321
322The GC state machine represents the processor's memory system. It is
323connected to an external RAM, which it controls through several
324signals. It knows how to execute certain ``GC operations'', which
325represent common Lisp memory management actions. Examples of these are
326\texttt{CAR} and \texttt{CDR} (``given a pointer to a cons cell,
327return either the CAR or the CDR of that cons cell'') or \texttt{CONS}
328(``given two pointers, allocate a new cons cell, copy them into it,
329and return a pointer to the new cons cell''). The existence of the GC
330isolates the rest of the processor from the particularities of the
331memory used.
332
333The EVAL state machine evaluates Lisp expressions. Signals coming out
334of it control the GC by selecting what operation the GC should perform
335and connecting the E and G buses to supply arguments to the GC and
336read results back. The EVAL reads expressions from memory via the GC
337and evaluates them, putting results back into memory.
338
339The overall operation of the processor is to read an expression from
340the magic location ``5'' in memory, evaluate it, and place the result
341into the magic location ``6'' in memory. Other memory locations are
342used to store subexpressions of the input expression, and to store any
343intermediate results. Despite its name, GC does not perform any
344garbage collection, so at the end of the processor's operation the
345memory will still contain all intermediate results produced during
346evaluation.
347
348The processor is intended as a proof of concept and not a
349production-ready hardware implementation of Lisp. Therefore it has
350several important limitations, for example there is no primitive
351operation that implements equality of objects (making it impossible to
352distinguish between symbols), there are no primitive arithmetic
353operations (so one needs to use Church numerals if numbers are
354needed), and the processor includes a known bug (``the ATOM bug'').
355The bug is that the ATOM primitive considers NIL to be non-atomic,
356which is incorrect.
357
358The Lisp expressions mentioned here are in textual form (Lisp source
359code), but instead in a special tagged pointer format. Each memory
360value is a union of two values: a 8-bit pointer and a 3-bit type. We
361can use pairs of consecutive memory locations to represent cons cells
362and build linked lists out of them. If a list is at location X, that
363means that (X, X+1) is the first cons cell of the list, with X+1 being
364the CAR (the value of the first element) and X being the CDR (a
365pointer to the next cons cell). A discussion of the different ways to
366represent Lisp expressions (source code, this tagged pointer format,
367and an intermediate representation) can be found in \ref{sec:formats}.
368
369The next page shows a diagram of how the processor in the paper is
370structured. The two read-only memories can be seen, with their
371registers, state registers, and the interconnection between the E and
372G buses.
373
374\begin{figure}
375\centering\includegraphics[width=18.5cm]{AIM-514-design.pdf}
376\caption{Design diagram of processor from \cite{lambda}}
377\end{figure}
378
379\chapter{A Lisp Machine processor on a USB drive}
380The project comprises a hardware part written in Verilog and a
381software part written in Perl.
382
383The hardware part is a Lisp machine processor with associated
384circuitry that can read a program from the iCEstick UART and that can
385write the result of running a program to the UART. The normal
386operation of this part is to:
387
388\begin{enumerate}
389\item Wait for a program to be supplied via UART
390\item Run the program on the processor
391\item Write a memory dump to the UART
392\item Go back to step 1
393\end{enumerate}
394
395The software part includes:
396
397\begin{itemize}
398\item A compiler, which takes Lisp source code and puts it in the
399 linked list format understood by the SIMPLE processor
400\item An assembler, which takes the result of the compiler and
401 converts it to the SIMPLE in-memory binary format.
402\item A REPL, which repeatedly reads a program from the user, compiles
403 it, assembles it, sends it to YULE, reads the memory dump from YULE,
404 and prints the result for the user to see.
405\end{itemize}
406
e7b86bf0
MG
407\section{Hardware}
408The hardware part is built from 8 modules:
409
410\begin{enumerate}
411\item The \textbf{READER}, which implements step 1 of the normal operation described above.
412\item The \textbf{WRITER}, which implements step 3 of the normal operation described above.
413\item The \textbf{GCRAM}, a single-port RAM.
414\item The \textbf{GC}, corresponding to the GC PLA in the original implementation.
415\item The \textbf{EVAL}, corresponding to the EVAL PLA in the original implementation.
416\item The \textbf{CTRL}, which advances the operation from one step to the next one.
417\item The \textbf{UART}, which implements transmission and reception of bytes via UART.
418\item The \textbf{CPU}, which instantiates every module above and wires them together.
419\end{enumerate}
420
421\subsection{Clocks}
422To avoid timing problems, a single 48 MHz clock is used. The iCEstick
423has a 12 MHz oscillator, and the phase-locked loop circuitry on the
424FPGA multiplies this signal by 4.
425
426Every flip-flop updates on the positive edge of the clock, except for
427the RAM which updates on the negative edge. This way the rest of the
428circuit perceives the RAM to be combinational (it ``updates
429immediately''): if on a positive edge the RAM's address input changes,
430the RAM will update its output on the following negative edge and the
431changed output will be visible on the next positive edge. If the RAM
432updated on a positive edge, then if the input changed on positive edge
433$i$ the changed output will only be visible on positive edge $i+2$.
434
435The design is intended to be run on a FPGA, which has a fixed clock
436tree. This typically consists of a small number of special nets called
437global buffers that can distribute signals with minimal skew to every
438logic cell. The FPGA on the iCEstick has 8 such global buffers.
439
440
441There are 4 components in the design (GC, EVAL, READER, WRITER) that
442need to be enabled/disabled at different times. One way to achieve
443this is to AND the master 48 MHz clock with several signals, thus
444creating a separate clock for each component. This is known as
445\textit{clock gating}. However, on a FPGA clock gating involves
446routing the clock from a global buffer to a logic cell and then back
447to another global buffer. This can introduce glitches in the clock and
448would delay each of the clocks by some amount (dependent on the
449signals it is ANDed with). It is therefore possible that the 4 output
450clocks would not be aligned, causing problems when data is
451communicated by one component to another. As such, clock gating is not
452recommended on FPGAs.
453
454\begin{figure}
455 \centering\includegraphics[height=3cm]{gating.eps}
456 \caption{Gated clock on the left, clock enable on the right}
457\end{figure}
458
459To avoid this problem, the design used here does not use clock gating
460and instead uses 4 clock enable signals supplied by the controller to
461each of the GC, EVAL, READER and WRITER. At any point the clock is
462enabled for exactly one of the READER, GC, and WRITER. The EVAL clock
463is enabled if and only if the GC clock is enabled and the GC's
464\texttt{step\_eval} output is high.
465
e7b86bf0
MG
466\subsection{GC}
467\begin{figure}
468 \centering\includegraphics[height=4.88cm]{gc.eps}
469 \caption{GC module}
470\end{figure}
471
472The GC module is connected to the GCRAM (via the \texttt{ram\_do,
473 ram\_we, ram\_addr, ram\_di} signals), to the EVAL (via the
474\texttt{Ein, Eout, gcop, step\_eval, conn\_et, conn\_ea} signals) and
475to the controller. It also provides the value of its P register (the
476free storage pointer) to the WRITER, so the writer can know when to
477stop dumping the memory.
478
479It is implemented as a combinational ROM, some combinational logic to
480drive the internal G bus, sequential logic to advance to the next
481state respecting dispatch rules, and sequential logic to update
482registers. The clock enable signal acts as an active low synchronous
483reset: when the clock is disabled, the state register is set to the
484initial state.
485
486This module represents one half of the original design presented in
487\cite{lambda}, and the ROM's contents are those from the paper with a
2dbe6ed8 488bug (the original design sets the ADR line low in two states when it
e7b86bf0
MG
489should be high).
490
491\subsection{EVAL}
492\begin{figure}
493 \centering\includegraphics[height=3.29cm]{eval.eps}
494 \caption{EVAL module}
495\end{figure}
496
497The EVAL module is connected to the GC (via the \texttt{conn\_et,
498 conn\_ea, Ein, Eout, gcop} signals) and to the controller.
499
500The implementation of the EVAL is very similar to that of the GC.
501There is a combinational ROM, combinational logic to drive the E bus,
502sequential logic to advance to the next state respecting dispatch
503rules, and sequential logic to update registers. A synchronous reset
504is available. When the reset line is held high for one clock cycle,
505the state register is set to the initial state. It is necessary to
506separate the reset from the clock enable because the GC often pauses
507the EVAL to do work and the EVAL should not be reset every time this
508happens.
509
510This module represents the other half of the original design presented
511in \cite{lambda}, and the ROM's contents are the same as those in the
512paper except that the ROM is widened by one bit to indicate whether
513the EVAL is finished evaluating the expression.
514
e7b86bf0
MG
515\subsection{Writer}
516\begin{figure}
517\centering\includegraphics[height=2.26cm]{writer.eps}
518\caption{WRITER module}
519\end{figure}
520
521The WRITER module is connected to the UART (via the \texttt{tx\_busy,
522 tx\_signal, tx\_byte} signals), the GC (via the \texttt{P} signal),
523the RAM (via the \texttt{ram\_addr, ram\_do} signals), and to the
524controller.
525
526Its purpose is to dump the memory to the UART starting with position 4
527(since positions 0, 1, 2, 3 are constant) and finishing with position
528$P - 1$, the last used cell in memory. It is implemented as a state
529machine with 7 states and an extra register (\texttt{current\_index})
530that stores the current index in the memory.
531
532The memory is 16-bit wide, but the UART sends and receives bytes. The
533writer dumps memory word in network byte order (big-endian), meaning
534the significant byte is sent before the least significant byte.
535
536The states are, in order:
537\begin{description}
538\item[START] which initializes \texttt{current\_index} to 4
539\item[WRITE1\_WAIT] which waits for \texttt{tx\_busy} to become low
540\item[WRITE1] in which transmission of the upper byte starts
541\item[WRITE2\_WAIT] which waits for \texttt{tx\_busy} to become low
542\item[WRITE2] in which transmission of the lower byte starts
543\item[INCREMENT] in which \texttt{current\_index} is incremented,
544 jumping back to WRITE1\_WAIT if it is still lower than
545 \texttt{freeptr}.
546\item[FINISHED] in which \texttt{finished} is driven high
547\end{description}
548
549This module is not part of the original design in \cite{lambda}, as
550that paper did not concern itself with input and output.
551
e7b86bf0
MG
552\subsection{Reader}
553\begin{figure}
554\centering\includegraphics[height=2.22cm]{reader.eps}
555\caption{READER module}
556\end{figure}
557
558The READER module is connected to the UART (via the
559\texttt{rx\_signal, rx\_byte} signals), the RAM (via the
560\texttt{ram\_we, ram\_addr, ram\_di} signals), and to the controller.
561
562Its purpose is to read a program from the UART and overwrite the GCRAM
563with this program. It is implemented as a state machine with two extra
564registers (\texttt{total\_words} and \texttt{current\_index}) to store
565the size of the incoming program and the current position within it.
566
567The reader expects to receive a 13-bit \texttt{length} value, followed
568by \texttt{length} 16-bit values that represent the desired memory
569contents. Each of these values is sent in network byte order
570(big-endian), meaning the most significant byte is sent before the
571least significant byte.
572
573The states are, in order:
574\begin{description}
575\item[IDLE] which waits for \texttt{rx\_signal}, puts the read byte at the top of \texttt{total\_words}.
576\item[LENGTH] which waits for \texttt{rx\_signal}, puts the read byte at the bottom of \texttt{total\_words}
577\item[READ1] which waits for the \texttt{rx\_signal}, puts the read byte at the top of \texttt{ram\_di} and increments \texttt{current\_index}
578\item[READ2] which waits for the \texttt{rx\_signal}, puts the read byte at the bottom of \texttt{ram\_di}
579\item[WRITE] in which \texttt{ram\_we} is high, jumping back to READ1 if \texttt{current\_index + 1 < total\_words}
580\item[FINISHED] in which \texttt{finished} is high
581\end{description}
582
583This module is not part of the original design in \cite{lambda}, as
584that paper did not concern itself with input and output.
585
e7b86bf0
MG
586\subsection{Putting the hardware together}
587The components described above are managed by the controller, which
588provides them with clock enable signals. The controller is a state
589machine with three states:
590\begin{description}
591\item[STATE\_READ] in which only the READER clock is enabled
592\item[STATE\_RUN] in which the GC and EVAL clocks are enabled
593\item[STATE\_WRITE] in which only the WRITER clock is enabled
594\end{description}
595
596In STATE\_READ and STATE\_WRITE, the EVAL's reset line is high.
597
2dbe6ed8 598Whenever the component corresponding to the current state indicates it
e7b86bf0
MG
599is finished, the controller advances to the next state.
600
601The controller also includes three multiplexers that control which
602lines are connected to the GCRAM. If the READER is active, all three
603GCRAM lines are connected to the READER. If the WRITER is active, the
604address line is connected to the writer (and the other lines are
605connected to the GC). If the GC is active, all lines are connected to
606the GC.
607
608The controller also includes logic to drive the 5 LEDs on the
609iCEstick. Three LEDs indicate the current state, while the other two
610indicate if the UART is currently transmitting or receiving.
611
612\section{Representing a Lisp program}
613\label{sec:formats}
614The processor described above evaluates a Lisp program received via
615UART. This section talks about three representations of such a
616program: the Lisp source code (a S-expression), the intermediate form
617(also a S-expression), and the in-memory form (a binary string).
618
619The first representation is obtained by parsing the user input, the
620compiler converts this into the second, and the assembler converts the
621second into the third.
622
623\subsection{Lisp source code}
624Recall that a S-expression is either an atom or a cons cell.
625
626An atom is a sequence of letters, digits, and certain symbols. There
627are two special atoms named \texttt{T} and \texttt{NIL} representing
628the boolean true and false respectively.
629
630A cons cell is a pair of two S-expressions called the ``car'' and the
631``cdr''. We write them as \texttt{(car . cdr)}, for example \texttt{(5
632 . 7)} is a cons cell with car \texttt{5} and cdr \texttt{7}.
633
634We can use cons cells to represent linked lists. A linked list is a
635sequence of cons cells where the cdr of one cell is the next cell, and
636the cdr of the last cell is \texttt{NIL}. For example the list
637\texttt{1,2,3} can be written as \texttt{(1 . (2 . (3 . NIL)))}.
638We can also write this list more concisely as \texttt{(1 2 3)}, and we
639write \texttt{()} to mean the atom \texttt{NIL}.
640
641Using these syntax elements we can encode several types of logical
642expressions:
643
644\begin{itemize}
645\item Numbers are encoded as numeric atoms, e.g. \texttt{5}
646\item Variables are encoded as non-numeric atoms, e.g. \texttt{my-var}
647\item Conditional statements are encoded as 4-element lists, e.g.
648 \texttt{(if is-odd 5 6)} which has value 5 if the variable
649 \texttt{is-odd} is not \texttt{NIL} and value 6 otherwise
650\item Symbols are encoded as non-numeric atoms preceded by a single
651 quote, e.g. \texttt{'my-symbol}. This can also be written
652 \texttt{(quote my-symbol)}
653\item Constant lists are encoded as lists preceded by a single quote,
654 e.g. \texttt{'(1 2 (3 4 5))}. This can also be written
655 \texttt{(quote (1 2 (3 4 5)))}
656\item Functions (lambda abstractions) are encoded as 4-element lists,
657 e.g. \texttt{(lambda first (a b) a)} in which the second element is
658 the function name, the third element is a list of function
659 arguments, and the fourth element is the function body.
660\item Function calls are encoded as lists, with the function as the
661 first element and the arguments as the other elements, e.g.
662 \texttt{(atom 5)} calls the function \texttt{atom} with argument
663 \texttt{5}.
664\end{itemize}
665
666\subsection{In-memory representation}
667The memory is an array of 16-bit integer values. We interpret these as
668tagged pointers: the top 3 bits indicate the ``type'' of this pointer,
669while the other 13 bits represent another memory location. A pointer
670therefore has a type and a value.
671
672We represent each Lisp expression as a tagged pointer. To evaluate it,
673the processor looks at the type of the pointer. Here is what a pointer
674of value $v$ means based on its type. We use the syntax $mem[a]$ to
675mean ``the contents of the memory location $a$'' (that is, a tagged
676pointer) and $memval[a]$ to mean ``the value at the memory location
677$a$'' (that is, the value of the tagged pointer $mem[a]$).
678
679\begin{enumerate}
680\setcounter{enumi}{-1}
681\item is a constant list, whose cdr is $mem[v]$ and car is $mem[v+1]$.
682\item is a constant symbol.
683\item is a variable reference, where $v$ indicates the position in the
684 environment of the referenced variable.
685\item is a constant closure, that is the result of evaluating a lambda
686 abstraction. Values of this type do not normally appear in the input
687 code, but instead are created during evaluation of other forms.
688\item is a procedure (lambda abstraction), whose body is $mem[v]$.
689\item is a conditional, where $mem[v+1]$ is the condition,
690 $mem[memval[v]+1]$ is the ``then'' branch, and $mem[memval[v]]$ is the
691 ``else'' branch.
692\item is a procedure call, which points to a special kind of linked
693 list. The CDR of each cell (other than the last one) points to the
694 next cell and has type 0, while the CDR of the last cell has value 0
695 and its type indicates the primitive to be called. The CARs of these
696 cells indicate the arguments for the procedure, in reverse order.
697
698 To call a user-defined procedure, we use the primitive
699 \texttt{funcall} and put the user-defined procedure as the CAR of
700 the last cell (in other words, the user-defined procedure is the
701 first argument to \texttt{funcall}).
702\item is a quoted constant, representing the object at $mem[v]$.
703\end{enumerate}
704
705The booleans \texttt{NIL} and \texttt{T} are compiled to a list
706pointing to 0, and a symbol with value 2 respectively.
707
708The memory has a peculiar fixed layout. Locations 0 and 1 are the CDR
709and CAR of NIL, locations 2 and 3 are the CDR and CAR of T, location 4
710holds the first unused location in memory (this tells the processor
711where it can start to place intermediate expressions), location 5
712holds the expression to be evaluated, and location 6 is where the
713processor will place the value of the expression.
714
715\subsection{The intermediate form}
716To facilitate conversion from the former format to the latter, an
717intermediate form is used. Here we represent tagged pointers as lists
718of one of three types:
719
720\begin{itemize}
721\item \texttt{(type value)}, where ``type'' is the pointer's type and
722 ``value'' is the pointer's (numeric) value.
723\item \texttt{(type ptr)}, where the pointer's type is ``type'' and
724 the pointer's value is $x$, where $x$ is the memory location where
725 ``ptr'' (another tagged pointer) lies.
726\item \texttt{(type ptr1 ptr2)}, where the pointer's type is ``type''
727 and the pointer's value is $x$, where $x$ is the memory location
728 where ``ptr1'' lies and $x+1$ is the memory location where ``ptr2''
729 lies.
730\end{itemize}
731
732These three representations map nicely to the tagged pointers
733described in the previous section. The first representation is used
734for constant symbols and variable references, the second for
735procedures, and the third for constant lists, conditionals, and
736procedure calls.
737
738A visual representation of the intermediate representation (from the
739AIM-514 paper) can be seen on the next page. The expression in
740question is
741
742\begin{lstlisting}
743((lambda append (x y) (if (atom x) y (cons (car x) (append (cdr x) y)))) '(a b c) '(d e f))
744\end{lstlisting}
745
746which we would represent as this sexp:
747
748\begin{lstlisting}
749(CALL (MORE (MORE (FUNCALL 0) (PROC (IF (LIST (CALL (MORE (CONS 0) (CALL (MORE (MORE (FUNCALL 0) (VAR -1)) (VAR -2)) (CALL (CDR 0) (VAR -3)))) (CALL (CAR 0) (VAR -3))) (VAR -2)) (CALL (ATOM 0) (VAR -3))))) (LIST (LIST (LIST (LIST 0) (SYMBOL 6)) (SYMBOL 7)) (SYMBOL 8))) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5)))
750\end{lstlisting}
751
752As mentioned in the Background chapter, the processor has a bug (the
753ATOM bug) that causes \texttt{(ATOM NIL)} to be true. As a result,
754\texttt{(atom x)} in this expression is always true, even when
755\texttt{x} is the empty list, and the expression loops forever (or at
756least until memory is exhausted). We can fix this by changing the
757condition from \texttt{(atom x)} to \texttt{x}, which is false when
758\texttt{x} is NIL and true otherwise, and swapping the two branches of
759the conditional. The fixed program is:
760
761\begin{lstlisting}
762((lambda append (x y) (if x (cons (car x) (append (cdr x) y)) y)) '(a b c) '(d e f))
763\end{lstlisting}
764
2dbe6ed8 765which is compiled to the slightly shorter sexp:
e7b86bf0
MG
766
767\begin{lstlisting}
768(CALL (MORE (MORE (FUNCALL 0) (PROC (IF (LIST (VAR -2) (CALL (MORE (CONS 0) (CALL (MORE (MORE (FUNCALL 0) (VAR -1)) (VAR -2)) (CALL (CDR 0) (VAR -3)))) (CALL (CAR 0) (VAR -3)))) (VAR -3)))) (LIST (LIST (LIST (LIST 0) (SYMBOL 6)) (SYMBOL 7)) (SYMBOL 8))) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5)))
769\end{lstlisting}
770
771
772\begin{figure}
773\centering\includegraphics[width=18.5cm]{AIM-514-ir.pdf}
774\caption{Example of intermediate representation}
775\end{figure}
776
777\section{Software}
778\subsection{Compiler}
779The compiler takes in Lisp code and translates it to a sexp-based
780format understood by the assembler.
781
782The compiler has several functions for compiling certain kinds of
783expressions, and a function (\texttt{process\_toplevel}) that uses the
784former to compile arbitrary expressions.
785
786The output from any of this functions is a sexp in the intermediate
787form described in the previous section.
788
789Symbols are identified by sequences of letters on the source code side
790and by numbers on the processor side. The compiler therefore keeps a
791mapping between symbol names and their numbers. Whenever a new symbol
792is encountered, a new entry is added to this map.
793
794The \texttt{process\_quoted} function handles constants. For example
795if the source code contains \texttt{'(1 2 3)} then
796\texttt{process\_quoted} is called with \texttt{(1 2 3)}. This
797function handles the symbol interning described in the previous
798paragraph, and converts its input to intermediate form.
799
800The \texttt{process\_proc} function handles lambda abstractions. For
801example if the source code contains \texttt{(lambda first (a b) a)}
802then \texttt{process\_proc} is called with \texttt{first},
803\texttt{(a b)}, and \texttt{a}. This function appends the function
804name and arguments to the environment, then processes the body (using
805\texttt{process\_toplevel}) with the new environment.
806
807The \texttt{process\_funcall} function handles function calls. For
808example if the source code contains \texttt{(reverse-list 2 1)} then
809\texttt{process\_funcall} is called with \texttt{reverse-list}, and
810\texttt{(2 1)}. The arguments are first processed (using
811\texttt{process\_toplevel}), then (if the function is not a primitive)
812the function is processed as well.
813
814Finally the \texttt{process\_toplevel} handles any expression by
815figuring out what sort of expression it is and then processing it
816appropriately. Empty lists and \texttt{NIL} become \texttt{(list 0)},
817numbers and \texttt{T} are processed with \texttt{process\_quoted}
818
819Examples of compiler operation can be found in the compiler tests.
820Here are a few of them:
821
822\begin{itemize}
823\item The Lisp expression \texttt{'()} is compiled to \texttt{(LIST 0)}
824\item The Lisp expression \texttt{'(5 foo)} is compiled to
825 \texttt{(LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4))}. Observe that
826 the CDR comes before the CAR, and symbols are interned:
827 \texttt{'foo} is compiled to \texttt{(SYMBOL 3)} and \texttt{'5} is
828 compiled to \texttt{(SYMBOL 4)}
829\item The Lisp expression \texttt{(if t '(2 3) 'x)} is compiled to
830 \texttt{(IF (LIST (SYMBOL 5) (LIST (LIST (LIST 0) (SYMBOL 3))
831 (SYMBOL 4))) (SYMBOL 2))}. Observe that \texttt{t} is always
832 compiled to \texttt{(SYMBOL 2)}, as mentioned in the section about
833 the memory format.
834\end{itemize}
835
836\subsection{Assembler}
837The assembler takes in an S-expression in the intermediate form, lays
838out the words in memory, and produces a memory dump that can be
839supplied to the processor.
840
841The core of the assembler is a function called \texttt{process} that
842takes an expression in intermediate form and an optional location in
843memory. This function lays out the expression's subexpressions in
844memory and then lays out the expression itself at the given location
845(or at the end of the memory if there is no location given).
846
847The assembler begins by initializing the first 7 elements of the
848memory with the values needed by the processor. After initialization,
849the \texttt{process} function is called on the input expression with
850no location.
851
852Here is how the \texttt{process} function lays out the expression
853$expr$ at location $l$ (where $l$ can be a given memory location or
854simply ``the first free location in memory''):
855
856\begin{itemize}
857\item If $expr$ is \texttt{(type value)}, then it is directly laid out
858 at $l$
859\item If $expr$ is \texttt{(type ptr)}, then \texttt{process} is
860 called recursively to lay \texttt{ptr} out at the first free
861 location in memory (say this is location $loc$). Then \texttt{(type
862 $loc$)} is laid out at $l$.
863\item If $expr$ is \texttt{(type ptr1 ptr2)}, then two consecutive
864 free locations in memory $loc$ and $loc+1$ are reserved,
865 \texttt{process} is recursively called twice to lay \texttt{ptr1} at
866 $loc$ and \texttt{ptr2} at $loc+1$, and finally
867 \texttt{(type $loc$)} is laid out at $l$.
868\end{itemize}
869
870Once again, examples of assembler operations can be found in the
871assembler tests. Two of them:
872
873\begin{itemize}
874\item \texttt{(quoted (symbol 5))} is assembled to \texttt{00 08 00 00
875 00 00 20 00 20 00 00 08 E0 07 00 00 20 05}. Observe that each word
876 is made of two bytes, the first word (\texttt{00 08}) indicating the
877 length (in words) of the program. The first 4 words are always
878 fixed: they represent in order the CDR of NIL, the CAR of NIL, the
879 CDR of T, and the CAR of T. The 7th word is also a constant, 0,
880 because that is where the processor puts the result. The 5th word is
881 the first free word in memory (which is equal to the length in words
882 of the program), and the 6th word (\texttt{E0 07}) is the expression
883 to be evaluated. In this case, it has type quote and points to
884 memory location 7 (the 8th word, because pointers are 0-indexed). At
885 memory location 7 we have the value \texttt{20 05} which has type
886 symbol and value 5.
887\item \texttt{(call (more (funcall 0) (proc (var -2))) (number 5))} is
888 assembled to \texttt{00 0C 00 00 00 00 20 00 20 00 00 0C C0 07 00 00
889 00 09 20 05 E0 00 80 0B 5F FE}. Here we want to evaulate the
890 expression \texttt{C0 07} (type ``function call'', address 7). This
891 points to a cons-cell, whose car (at position 8) is \texttt{20 05}
892 (that is, \texttt{(number 5)}) and whose cdr (at position 7) is
893 \texttt{00 09} (type ``more'' and address 9). The ``more'' points to
894 another cons-cell, whose cdr (at position 9) is \texttt{E0 00} (that
895 is, \texttt{(funcall 0)}), and whose car is \texttt{80 0B}: a
896 procedure whose body is at position 11, namely \texttt{5F FE} (that
897 is, \texttt{(var -2)}).
898
899 So the expression is a call to the primitive \texttt{funcall} with
900 arguments \texttt{(lambda l (x) x)} and \texttt{5}. The original
901 program was probably \texttt{((lambda l (x) x) 5)}. Note that this
902 program was compiled with an old version of the compiler that did
903 not intern symbols, which explains why \texttt{5} was compiled to
904 \texttt{(number 5)} and not \texttt{(symbol 3)}.
905\end{itemize}
906
907\subsection{REPL}
908The REPL ties everything together: it reads a program from the user,
909compiles it, assembles it, sends the assembled program to the
910processor, reads the memory dump from the processor, and finally
911prints the result to the screen.
912
913The REPL includes an S-expression pretty-printer that operates on
914memory dumps. As symbols are interned by the compiler, when the
915pretty-printer hits a symbol it only knows its index, and not the
916original name. To print the original name the pretty-printer searches
917the symbol map built by the compiler and pulls the name from there.
918
919The pretty-printer takes a chunk of memory and index, and prints the
920expression at that index. For example, if the memory chunk is
921\texttt{00 00 00 00 20 00 20 00 00 09 00 07 00 07 20 02 00 00 60 00}
922and the index is 6, then the pretty printer will try to print
923\texttt{00 07}, which is a list with car \texttt{00 00} and cdr
924\texttt{20 02}, and the pretty printer prints \texttt{(nil . t)}.
925
e7b86bf0
MG
926An example REPL session is:
927
928\begin{lstlisting}
929YULE REPL
930
931* (cons 1 2)
932(1 . 2)
933* (car '(4 5 6))
9344
935* (cdr '(t t t))
936(T T)
937* ((lambda id (x) x) 5)
9385
939* (lambda first (a b) a)
940<CLOSURE>
941* ((lambda first (a b) a) 7 8)
9427
943* (atom 6)
944T
945\end{lstlisting}
946
947\chapter{Evaluation and Testing}
948To make development easier, ensure everything works correctly, and
949prevent regressions there are unit tests for the compiler and
950assembler, and functional tests for the entire project.
951
952Besides these automated tests, manual evaluation (in the beginning,
953precompiling simple expressions and supplying them to the processor as
954the initial value of the memory; later on, manually typing expressions
955in the REPL and looking at the output). When the processor was not
956operating as expected, the GTKWave wave simulator was used to assist
957with debugging. This piece of software takes in a recorded simulation
958of the circuits and displays the values of all signals inside the
959circuit against time. This allows the designer to look at every clock
960cycle in turn and discover where the error occurs.
961
962The process of debugging the processor had the side effect of finding
963inefficiencies in the original design. While \cite{lambda} mentions
964some limitations of the processors, one important limitation that was
965only discovered during debugging is that for every operation that the
966EVAL performs, the result of that operation will be written to new
967locations in memory. This includes ``read-only'' operations such as
968CAR and CDR. If the programmer requests the CAR of a list, a copy of
969the first element will be taken and a pointer to it will be returned.
970Since YULE only has 4096 words of memory, this severely restricts the
971complexity of programs that can be evaluated on it.
972
973Another avenue to ensure the processor works as it should is formal
974verification using model checking tools. Formal verification is
975commonly employed as part of circuit design evaluation because
976mistakes are more difficult to identify, and if found after the chip
977has been built they are very costly to fix (often the chips that were
978built with the bug need to be thrown away). However, building models
979is a very involved process, and devoting effort to formal verification
980of this project would not have allowed the implementation of several
981features. Due to time constraints and knowing that no physical YULE
982chips are built (the project is so far only synthesised to a FPGA),
983testing and manual evaluation was considered sufficient for YULE.
984
985\section{Unit testing}
986The software was tested with 48 unit tests written in Perl using the
987\texttt{Test::More} framework.
988
989The assembler takes in compiled Lisp code and outputs a memory dump in
990one of two formats (binary or Verilog code). Its tests take a
991configuration for the assembler, an input, two expected outputs, and
992(for documentation purposes) the source Lisp code. An example test:
993
994\begin{lstlisting}
995$expbin =
996 '00 0C 00 00 00 00 20 00 20 00 00 0C C0 07'
997 . ' 00 00 00 09 20 05 E0 00 80 0B 5F FE';
998run_test {addr_bits => 13},
999 '(call (more (funcall 0) (proc (var -2))) (number 5))',
1000 <<'EOF', $expbin, '((LAMBDA ID (X) X) 5)';
1001mem[ 0] <= 0; // (cdr part of NIL)
1002mem[ 1] <= 0; // (car part of NIL)
1003mem[ 2] <= 16'b0010000000000000; // (cdr part of T)
1004mem[ 3] <= 16'b0010000000000000; // (car part of T)
1005mem[ 4] <= 16'd12; // (free storage pointer)
1006mem[ 5] <= 16'b1100000000000111; // CALL 7
1007mem[ 6] <= 0; // (result of computation)
1008mem[ 7] <= 16'b0000000000001001; // MORE 9
1009mem[ 8] <= 16'b0010000000000101; // NUMBER 5
1010mem[ 9] <= 16'b1110000000000000; // FUNCALL 0
1011mem[10] <= 16'b1000000000001011; // PROC 11
1012mem[11] <= 16'b0101111111111110; // VAR -2
1013EOF
1014\end{lstlisting}
1015
1016Here the compiled code \texttt{(call (more (funcall 0) (proc (var -2))) (number 5))}
1017is given to the assembler, and its outputs are compared to the binary
1018string on the first line and the block of Verilog code.
1019
1020The tests do not all contain valid assembler inputs. There are a
1021series of tests that give invalid inputs to the assembler and expect
1022the assembler to raise an exception matching a pattern. Example tests:
1023
1024\begin{lstlisting}
1025expect_error_like { run_test {}, '((type is a list) 5)'}
1026 'Type of toplevel is not atom';
1027expect_error_like { run_test {}, '(badtype 5)'}
1028 'No such type';
1029expect_error_like { run_test {}, '(5 700000)'}
1030 'Addr too large';
1031\end{lstlisting}
1032
1033On the compiler side, there are 4 tests for the quoting function,
1034but the brunt of the tests are pairs of a compiler input (i.e. Lisp
1035code) and a compiler output (i.e. a sexp that the assembler
1036understands). For each pair the compiler is run on the first element
1037and its output is compared with the second. Example such tests:
1038
1039\begin{lstlisting}
1040is_toplevel '(quote 5)', '(SYMBOL 3)';
1041is_toplevel '(reverse-list \'a \'a \'b)',
1042 '(CALL (MORE (MORE (REVERSE-LIST 0) (SYMBOL 4)) (SYMBOL 3)) (SYMBOL 3))';
1043is_toplevel '(if t \'(2 3) \'x)',
1044 '(IF (LIST (SYMBOL 5) (LIST (LIST (LIST 0)'
1045 . ' (SYMBOL 3)) (SYMBOL 4))) (SYMBOL 2))';
1046\end{lstlisting}
1047
1048As the compiler code includes a function to pretty-print
1049S-expressions, there are a few round-trip tests on this function.
1050Given a string, it is parsed using the \texttt{Data::SExpression}
1051parsing library then it is pretty-printed. The result should match the
1052original string.
1053
1054There are also tests for handling of invalid input, like in the assembler. These include:
1055
1056\begin{lstlisting}
1057expect_error_like { is_toplevel 'x' } 'Variable x not in environment';
1058expect_error_like { is_toplevel '(car)' } 'Cannot call primitive car with no arguments';
1059\end{lstlisting}
1060
1061A test coverage tool, \texttt{Devel::Cover}, was ran on the compiler
1062and assembler to find statements and branches that were not covered by
1063existing unit tests, and new tests were written to cover them. Right
1064now \texttt{Devel::Cover} reports 100\% test coverage of the compiler
1065and assembler.
1066
1067\begin{figure}
1068 \centering\includegraphics[height=8cm]{devel-cover.png}
1069 \caption{Screenshot of \texttt{Devel::Cover} results, showing 100\%
1070 test coverage}
1071\end{figure}
1072
1073\section{Functional testing}
1074Besides unit tests, the entire system (hardware and software) has
1075undergone functional testing. This means that a set of 21 Lisp
1076expressions has been prepared, alongside their correct values computed
1077using a software Lisp implementation. These values are given to the
1078YULE REPL and the test succeeds if the output equals the known correct
1079value of the expression.
1080
1081Here are some example tests:
1082\begin{lstlisting}
1083'(1 2 3)
10845
1085((lambda id (x) x) 5)
1086((lambda id (x) x) '(1 2 3))
1087(car '(2 3))
1088(car (cdr '(4 5 6)))
1089((lambda snd (x y) y) 'first 'second)
1090
1091(cons '(1 2) '(7))
1092(car (cons '(1 2) '(7)))
1093(cdr (cons '(1 2) '(7)))
1094
1095((lambda rev (xs) ((lambda reverse-acc (xs acc) (if xs (reverse-acc (cdr xs) (cons (car xs) acc)) acc)) xs '())) '(4 5 6 7))
1096\end{lstlisting}
1097
1098This test suite tries to cover most kinds of valid expressions. It
1099includes constants of all kinds, conditionals, lambda abstractions,
1100calls to all primitives, calls to user-defined functions, recursive
1101calls.
1102
1103\section{Timing and performance}
1104There are three parts to the performance of YULE:
1105
1106\begin{enumerate}
1107\item The efficiency of the software. This refers to the time taken to
1108 compile and assemble input programs, which is negligible.
1109
1110\item The I/O performance, which refers to the time needed to transfer
1111 the program from the computer to the processor's memory, plus the
1112 time needed to transfer the memory dump from the memory back to the
1113 computer.
1114
1115\item The speed of the processor itself, that is the time/number of
1116 cycles that the processor needs to evaluate a given expression.
1117\end{enumerate}
1118
1119The GC runs at 48MHz, while the EVAL only runs when the GC allows it
1120to which is at most half of the time.
1121
1122The UART runs at 4000000 baud, so a bit is sent every 12 cycles of the
112348MHz clock. A byte is made of 11 bits: a start bit, 8 data bits, and
1124one stop bit. A word is two bytes, so 22 bits. Therefore 264 cycles
1125are needed to transmit one word.
1126
1127Since the WRITER just sends a full memory dump, the I/O performance is
1128proportional not to \texttt{program\_len + result\_len} but to
1129\texttt{program\_len + result\_len + intermediate\_results\_len}. As
1130evaluation of any expression involves the creation of at least one
1131value (the result), for every expression or subexpression evaluated at
1132least 264 cycles will be needed to output that result. The EVAL takes
1133significantly less than 264 cycles to evaluate a expression (not
1134counting its subexpressions), in one test each expression took between
113530 and 90 cycles to evaluate. The processor performance is therefore
1136dominated by I/O.
1137
1138As an example, the program \texttt{((lambda id (x) x) 5)} is 12 bytes
1139long, and takes 258 cycles to evaluate. The memory dump is 54 bytes
1140long. Therefore of the roughly 17700 cycles needed from beginning to
1141end, about 98.5\% are used to do I/O.
1142
1143\section{A worked example}
1144\label{sec:example}
1145Here is an example of what happens when a expression is typed into the
1146REPL:
1147
1148\begin{enumerate}
1149\item A user types into the REPL the string
1150
1151\begin{lstlisting}
1152(car '(2 3 4))
1153\end{lstlisting}
1154
1155 This is a call to the CAR primitive, with the list \texttt{(2 3 4)}
1156 as an argument. We expect the result to be \texttt{2}, the first
1157 element of the list.
1158
1159\item This string is given to the compiler, producing the sexp
1160
1161\begin{lstlisting}
1162(CALL (CAR 0) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5)))
1163\end{lstlisting}
1164
1165 Observe that in this representation the CDR comes before the CAR. So
1166 for example ``(LIST (LIST 0) (SYMBOL 3))'' represents a list whose
1167 CAR is ``(SYMBOL 3)'' and CDR is ``(LIST 0)'' (that is, nil). This
1168 is a list with one element, ``(SYMBOL 3)''. The sexp shown here
1169 represents a call to the CAR primitive with one argument. The list
1170 has three elements, which are in order \texttt{(SYMBOL 5), (SYMBOL
1171 4), (SYMBOL 3)}.
1172
1173 In this stage the compiler maps each number to a symbol, in this
1174 case \texttt{(SYMBOL 5)} is 2, \texttt{(SYMBOL 4)} is 3, and
1175 \texttt{(SYMBOL 3)} is 2. This mapping is remembered so it can be
1176 used later by the pretty printer.
1177
1178\item This sexp is given to the assembler, producing the 16 words long
1179 binary string (here displayed in hex, with a space after each 16-bit
1180 word)
1181
1182\begin{lstlisting}
1183000F 0000 0000 2000 2000 000F C007 0000 2000 0009 000B 2005 000D 2004 0000 2003
1184\end{lstlisting}
1185
1186 The first word, \texttt{000F}, indicates the length of the program
1187 in
1188 words (15 words, this does not include this first word).\\
1189 The next four words, \texttt{0000 0000 2000 2000}, represent the CDR
1190 and CAR of NIL and T respectively. These words are always
1191 constant.\\
1192 The next word, \texttt{000F}, represents the first unused position
1193 in memory. This is always equal to the first word transmitted.\\
1194 The next word, \texttt{C007}, represents the expression to be
1195 evaluated.\\
1196 The next word, \texttt{0000}, is ignored as this is where the
1197 processor will place the result of the evaluation.\\
1198 The rest of the words are subexpressions or parts of subexpressions
1199 of the expression being evaluated.\\
1200
1201 The expression that is evaluated is \texttt{C007}, which is a CALL
1202 whose CDR is at position 7 (\texttt{2000}, meaning \texttt{(CAR 0)})
1203 and whose CAR is at
1204 position 8 (\texttt{0009}).\\
1205 \texttt{0009} is a list with CAR at 10 (\texttt{2005}, meaning
1206 \texttt{(SYMBOL 5)}) and CDR at 9 (\texttt{000B}).\\
1207 \texttt{000B} is a list with CAR at 12 (\texttt{2004}, meaning
1208 \texttt{(SYMBOL 4)}) and CDR at 11 (\texttt{000D}).\\
1209 \texttt{000D} is a list with CAR at 14 (\texttt{2003}, meaning
1210 \texttt{(SYMBOL 3)}) and CDR at 13 (\texttt{0000}, meaning NIL).
1211
1212\item This binary string is sent via UART to the FPGA's READER. The
1213 READER writes the string (except the first word) into memory,
1214 overwriting the previous contents of the memory. It then reports to
1215 the controller that it is done, and the controller advances to
1216 \texttt{STATE\_RUN}.
1217
1218\item The EVAL and GC evaluate the program in memory, and then place
1219 the result back into memory. The EVAL reports to the controller that
1220 it is done, and the controller advances to \texttt{STATE\_WRITE}.
1221
1222\item The WRITER sends this memory dump via UART back to the REPL
1223
1224\begin{lstlisting}
1225000F C007 2005 2000 0009 000B 2005 000D 2004 0000 2003 0000 4004 2000 0010 0000 0012 0000 0000 0009
1226\end{lstlisting}
1227
1228 Then the WRITER reports to the controller that it is done, and the
1229 controller advances to \texttt{STATE\_READ}.
1230
1231 Observe that this memory dump is similar to the assembler output.
1232 The first two words are the 6th and 7th words in the assembler
1233 output, the next word is the result of the evaluation, then the next
1234 8 words are words 9th through 16th in the assembler output. The
1235 following words are intermediate results computed by the processor.
1236
2dbe6ed8 1237\item The REPL's pretty printer interprets this string starting with
e7b86bf0
MG
1238 \texttt{2005}, which means \texttt{(SYMBOL 5)}. The pretty printer
1239 looks in the compiler's mapping and finds that the 5th symbol was
67d929ce 1240 2, and so the REPL prints
e7b86bf0
MG
1241
1242\begin{lstlisting}
67d929ce 12432
e7b86bf0
MG
1244*
1245\end{lstlisting}
1246
1247 The evaluation is complete, and a new prompt is printed so that the
1248 user can input another expression.
1249\end{enumerate}
1250
67d929ce
MG
1251\begin{figure}
1252 \centering\includegraphics[height=7cm]{repl.png}
1253 \caption{What the user types and sees in this example}
1254\end{figure}
1255
e7b86bf0
MG
1256\chapter{Conclusions}
1257A Lisp machine processor has been built using the original design and
1258techniques presented in \cite{lambda}. It has been extended with input
1259and output capabilities, and using a modern FPGA it has been
1260synthesised on a physical device slightly larger than a USB flash
1261drive. This device can be used as a standalone Lisp evaluator, or the
1262processor inside it can be included in a more complicated circuit.
1263
1264Software has been written to help interact with the processor. This
1265includes a compiler, an assembler, and a REPL that together make it
1266possible for an end-user unfamiliar with hardware design or the
1267processor's internals to submit Lisp expressions to the processor and
1268get back their values. The software also allows programmers to convert
1269Lisp source code to the binary format understood by the processor, so
1270that they may later send the binaries to the processor for evaluation.
1271
1272The processor and software have been tested to ensure they work
1273correctly. This has uncovered bugs and inefficiencies in the original
1274paper, some of which could not be gleaned from simply reading the
1275paper.
1276
1277The source files for the hardware and software, and also the automated
1278tests have been released under an open-source license to make it easy
1279for future circuit designers and programmers to use and extend this
1280project. They can be found at \url{https://git.ieval.ro/?p=yule.git}.
1281
1282The original paper \cite{lambda} is one of the influential Lambda
1283Papers that defined the Scheme programming language and is an
2dbe6ed8 1284important historical artifact for understanding how Lisp machine
e7b86bf0
MG
1285processors can be implemented and more generally how tagged
1286architectures work. But all details of the processor cannot be
1287understood from simply reading the paper, and most readers would not
1288follow by attempting to implement what is described there. This
1289projects brings a tangible implementation into the open, which shows
1290the limitations of the design to people interested in this design and
1291makes it possible to solve this limitations and extend the processor
1292beyond its original scope.
1293
1294\section{Reflections}
1295Now that the project is finished, it is worth revisiting ideas I had
1296in the beginning that were not ultimately useful.
1297
1298The original concept had the processor run on a typical development
1299board, with the software running on a Raspberry PI. The board and the
1300Raspberry PI would be connected through jumper wires, and the UART
1301circuitry on the Raspberry PI would be used by the software to talk to
1302the processor.
1303
1304This idea would have produced a much harder to use YULE. Not only
1305would the user have to ssh into a Raspberry PI to do anything useful,
1306but also the equipment would take more space (both boards are larger
1307than the iCEstick). Once I tried the iCEstick and found that
1308everything can be done just as well with it, it replaced the two
1309boards and made the project easier to use (albeit slightly harder to
1310debug, as the iCEstick only has 5 LEDs).
1311
1312Some code had to be rewritten, sometimes multiple times. The compiler
1313was originally written in Common Lisp, as it is the best suited
1314language for manipulation of S-expressions, especially Lisp code. This
1315meant a working project was obtained more quickly, but then the
1316compiler had to be rewritten in another language; if the user has a
1317software Lisp implementation already, why would they need YULE? I
1318chose Perl as the reimplementation language, as the assembler was
1319already written in Perl and it is widely-available, most systems
1320coming with a Perl interpreter out of the box.
1321
1322Throughout development of the hardware part, timing problems were a
1323mainstay. The first ones related to the interactions between the
1324processor and the memory (the memory was clocked, but the processor
1325expected a combinational RAM). What fixed this problem was to run the
1326memory at twice the speed of the rest of the processor. At some point,
1327a decision was taken to run the GC and EVAL on different clocks (both
1328ran at one quarter the speed of the master clock, but one of the
1329clocks was delayed by half a cycle. The positive edge on one clock
1330would happen midway between a positive edge and a negative edge on the
1331other one). This proved to be unwise, as sometimes values sent by one
1332module to the other would be cleared before the destination module
1333managed to read them.
1334
1335GTKWave proved to be very useful in debugging these timing issues, as
1336the exact point where the issue began was easy to identify and I could
1337see why the issue was happening. A workaround for this was to add a
1338latch on each side of the common E bus that remembered the value sent
1339long enough for the receiving module to see it.
1340
1341All timing problems were fixed more cleanly in the end when an attempt
1342to simplify the clocks was made. First GC and EVAL were changed to use
1343the same clock, which removed the need for the latches, and then the
1344memory was changed to use the negative edge of the clock instead of
1345the positive edge, which meant no more clock division code had to be
1346used.
1347
1348\section{Future work}
1349There are several ways to extend this project.
1350
1351One idea is to fix the noted limitations: the ATOM primitive considers
1352NIL to not be an atom, and there is no EQ primitive that compares
1353symbols/other objects. The latter problem means that all symbols are
1354indistinguishable (note: NIL can still be distinguished from T and
1355other symbols, but NIL is not a symbol). To introduce equality a new
1356signal needs to be added to the EVAL that compares (for example) the
1357value on the E bus with the value of a fixed register, possibly a
1358newly added one. Then (EQ A B) would store A in the register, load B
1359on the E bus and compare them.
1360
1361Once equality is implemented, it becomes possible to implement the
1362EVAL function that evaluates an arbitrary Lisp expression.
1363
1364Two more involved instruction set improvements are adding arithmetic
1365operations, and input/output. Without arithmetic primitives one has to
1366rely on Church numerals, which encodes numbers as lambda abstractions
1367and is very inefficient. Input/output primitives would allow programs
1368to receive data at runtime over the UART and send data back before
1369evaluation is finished. This enables long-running useful programs, for
1370example with input/output and equality a REPL can be written (which
1371relies on the EVAl function mentioned above).
1372
1373On the software side, the assembler can support optimizations and then
1374the REPL can be extended with a core image. The main optimization
1375needed in the assembler is common subexpression elimination. The
1376processor never overwrites a value in memory with another value (other
1377than overwriting \texttt{mem[6]} with the result of the evaluation),
1378so it is safe to share any identical subexpressions. For example, if
1379the user inputs the program \texttt{(cons (car '(1 2 3 4)) (cdr '(0 2
1380 3 4)))}, then when the assembler should share the tails of the two
1381lists, meaning the second list should compile to \texttt{(LIST (LIST
1382 X) (SYMBOL Y))} where X is a pointer to where the second element of
1383the first list is stored. This makes long programs significantly
1384shorter. Note that no compiler changes are required to implement this,
1385as the assembler can just remember all expressions and subexpressions
1386it has assembler so far including their locations, and whenever a new
1387expression needs to be assembler it can be replaced with a pointer to
1388where the same expression has been placed before.
1389
1390Once this optimization is implemented, the REPL can gain core image
1391support. Suppose the user enters the expression \texttt{expr}. Instead
1392of sending \texttt{assemble(compile(expr))} to the processor, the REPL
1393can remember the last expression sent (or the last memory dump
1394received), then append \texttt{assemble(compile(expr))} to that
1395previous expression. This way \texttt{expr} can refer to previously
1396defined expressions, as long as the REPL or compiler has a way to name
1397previously supplied instructions, for example via a \texttt{define}
1398special form, via the standard REPL variables (\texttt{+, ++, +++}),
1399or by sequentially numbering all expressions submitted by the user.
1400
1401Returning to hardware, a potential future project is to add actual
1402hardware garbage collection to the GC component. At the moment the
1403design does no garbage collection, the processor only has 4096 words
1404of RAM available, and evaluation is quite memory-inefficient. Adding
1405garbage collection would make more complicated programs possible
1406despite the limited RAM. One way to implement this is to add a small
1407microprocessor to the design which is preprogrammed with a garbage
1408collection algorithm (simple choices are a stop-and-copy collector or
1409Cheney's algorithm). When the free space pointer gets too high, the
1410EVAL and GC are stopped, and the new microprocessor is started up so
1411it can perform garbage collection, compact the memory, and update the
1412free space pointer. The EVAL/GC registers would have to be updated as
1413well. Then the microprocessor can be stopped, EVAL and GC be started
1414up again and evaluation can continue. Note that a non-moving collector
1415is not sufficient: the GC allocates new words only at the very end of
1416the memory, so it cannot benefit from free ``gaps'' in the middle of
1417the memory.
1418
1419\begin{thebibliography}{99}
1420\bibitem{lambda} Design of LISP-Based Processors (1979), available at
1421 \url{http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-514.pdf}
1422\bibitem{osdvu} \url{https://opencores.org/project/osdvu}
1423\bibitem{gtkwave} \url{http://gtkwave.sourceforge.net/}
1424\bibitem{icestorm} \url{http://www.clifford.at/icestorm/}
1425\end{thebibliography}
1426\end{document}
This page took 0.080877 seconds and 4 git commands to generate.