Add diagrams and pictures
[clump.git] / report.tex
1 \documentclass[a4paper,12pt]{report}
2 \usepackage{color}
3 \usepackage[cm]{fullpage}
4 \usepackage{graphicx}
5 \usepackage{hyperref}
6 \usepackage{listings}
7 \usepackage{pdfpages}
8
9 \title{YULE: Yet Unnamed Lisp Evaluator}
10 \author{Marius Gavrilescu}
11 \date{May 2018}
12
13 \begin{document}
14 % title page does not need anchors
15 \hypersetup{pageanchor=false}
16 \maketitle
17 \hypersetup{pageanchor=true}
18 \lstset{language=Perl, frame=leftline, framexleftmargin=5pt,
19 showstringspaces=false, showspaces=false, breaklines=true,
20 postbreak=\mbox{\textcolor{red}{$\hookrightarrow$}\space}}
21
22 \chapter*{Abstract}
23 Lisp machines are computers designed to run Lisp programs efficiently.
24 They were first developed in the 1970s, and they have special hardware
25 support for Lisp. This usually comprises a tagged architecture (each
26 pointer has a ``type'' field), garbage collection implemented in
27 hardware, and an instruction set that lends itself well to evaluating
28 Lisp expressions.
29
30 Between 1975 and 1980, Guy L. Steele and Gerald Jay Sussman wrote a
31 series of memos on lambda calculus, the Lisp language and its
32 implementation known collectively as the ``Lambda Papers''. One of
33 these papers, ``Design of LISP-Based Processors (1979)'' describes how
34 to implement a Lisp machine processor. If this processor is connected
35 to a RAM that contains a program, it executes that program and places
36 the result back into the RAM.
37
38 The aim of YULE is to take the design from the 1979, implement it
39 today using modern technology and techniques, and bundle it up in a
40 form that is both easy to use and extend. By going through this
41 modernisation process, future developers can easily integrate the Lisp
42 processor into their own design, modifying it as they please.
43
44 The end result of YULE is a small device that can be plugged into a
45 USB port, and an associated computer program. Running the program
46 presents a simplified Lisp listener. The user can type a Lisp
47 expression into the listener, and the listener will respond with the
48 value of the expression. Under the hood, the listener evaluates Lisp
49 by sending the expression over USB to the small device and getting
50 back the result.
51
52 YULE comes with the design files for the hardware, and source code for
53 the software. All of these are released under a free software license
54 and published online, which makes it very easy (from both a practical
55 and a legal standpoint) for future developers to use in their own
56 projects and modify to fit their purposes better.
57 \tableofcontents
58
59 \chapter{Introduction}
60 \section{Motivation}
61 Lisp machine processors are quite unlike modern processors. Modern
62 instruction sets are designed for imperative programming. Each
63 instruction makes the processor execute some operation, such as
64 loading a register or storing a value in memory. In contrast, a Lisp
65 machine processor reads an expression from memory, determines its
66 type, and recursively evaluates it. This operation lends itself to
67 declarative programming, where the programmer declares an arbitrarily
68 complicated expression and asks the computer to evaluate it.
69
70 Another common feature of Lisp machine processors is the use of tagged
71 pointers. Each memory location is a tagged union of two elements: the
72 value itself, and a tag representing the type of that data. Tagged
73 architectures are not common today, although some programs still
74 use tagged pointers by exploiting pointer alignment (if every pointer
75 is a multiple of 8, then the bottom 3 bits can be used to store a
76 tag). These tags can be used to store an object's reference count (as
77 in Objective-C on certain versions of iOS), or information about a
78 data structure (as in certain implementations of kd-trees).
79
80 The ``Design of LISP-Based Processors (1979)'' paper by Guy L. Steele
81 and Gerald Jay Sussman \cite{lambda} presents the design of a
82 relatively simple Lisp machine processor, which is made from two
83 read-only memories, two buses, and several registers. If this
84 processor is connected to a RAM containing a Lisp expression in a
85 certain binary format, the processor will evaluate the expression and
86 place the result back in memory.
87
88 This processor features both a tagged architecture (the top 3 bits of
89 each word in memory represent the value's type) and the declarative
90 nature mentioned above. It is therefore interesting to study and to
91 bring this processor into today's world by using technology such as
92 hardware description languages (that allow us to describe a circuit
93 formally) and FPGAs (that allow us to synthesise any circuit quickly
94 and at low cost). By building a HDL implementation of the processor
95 not only can it be used easily on its own (by synthesising it to a
96 FPGA and connecting it to RAM), but a future circuit designer can take
97 this processor and include it in their more complicated design.
98
99 The processor presented in the original paper is a somewhat barebones
100 implementation of Lisp. Arithmetic primitives are not available, nor
101 are there any input or output routines. Due to time constraints the
102 paper's writers have not included a equality primitive. Additionally
103 the ATOM primitive has a bug that makes it consider NIL not atomic.
104 All these deficiencies can be rectified given time and effort, but it
105 is difficult for one to fix them given only a paper describing the
106 processor. Having a HDL implementation of the processor with a proper
107 test suite allows a future circuit designer to more easily make
108 changes to the processor while using the test suite to prevent
109 regressions and using a FPGA to quickly obtain a hardware version of
110 the changed processor that can be manually tested and used in any
111 desired application.
112
113 \section{Challenges}
114 While \cite{lambda} is part of the well-known Lambda Papers series and
115 has been read by many over the years, there are no published
116 implementations of the processor described in the paper. Therefore
117 \cite{lambda} was the only source used in the building of this
118 project, and several of the limitations of the processor were not easy
119 to see before the implementation phase was well underway. Furthermore,
120 the original paper has several shortcomings.
121
122 For one, while the microcode (the values of the two read-only
123 memories) is provided in both source and compiled form, the microcode
124 compiler is not provided. This makes it difficult to alter the
125 microcode to fix bugs or introduce new instructions. One needs to
126 either work on the already-compiled code (hard and error-prone) or
127 reimplement the microcode compiler, which takes a reasonable amount of
128 effort.
129
130 Moreover, the paper does not concern itself at all with how the user
131 gets a program into memory or how the user gets the result back. This
132 makes the processor as-described very difficult to use.
133
134 To implement the processor in Verilog, no source other than the paper
135 was used. There were no previous implementations of the processor that
136 could be consulted to resolve unclarities in the original paper.
137
138 In addition, there was no software available for interacting with the
139 processor. A compiler and assembler had to be written from scratch,
140 and so the processor could not be easily debugged until after the
141 software was functional.
142
143 \section{Contribution}
144 YULE is a modern-day implementation of a Lisp machine processor using
145 the ideas and techniques presented in \cite{lambda}.
146
147 The main contribution of YULE is a Verilog implementation of the
148 processor which allows future circuit designers to use it in their own
149 design and modify it as they please.
150
151 As the processor as described has no way to input a program or extract
152 a result from memory, YULE includes more circuitry around the
153 processor that lets it communicate with the external word. This
154 consists of a serial UART and two modules, a reader and a writer. When
155 the processor is idle, the reader waits for a program to be sent over
156 the UART. Once a program has been received, it is placed into memory
157 and the processor starts to evaluate it. Once evaluation is done, the
158 writer transmits the new contents of the memory over the UART.
159
160 Besides the hardware chip, YULE also includes three pieces of
161 software:
162
163 \begin{itemize}
164 \item a compiler, which takes Lisp source code and produces an
165 intermediate representation of the program.
166 \item an assembler, which takes a program in the intermediate
167 representation and produces the binary representation of the program
168 that the processor can understand.
169 \item a REPL, in which users can type expressions. When an expression
170 is submitted, the REPL compiles it, assembles it, and sends it to
171 the processor. The processor evaluates the expression and sends back
172 a memory dump. The REPL then reads the memory dump, interprets it,
173 and prints the value of the expression in a human-readable format.
174 \end{itemize}
175
176 The REPL is therefore the entry point of YULE as a whole. An end-user
177 who does not intend to extend YULE can plug in a USB device containing
178 the YULE processor, and run the REPL. Then the user can write
179 expressions in the REPL and get their values back. Besides being the
180 entry point, the REPL serves as a convenient way to test that the
181 entire system is working. An expression typed in the REPL goes through
182 the compiler and assembler, and then is sent to the processor. It goes
183 through the three stages of the processor (reading the expression into
184 memory, evaluating the contents of the memory, writing a memory dump).
185
186 The hardware and software components of YULE have been tested to
187 ensure they function correctly. There is an automated test suite
188 containing 48 unit tests for the compiler and assembler, and 21
189 functional tests that engage all parts of the system (both the
190 software and the hardware). The unit tests ensure that the various
191 small parts of the compiler and assembler work properly, while the
192 functional tests consist of pairs of short Lisp expressions and their
193 correct values. The expressions are written to the REPL, and the
194 REPL's output is compared to the correct values.
195
196 To make it easier for future circuit designers and programmers to use
197 this project, YULE has been released under an open source license.
198 Prospective users can find all the source files (for both hardware and
199 software) online, and are free to use, modify, and redistribute them
200 under the same terms as Perl itself. The files can be accessed at this
201 URL: \url{https://git.ieval.ro/?p=yule.git}.
202
203 \section{Layout of this report}
204 This report begins by describing the background of YULE, the
205 technologies that made this project possible and a more in-depth
206 description of the 1979 paper. Then the components that YULE comprises
207 are examined in turn, followed by a chapter on how everything was
208 tested, which includes a worked example of the system in action. The
209 report ends with a list of ideas for future work based on YULE, and a
210 section on reflections.
211
212 \chapter{Background}
213
214 \section{Field-programmable gate array}
215 An FPGA is a configurable integrated circuit. A designer can define a
216 circuit using a hardware description language (explained in the next
217 section) and compile it to some binary format called a configuration.
218 This configuration describes how the different components of the FPGA
219 should be connected to each other. The configuration can then be
220 uploaded to the FPGA, and the FPGA will connect its internal
221 components as requested.
222
223 An FPGA can therefore replace any digital custom circuit (ASIC). While
224 an FPGA consumes more energy, uses more space, and runs at slower
225 clock rates than an ASIC, the FPGA permits rapid prototyping: the time
226 between finishing to write a design and having a working part is mere
227 seconds (compilation time + uploading the configuration to the FPGA),
228 and if a bug is discovered the buggy part need not be thrown away. The
229 design can be modified and the fixed configuration can be uploaded to
230 the FPGA.
231
232 The hardware aspect of YULE is designed to run on the Lattice
233 iCE40HX-1k FPGA, but with small modifications it can run on other
234 FPGAs as well.
235
236 \subsection*{iCEstick}
237 \begin{figure}
238 \centering
239 \includegraphics[height=4cm]{icestick.png}
240 \caption{Lattice iCEstick Evaluation Kit}
241 \end{figure}
242
243 The Lattice iCEstick Evaluation Kit is a development board for the
244 iCE40HX-1k FPGA. It contains the FPGA, a USB connector, a FTDI chip
245 and 5 LEDs. The form factor is very convenient. It is similar to a USB
246 memory stick (albeit bigger) that can be plugged into a computer,
247 avoiding the need for jumper wires and GPIO that traditional
248 development boards require.
249
250 The FTDI chip implements the USB protocol, allowing the FPGA to be
251 programmed by the computer and letting the FPGA talk to the computer
252 using a very simple serial UART interface on the FPGA side, and a
253 standard USB serial port on the computer side. This makes
254 communication very easy. The FPGA can use any off-the-shelf UART
255 implementation (this project uses \texttt{osdvu} from
256 Opencores \cite{osdvu}), and the computer can use standard serial
257 console tools such as \texttt{minicom}.
258
259 \section{Verilog}
260 Verilog is a hardware description language. It is used to model
261 digital circuits at register-transfer level of abstraction. In Verilog
262 one defines several modules, with one of them (the ``toplevel''
263 module) representing the entire design. Each module has several inputs
264 and outputs, can contain internal wires, can contain instances of
265 other modules, and can contain combinational and sequential logic
266 blocks that drive internal wires and outputs.
267
268 Verilog has many more complicated features that are not used in YULE.
269 These include tristate buffers, delays, functions, tasks, loops, and
270 strength declarations.
271
272 A subset of Verilog can be \textit{synthesised}; that is converted to
273 a netlist (a list of transistors or FPGA components used and
274 connections between them) that can then be used to build an ASIC
275 (custom integrated circuit), or can be further converted to the
276 internal representation of the FPGA and uploaded to the FPGA. The FPGA
277 workflow and the programs that are part of it are described in the
278 next section.
279
280 A verilog project can also be run through a simulator, which simulates
281 the design clock edge by clock edge and records the values on each
282 wire in the design. This record can later be read in and visualised
283 with a wave viewer, which allows a circuit designer to better
284 understand why the circuit is not working correctly. One such program,
285 GTKWave \cite{gtkwave}, was used to debug YULE.
286
287 \section{icestorm}
288 Project IceStorm \cite{icestorm} is a set of open source reverse
289 engineered tools for programming Lattice iCE40 FPGAs. There exists a
290 fully open source Verilog-to-Bitstream flow which comprises:
291
292 \begin{enumerate}
293 \item Yosys, a Verilog synthetizer.
294 \item Arachne-pnr, a tool that implements placing and routing for
295 iCE40 FPGAs.
296 \item icepack, which converts Arachne-pnr's output to iCE40 bitstream.
297 \item iceprog, which programs the iCEstick with the output of icepack.
298 \end{enumerate}
299
300 \section{Design of LISP-Based Processors (1979)}
301 Design of LISP-Based Processors (1979) \cite{lambda} is one of the
302 ``Lambda papers''; a series of memos by Sussman and Steele about the
303 Lisp language and its implementation. The paper describes SIMPLE, a
304 microprocessor designed to run Lisp.
305
306 This processor is made of two read-only memories called EVAL and GC.
307 Each read-only memory is paired with state registers to form a state
308 machine. The contents of the state registers are fed into the
309 read-only memory, and the output of the read-only memory determines
310 the next state and other signals.
311
312 Each state machine also has an internal bus (called the E and G buses
313 respectively) and several registers connected to that bus. Each of
314 these registers stores a tagged pointer, which is an 11-bit value (the
315 top 3 bits represent the type, and the bottom 8 bytes represent the
316 value).
317
318 The signals coming out of each state machine can cause a register to
319 be read onto the machine's internal bus, a register to be loaded from
320 the internal bus, or write a constant value onto the internal bus.
321
322 The GC state machine represents the processor's memory system. It is
323 connected to an external RAM, which it controls through several
324 signals. It knows how to execute certain ``GC operations'', which
325 represent common Lisp memory management actions. Examples of these are
326 \texttt{CAR} and \texttt{CDR} (``given a pointer to a cons cell,
327 return either the CAR or the CDR of that cons cell'') or \texttt{CONS}
328 (``given two pointers, allocate a new cons cell, copy them into it,
329 and return a pointer to the new cons cell''). The existence of the GC
330 isolates the rest of the processor from the particularities of the
331 memory used.
332
333 The EVAL state machine evaluates Lisp expressions. Signals coming out
334 of it control the GC by selecting what operation the GC should perform
335 and connecting the E and G buses to supply arguments to the GC and
336 read results back. The EVAL reads expressions from memory via the GC
337 and evaluates them, putting results back into memory.
338
339 The overall operation of the processor is to read an expression from
340 the magic location ``5'' in memory, evaluate it, and place the result
341 into the magic location ``6'' in memory. Other memory locations are
342 used to store subexpressions of the input expression, and to store any
343 intermediate results. Despite its name, GC does not perform any
344 garbage collection, so at the end of the processor's operation the
345 memory will still contain all intermediate results produced during
346 evaluation.
347
348 The processor is intended as a proof of concept and not a
349 production-ready hardware implementation of Lisp. Therefore it has
350 several important limitations, for example there is no primitive
351 operation that implements equality of objects (making it impossible to
352 distinguish between symbols), there are no primitive arithmetic
353 operations (so one needs to use Church numerals if numbers are
354 needed), and the processor includes a known bug (``the ATOM bug'').
355 The bug is that the ATOM primitive considers NIL to be non-atomic,
356 which is incorrect.
357
358 The Lisp expressions mentioned here are in textual form (Lisp source
359 code), but instead in a special tagged pointer format. Each memory
360 value is a union of two values: a 8-bit pointer and a 3-bit type. We
361 can use pairs of consecutive memory locations to represent cons cells
362 and build linked lists out of them. If a list is at location X, that
363 means that (X, X+1) is the first cons cell of the list, with X+1 being
364 the CAR (the value of the first element) and X being the CDR (a
365 pointer to the next cons cell). A discussion of the different ways to
366 represent Lisp expressions (source code, this tagged pointer format,
367 and an intermediate representation) can be found in \ref{sec:formats}.
368
369 The next page shows a diagram of how the processor in the paper is
370 structured. The two read-only memories can be seen, with their
371 registers, state registers, and the interconnection between the E and
372 G buses.
373
374 \begin{figure}
375 \centering\includegraphics[width=18.5cm]{AIM-514-design.pdf}
376 \caption{Design diagram of processor from \cite{lambda}}
377 \end{figure}
378
379 \chapter{A Lisp Machine processor on a USB drive}
380 The project comprises a hardware part written in Verilog and a
381 software part written in Perl.
382
383 The hardware part is a Lisp machine processor with associated
384 circuitry that can read a program from the iCEstick UART and that can
385 write the result of running a program to the UART. The normal
386 operation of this part is to:
387
388 \begin{enumerate}
389 \item Wait for a program to be supplied via UART
390 \item Run the program on the processor
391 \item Write a memory dump to the UART
392 \item Go back to step 1
393 \end{enumerate}
394
395 The software part includes:
396
397 \begin{itemize}
398 \item A compiler, which takes Lisp source code and puts it in the
399 linked list format understood by the SIMPLE processor
400 \item An assembler, which takes the result of the compiler and
401 converts it to the SIMPLE in-memory binary format.
402 \item A REPL, which repeatedly reads a program from the user, compiles
403 it, assembles it, sends it to YULE, reads the memory dump from YULE,
404 and prints the result for the user to see.
405 \end{itemize}
406
407 \section{Hardware}
408 The hardware part is built from 8 modules:
409
410 \begin{enumerate}
411 \item The \textbf{READER}, which implements step 1 of the normal operation described above.
412 \item The \textbf{WRITER}, which implements step 3 of the normal operation described above.
413 \item The \textbf{GCRAM}, a single-port RAM.
414 \item The \textbf{GC}, corresponding to the GC PLA in the original implementation.
415 \item The \textbf{EVAL}, corresponding to the EVAL PLA in the original implementation.
416 \item The \textbf{CTRL}, which advances the operation from one step to the next one.
417 \item The \textbf{UART}, which implements transmission and reception of bytes via UART.
418 \item The \textbf{CPU}, which instantiates every module above and wires them together.
419 \end{enumerate}
420
421 \subsection{Clocks}
422 To avoid timing problems, a single 48 MHz clock is used. The iCEstick
423 has a 12 MHz oscillator, and the phase-locked loop circuitry on the
424 FPGA multiplies this signal by 4.
425
426 Every flip-flop updates on the positive edge of the clock, except for
427 the RAM which updates on the negative edge. This way the rest of the
428 circuit perceives the RAM to be combinational (it ``updates
429 immediately''): if on a positive edge the RAM's address input changes,
430 the RAM will update its output on the following negative edge and the
431 changed output will be visible on the next positive edge. If the RAM
432 updated on a positive edge, then if the input changed on positive edge
433 $i$ the changed output will only be visible on positive edge $i+2$.
434
435 The design is intended to be run on a FPGA, which has a fixed clock
436 tree. This typically consists of a small number of special nets called
437 global buffers that can distribute signals with minimal skew to every
438 logic cell. The FPGA on the iCEstick has 8 such global buffers.
439
440
441 There are 4 components in the design (GC, EVAL, READER, WRITER) that
442 need to be enabled/disabled at different times. One way to achieve
443 this is to AND the master 48 MHz clock with several signals, thus
444 creating a separate clock for each component. This is known as
445 \textit{clock gating}. However, on a FPGA clock gating involves
446 routing the clock from a global buffer to a logic cell and then back
447 to another global buffer. This can introduce glitches in the clock and
448 would delay each of the clocks by some amount (dependent on the
449 signals it is ANDed with). It is therefore possible that the 4 output
450 clocks would not be aligned, causing problems when data is
451 communicated by one component to another. As such, clock gating is not
452 recommended on FPGAs.
453
454 \begin{figure}
455 \centering\includegraphics[height=3cm]{gating.eps}
456 \caption{Gated clock on the left, clock enable on the right}
457 \end{figure}
458
459 To avoid this problem, the design used here does not use clock gating
460 and instead uses 4 clock enable signals supplied by the controller to
461 each of the GC, EVAL, READER and WRITER. At any point the clock is
462 enabled for exactly one of the READER, GC, and WRITER. The EVAL clock
463 is enabled if and only if the GC clock is enabled and the GC's
464 \texttt{step\_eval} output is high.
465
466 \subsection{GC}
467 \begin{figure}
468 \centering\includegraphics[height=4.88cm]{gc.eps}
469 \caption{GC module}
470 \end{figure}
471
472 The GC module is connected to the GCRAM (via the \texttt{ram\_do,
473 ram\_we, ram\_addr, ram\_di} signals), to the EVAL (via the
474 \texttt{Ein, Eout, gcop, step\_eval, conn\_et, conn\_ea} signals) and
475 to the controller. It also provides the value of its P register (the
476 free storage pointer) to the WRITER, so the writer can know when to
477 stop dumping the memory.
478
479 It is implemented as a combinational ROM, some combinational logic to
480 drive the internal G bus, sequential logic to advance to the next
481 state respecting dispatch rules, and sequential logic to update
482 registers. The clock enable signal acts as an active low synchronous
483 reset: when the clock is disabled, the state register is set to the
484 initial state.
485
486 This module represents one half of the original design presented in
487 \cite{lambda}, and the ROM's contents are those from the paper with a
488 bug (the original design sets the ADR line low in two states when it
489 should be high).
490
491 \subsection{EVAL}
492 \begin{figure}
493 \centering\includegraphics[height=3.29cm]{eval.eps}
494 \caption{EVAL module}
495 \end{figure}
496
497 The EVAL module is connected to the GC (via the \texttt{conn\_et,
498 conn\_ea, Ein, Eout, gcop} signals) and to the controller.
499
500 The implementation of the EVAL is very similar to that of the GC.
501 There is a combinational ROM, combinational logic to drive the E bus,
502 sequential logic to advance to the next state respecting dispatch
503 rules, and sequential logic to update registers. A synchronous reset
504 is available. When the reset line is held high for one clock cycle,
505 the state register is set to the initial state. It is necessary to
506 separate the reset from the clock enable because the GC often pauses
507 the EVAL to do work and the EVAL should not be reset every time this
508 happens.
509
510 This module represents the other half of the original design presented
511 in \cite{lambda}, and the ROM's contents are the same as those in the
512 paper except that the ROM is widened by one bit to indicate whether
513 the EVAL is finished evaluating the expression.
514
515 \subsection{Writer}
516 \begin{figure}
517 \centering\includegraphics[height=2.26cm]{writer.eps}
518 \caption{WRITER module}
519 \end{figure}
520
521 The WRITER module is connected to the UART (via the \texttt{tx\_busy,
522 tx\_signal, tx\_byte} signals), the GC (via the \texttt{P} signal),
523 the RAM (via the \texttt{ram\_addr, ram\_do} signals), and to the
524 controller.
525
526 Its purpose is to dump the memory to the UART starting with position 4
527 (since positions 0, 1, 2, 3 are constant) and finishing with position
528 $P - 1$, the last used cell in memory. It is implemented as a state
529 machine with 7 states and an extra register (\texttt{current\_index})
530 that stores the current index in the memory.
531
532 The memory is 16-bit wide, but the UART sends and receives bytes. The
533 writer dumps memory word in network byte order (big-endian), meaning
534 the significant byte is sent before the least significant byte.
535
536 The states are, in order:
537 \begin{description}
538 \item[START] which initializes \texttt{current\_index} to 4
539 \item[WRITE1\_WAIT] which waits for \texttt{tx\_busy} to become low
540 \item[WRITE1] in which transmission of the upper byte starts
541 \item[WRITE2\_WAIT] which waits for \texttt{tx\_busy} to become low
542 \item[WRITE2] in which transmission of the lower byte starts
543 \item[INCREMENT] in which \texttt{current\_index} is incremented,
544 jumping back to WRITE1\_WAIT if it is still lower than
545 \texttt{freeptr}.
546 \item[FINISHED] in which \texttt{finished} is driven high
547 \end{description}
548
549 This module is not part of the original design in \cite{lambda}, as
550 that paper did not concern itself with input and output.
551
552 \subsection{Reader}
553 \begin{figure}
554 \centering\includegraphics[height=2.22cm]{reader.eps}
555 \caption{READER module}
556 \end{figure}
557
558 The READER module is connected to the UART (via the
559 \texttt{rx\_signal, rx\_byte} signals), the RAM (via the
560 \texttt{ram\_we, ram\_addr, ram\_di} signals), and to the controller.
561
562 Its purpose is to read a program from the UART and overwrite the GCRAM
563 with this program. It is implemented as a state machine with two extra
564 registers (\texttt{total\_words} and \texttt{current\_index}) to store
565 the size of the incoming program and the current position within it.
566
567 The reader expects to receive a 13-bit \texttt{length} value, followed
568 by \texttt{length} 16-bit values that represent the desired memory
569 contents. Each of these values is sent in network byte order
570 (big-endian), meaning the most significant byte is sent before the
571 least significant byte.
572
573 The states are, in order:
574 \begin{description}
575 \item[IDLE] which waits for \texttt{rx\_signal}, puts the read byte at the top of \texttt{total\_words}.
576 \item[LENGTH] which waits for \texttt{rx\_signal}, puts the read byte at the bottom of \texttt{total\_words}
577 \item[READ1] which waits for the \texttt{rx\_signal}, puts the read byte at the top of \texttt{ram\_di} and increments \texttt{current\_index}
578 \item[READ2] which waits for the \texttt{rx\_signal}, puts the read byte at the bottom of \texttt{ram\_di}
579 \item[WRITE] in which \texttt{ram\_we} is high, jumping back to READ1 if \texttt{current\_index + 1 < total\_words}
580 \item[FINISHED] in which \texttt{finished} is high
581 \end{description}
582
583 This module is not part of the original design in \cite{lambda}, as
584 that paper did not concern itself with input and output.
585
586 \subsection{Putting the hardware together}
587 The components described above are managed by the controller, which
588 provides them with clock enable signals. The controller is a state
589 machine with three states:
590 \begin{description}
591 \item[STATE\_READ] in which only the READER clock is enabled
592 \item[STATE\_RUN] in which the GC and EVAL clocks are enabled
593 \item[STATE\_WRITE] in which only the WRITER clock is enabled
594 \end{description}
595
596 In STATE\_READ and STATE\_WRITE, the EVAL's reset line is high.
597
598 Whenever the component corresponding to the current state indicates it
599 is finished, the controller advances to the next state.
600
601 The controller also includes three multiplexers that control which
602 lines are connected to the GCRAM. If the READER is active, all three
603 GCRAM lines are connected to the READER. If the WRITER is active, the
604 address line is connected to the writer (and the other lines are
605 connected to the GC). If the GC is active, all lines are connected to
606 the GC.
607
608 The controller also includes logic to drive the 5 LEDs on the
609 iCEstick. Three LEDs indicate the current state, while the other two
610 indicate if the UART is currently transmitting or receiving.
611
612 \section{Representing a Lisp program}
613 \label{sec:formats}
614 The processor described above evaluates a Lisp program received via
615 UART. This section talks about three representations of such a
616 program: the Lisp source code (a S-expression), the intermediate form
617 (also a S-expression), and the in-memory form (a binary string).
618
619 The first representation is obtained by parsing the user input, the
620 compiler converts this into the second, and the assembler converts the
621 second into the third.
622
623 \subsection{Lisp source code}
624 Recall that a S-expression is either an atom or a cons cell.
625
626 An atom is a sequence of letters, digits, and certain symbols. There
627 are two special atoms named \texttt{T} and \texttt{NIL} representing
628 the boolean true and false respectively.
629
630 A cons cell is a pair of two S-expressions called the ``car'' and the
631 ``cdr''. We write them as \texttt{(car . cdr)}, for example \texttt{(5
632 . 7)} is a cons cell with car \texttt{5} and cdr \texttt{7}.
633
634 We can use cons cells to represent linked lists. A linked list is a
635 sequence of cons cells where the cdr of one cell is the next cell, and
636 the cdr of the last cell is \texttt{NIL}. For example the list
637 \texttt{1,2,3} can be written as \texttt{(1 . (2 . (3 . NIL)))}.
638 We can also write this list more concisely as \texttt{(1 2 3)}, and we
639 write \texttt{()} to mean the atom \texttt{NIL}.
640
641 Using these syntax elements we can encode several types of logical
642 expressions:
643
644 \begin{itemize}
645 \item Numbers are encoded as numeric atoms, e.g. \texttt{5}
646 \item Variables are encoded as non-numeric atoms, e.g. \texttt{my-var}
647 \item Conditional statements are encoded as 4-element lists, e.g.
648 \texttt{(if is-odd 5 6)} which has value 5 if the variable
649 \texttt{is-odd} is not \texttt{NIL} and value 6 otherwise
650 \item Symbols are encoded as non-numeric atoms preceded by a single
651 quote, e.g. \texttt{'my-symbol}. This can also be written
652 \texttt{(quote my-symbol)}
653 \item Constant lists are encoded as lists preceded by a single quote,
654 e.g. \texttt{'(1 2 (3 4 5))}. This can also be written
655 \texttt{(quote (1 2 (3 4 5)))}
656 \item Functions (lambda abstractions) are encoded as 4-element lists,
657 e.g. \texttt{(lambda first (a b) a)} in which the second element is
658 the function name, the third element is a list of function
659 arguments, and the fourth element is the function body.
660 \item Function calls are encoded as lists, with the function as the
661 first element and the arguments as the other elements, e.g.
662 \texttt{(atom 5)} calls the function \texttt{atom} with argument
663 \texttt{5}.
664 \end{itemize}
665
666 \subsection{In-memory representation}
667 The memory is an array of 16-bit integer values. We interpret these as
668 tagged pointers: the top 3 bits indicate the ``type'' of this pointer,
669 while the other 13 bits represent another memory location. A pointer
670 therefore has a type and a value.
671
672 We represent each Lisp expression as a tagged pointer. To evaluate it,
673 the processor looks at the type of the pointer. Here is what a pointer
674 of value $v$ means based on its type. We use the syntax $mem[a]$ to
675 mean ``the contents of the memory location $a$'' (that is, a tagged
676 pointer) and $memval[a]$ to mean ``the value at the memory location
677 $a$'' (that is, the value of the tagged pointer $mem[a]$).
678
679 \begin{enumerate}
680 \setcounter{enumi}{-1}
681 \item is a constant list, whose cdr is $mem[v]$ and car is $mem[v+1]$.
682 \item is a constant symbol.
683 \item is a variable reference, where $v$ indicates the position in the
684 environment of the referenced variable.
685 \item is a constant closure, that is the result of evaluating a lambda
686 abstraction. Values of this type do not normally appear in the input
687 code, but instead are created during evaluation of other forms.
688 \item is a procedure (lambda abstraction), whose body is $mem[v]$.
689 \item is a conditional, where $mem[v+1]$ is the condition,
690 $mem[memval[v]+1]$ is the ``then'' branch, and $mem[memval[v]]$ is the
691 ``else'' branch.
692 \item is a procedure call, which points to a special kind of linked
693 list. The CDR of each cell (other than the last one) points to the
694 next cell and has type 0, while the CDR of the last cell has value 0
695 and its type indicates the primitive to be called. The CARs of these
696 cells indicate the arguments for the procedure, in reverse order.
697
698 To call a user-defined procedure, we use the primitive
699 \texttt{funcall} and put the user-defined procedure as the CAR of
700 the last cell (in other words, the user-defined procedure is the
701 first argument to \texttt{funcall}).
702 \item is a quoted constant, representing the object at $mem[v]$.
703 \end{enumerate}
704
705 The booleans \texttt{NIL} and \texttt{T} are compiled to a list
706 pointing to 0, and a symbol with value 2 respectively.
707
708 The memory has a peculiar fixed layout. Locations 0 and 1 are the CDR
709 and CAR of NIL, locations 2 and 3 are the CDR and CAR of T, location 4
710 holds the first unused location in memory (this tells the processor
711 where it can start to place intermediate expressions), location 5
712 holds the expression to be evaluated, and location 6 is where the
713 processor will place the value of the expression.
714
715 \subsection{The intermediate form}
716 To facilitate conversion from the former format to the latter, an
717 intermediate form is used. Here we represent tagged pointers as lists
718 of one of three types:
719
720 \begin{itemize}
721 \item \texttt{(type value)}, where ``type'' is the pointer's type and
722 ``value'' is the pointer's (numeric) value.
723 \item \texttt{(type ptr)}, where the pointer's type is ``type'' and
724 the pointer's value is $x$, where $x$ is the memory location where
725 ``ptr'' (another tagged pointer) lies.
726 \item \texttt{(type ptr1 ptr2)}, where the pointer's type is ``type''
727 and the pointer's value is $x$, where $x$ is the memory location
728 where ``ptr1'' lies and $x+1$ is the memory location where ``ptr2''
729 lies.
730 \end{itemize}
731
732 These three representations map nicely to the tagged pointers
733 described in the previous section. The first representation is used
734 for constant symbols and variable references, the second for
735 procedures, and the third for constant lists, conditionals, and
736 procedure calls.
737
738 A visual representation of the intermediate representation (from the
739 AIM-514 paper) can be seen on the next page. The expression in
740 question is
741
742 \begin{lstlisting}
743 ((lambda append (x y) (if (atom x) y (cons (car x) (append (cdr x) y)))) '(a b c) '(d e f))
744 \end{lstlisting}
745
746 which we would represent as this sexp:
747
748 \begin{lstlisting}
749 (CALL (MORE (MORE (FUNCALL 0) (PROC (IF (LIST (CALL (MORE (CONS 0) (CALL (MORE (MORE (FUNCALL 0) (VAR -1)) (VAR -2)) (CALL (CDR 0) (VAR -3)))) (CALL (CAR 0) (VAR -3))) (VAR -2)) (CALL (ATOM 0) (VAR -3))))) (LIST (LIST (LIST (LIST 0) (SYMBOL 6)) (SYMBOL 7)) (SYMBOL 8))) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5)))
750 \end{lstlisting}
751
752 As mentioned in the Background chapter, the processor has a bug (the
753 ATOM bug) that causes \texttt{(ATOM NIL)} to be true. As a result,
754 \texttt{(atom x)} in this expression is always true, even when
755 \texttt{x} is the empty list, and the expression loops forever (or at
756 least until memory is exhausted). We can fix this by changing the
757 condition from \texttt{(atom x)} to \texttt{x}, which is false when
758 \texttt{x} is NIL and true otherwise, and swapping the two branches of
759 the conditional. The fixed program is:
760
761 \begin{lstlisting}
762 ((lambda append (x y) (if x (cons (car x) (append (cdr x) y)) y)) '(a b c) '(d e f))
763 \end{lstlisting}
764
765 which is compiled to the slightly shorter sexp:
766
767 \begin{lstlisting}
768 (CALL (MORE (MORE (FUNCALL 0) (PROC (IF (LIST (VAR -2) (CALL (MORE (CONS 0) (CALL (MORE (MORE (FUNCALL 0) (VAR -1)) (VAR -2)) (CALL (CDR 0) (VAR -3)))) (CALL (CAR 0) (VAR -3)))) (VAR -3)))) (LIST (LIST (LIST (LIST 0) (SYMBOL 6)) (SYMBOL 7)) (SYMBOL 8))) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5)))
769 \end{lstlisting}
770
771
772 \begin{figure}
773 \centering\includegraphics[width=18.5cm]{AIM-514-ir.pdf}
774 \caption{Example of intermediate representation}
775 \end{figure}
776
777 \section{Software}
778 \subsection{Compiler}
779 The compiler takes in Lisp code and translates it to a sexp-based
780 format understood by the assembler.
781
782 The compiler has several functions for compiling certain kinds of
783 expressions, and a function (\texttt{process\_toplevel}) that uses the
784 former to compile arbitrary expressions.
785
786 The output from any of this functions is a sexp in the intermediate
787 form described in the previous section.
788
789 Symbols are identified by sequences of letters on the source code side
790 and by numbers on the processor side. The compiler therefore keeps a
791 mapping between symbol names and their numbers. Whenever a new symbol
792 is encountered, a new entry is added to this map.
793
794 The \texttt{process\_quoted} function handles constants. For example
795 if the source code contains \texttt{'(1 2 3)} then
796 \texttt{process\_quoted} is called with \texttt{(1 2 3)}. This
797 function handles the symbol interning described in the previous
798 paragraph, and converts its input to intermediate form.
799
800 The \texttt{process\_proc} function handles lambda abstractions. For
801 example if the source code contains \texttt{(lambda first (a b) a)}
802 then \texttt{process\_proc} is called with \texttt{first},
803 \texttt{(a b)}, and \texttt{a}. This function appends the function
804 name and arguments to the environment, then processes the body (using
805 \texttt{process\_toplevel}) with the new environment.
806
807 The \texttt{process\_funcall} function handles function calls. For
808 example if the source code contains \texttt{(reverse-list 2 1)} then
809 \texttt{process\_funcall} is called with \texttt{reverse-list}, and
810 \texttt{(2 1)}. The arguments are first processed (using
811 \texttt{process\_toplevel}), then (if the function is not a primitive)
812 the function is processed as well.
813
814 Finally the \texttt{process\_toplevel} handles any expression by
815 figuring out what sort of expression it is and then processing it
816 appropriately. Empty lists and \texttt{NIL} become \texttt{(list 0)},
817 numbers and \texttt{T} are processed with \texttt{process\_quoted}
818
819 Examples of compiler operation can be found in the compiler tests.
820 Here are a few of them:
821
822 \begin{itemize}
823 \item The Lisp expression \texttt{'()} is compiled to \texttt{(LIST 0)}
824 \item The Lisp expression \texttt{'(5 foo)} is compiled to
825 \texttt{(LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4))}. Observe that
826 the CDR comes before the CAR, and symbols are interned:
827 \texttt{'foo} is compiled to \texttt{(SYMBOL 3)} and \texttt{'5} is
828 compiled to \texttt{(SYMBOL 4)}
829 \item The Lisp expression \texttt{(if t '(2 3) 'x)} is compiled to
830 \texttt{(IF (LIST (SYMBOL 5) (LIST (LIST (LIST 0) (SYMBOL 3))
831 (SYMBOL 4))) (SYMBOL 2))}. Observe that \texttt{t} is always
832 compiled to \texttt{(SYMBOL 2)}, as mentioned in the section about
833 the memory format.
834 \end{itemize}
835
836 \subsection{Assembler}
837 The assembler takes in an S-expression in the intermediate form, lays
838 out the words in memory, and produces a memory dump that can be
839 supplied to the processor.
840
841 The core of the assembler is a function called \texttt{process} that
842 takes an expression in intermediate form and an optional location in
843 memory. This function lays out the expression's subexpressions in
844 memory and then lays out the expression itself at the given location
845 (or at the end of the memory if there is no location given).
846
847 The assembler begins by initializing the first 7 elements of the
848 memory with the values needed by the processor. After initialization,
849 the \texttt{process} function is called on the input expression with
850 no location.
851
852 Here is how the \texttt{process} function lays out the expression
853 $expr$ at location $l$ (where $l$ can be a given memory location or
854 simply ``the first free location in memory''):
855
856 \begin{itemize}
857 \item If $expr$ is \texttt{(type value)}, then it is directly laid out
858 at $l$
859 \item If $expr$ is \texttt{(type ptr)}, then \texttt{process} is
860 called recursively to lay \texttt{ptr} out at the first free
861 location in memory (say this is location $loc$). Then \texttt{(type
862 $loc$)} is laid out at $l$.
863 \item If $expr$ is \texttt{(type ptr1 ptr2)}, then two consecutive
864 free locations in memory $loc$ and $loc+1$ are reserved,
865 \texttt{process} is recursively called twice to lay \texttt{ptr1} at
866 $loc$ and \texttt{ptr2} at $loc+1$, and finally
867 \texttt{(type $loc$)} is laid out at $l$.
868 \end{itemize}
869
870 Once again, examples of assembler operations can be found in the
871 assembler tests. Two of them:
872
873 \begin{itemize}
874 \item \texttt{(quoted (symbol 5))} is assembled to \texttt{00 08 00 00
875 00 00 20 00 20 00 00 08 E0 07 00 00 20 05}. Observe that each word
876 is made of two bytes, the first word (\texttt{00 08}) indicating the
877 length (in words) of the program. The first 4 words are always
878 fixed: they represent in order the CDR of NIL, the CAR of NIL, the
879 CDR of T, and the CAR of T. The 7th word is also a constant, 0,
880 because that is where the processor puts the result. The 5th word is
881 the first free word in memory (which is equal to the length in words
882 of the program), and the 6th word (\texttt{E0 07}) is the expression
883 to be evaluated. In this case, it has type quote and points to
884 memory location 7 (the 8th word, because pointers are 0-indexed). At
885 memory location 7 we have the value \texttt{20 05} which has type
886 symbol and value 5.
887 \item \texttt{(call (more (funcall 0) (proc (var -2))) (number 5))} is
888 assembled to \texttt{00 0C 00 00 00 00 20 00 20 00 00 0C C0 07 00 00
889 00 09 20 05 E0 00 80 0B 5F FE}. Here we want to evaulate the
890 expression \texttt{C0 07} (type ``function call'', address 7). This
891 points to a cons-cell, whose car (at position 8) is \texttt{20 05}
892 (that is, \texttt{(number 5)}) and whose cdr (at position 7) is
893 \texttt{00 09} (type ``more'' and address 9). The ``more'' points to
894 another cons-cell, whose cdr (at position 9) is \texttt{E0 00} (that
895 is, \texttt{(funcall 0)}), and whose car is \texttt{80 0B}: a
896 procedure whose body is at position 11, namely \texttt{5F FE} (that
897 is, \texttt{(var -2)}).
898
899 So the expression is a call to the primitive \texttt{funcall} with
900 arguments \texttt{(lambda l (x) x)} and \texttt{5}. The original
901 program was probably \texttt{((lambda l (x) x) 5)}. Note that this
902 program was compiled with an old version of the compiler that did
903 not intern symbols, which explains why \texttt{5} was compiled to
904 \texttt{(number 5)} and not \texttt{(symbol 3)}.
905 \end{itemize}
906
907 \subsection{REPL}
908 The REPL ties everything together: it reads a program from the user,
909 compiles it, assembles it, sends the assembled program to the
910 processor, reads the memory dump from the processor, and finally
911 prints the result to the screen.
912
913 The REPL includes an S-expression pretty-printer that operates on
914 memory dumps. As symbols are interned by the compiler, when the
915 pretty-printer hits a symbol it only knows its index, and not the
916 original name. To print the original name the pretty-printer searches
917 the symbol map built by the compiler and pulls the name from there.
918
919 The pretty-printer takes a chunk of memory and index, and prints the
920 expression at that index. For example, if the memory chunk is
921 \texttt{00 00 00 00 20 00 20 00 00 09 00 07 00 07 20 02 00 00 60 00}
922 and the index is 6, then the pretty printer will try to print
923 \texttt{00 07}, which is a list with car \texttt{00 00} and cdr
924 \texttt{20 02}, and the pretty printer prints \texttt{(nil . t)}.
925
926 An example REPL session is:
927
928 \begin{lstlisting}
929 YULE REPL
930
931 * (cons 1 2)
932 (1 . 2)
933 * (car '(4 5 6))
934 4
935 * (cdr '(t t t))
936 (T T)
937 * ((lambda id (x) x) 5)
938 5
939 * (lambda first (a b) a)
940 <CLOSURE>
941 * ((lambda first (a b) a) 7 8)
942 7
943 * (atom 6)
944 T
945 \end{lstlisting}
946
947 \chapter{Evaluation and Testing}
948 To make development easier, ensure everything works correctly, and
949 prevent regressions there are unit tests for the compiler and
950 assembler, and functional tests for the entire project.
951
952 Besides these automated tests, manual evaluation (in the beginning,
953 precompiling simple expressions and supplying them to the processor as
954 the initial value of the memory; later on, manually typing expressions
955 in the REPL and looking at the output). When the processor was not
956 operating as expected, the GTKWave wave simulator was used to assist
957 with debugging. This piece of software takes in a recorded simulation
958 of the circuits and displays the values of all signals inside the
959 circuit against time. This allows the designer to look at every clock
960 cycle in turn and discover where the error occurs.
961
962 The process of debugging the processor had the side effect of finding
963 inefficiencies in the original design. While \cite{lambda} mentions
964 some limitations of the processors, one important limitation that was
965 only discovered during debugging is that for every operation that the
966 EVAL performs, the result of that operation will be written to new
967 locations in memory. This includes ``read-only'' operations such as
968 CAR and CDR. If the programmer requests the CAR of a list, a copy of
969 the first element will be taken and a pointer to it will be returned.
970 Since YULE only has 4096 words of memory, this severely restricts the
971 complexity of programs that can be evaluated on it.
972
973 Another avenue to ensure the processor works as it should is formal
974 verification using model checking tools. Formal verification is
975 commonly employed as part of circuit design evaluation because
976 mistakes are more difficult to identify, and if found after the chip
977 has been built they are very costly to fix (often the chips that were
978 built with the bug need to be thrown away). However, building models
979 is a very involved process, and devoting effort to formal verification
980 of this project would not have allowed the implementation of several
981 features. Due to time constraints and knowing that no physical YULE
982 chips are built (the project is so far only synthesised to a FPGA),
983 testing and manual evaluation was considered sufficient for YULE.
984
985 \section{Unit testing}
986 The software was tested with 48 unit tests written in Perl using the
987 \texttt{Test::More} framework.
988
989 The assembler takes in compiled Lisp code and outputs a memory dump in
990 one of two formats (binary or Verilog code). Its tests take a
991 configuration for the assembler, an input, two expected outputs, and
992 (for documentation purposes) the source Lisp code. An example test:
993
994 \begin{lstlisting}
995 $expbin =
996 '00 0C 00 00 00 00 20 00 20 00 00 0C C0 07'
997 . ' 00 00 00 09 20 05 E0 00 80 0B 5F FE';
998 run_test {addr_bits => 13},
999 '(call (more (funcall 0) (proc (var -2))) (number 5))',
1000 <<'EOF', $expbin, '((LAMBDA ID (X) X) 5)';
1001 mem[ 0] <= 0; // (cdr part of NIL)
1002 mem[ 1] <= 0; // (car part of NIL)
1003 mem[ 2] <= 16'b0010000000000000; // (cdr part of T)
1004 mem[ 3] <= 16'b0010000000000000; // (car part of T)
1005 mem[ 4] <= 16'd12; // (free storage pointer)
1006 mem[ 5] <= 16'b1100000000000111; // CALL 7
1007 mem[ 6] <= 0; // (result of computation)
1008 mem[ 7] <= 16'b0000000000001001; // MORE 9
1009 mem[ 8] <= 16'b0010000000000101; // NUMBER 5
1010 mem[ 9] <= 16'b1110000000000000; // FUNCALL 0
1011 mem[10] <= 16'b1000000000001011; // PROC 11
1012 mem[11] <= 16'b0101111111111110; // VAR -2
1013 EOF
1014 \end{lstlisting}
1015
1016 Here the compiled code \texttt{(call (more (funcall 0) (proc (var -2))) (number 5))}
1017 is given to the assembler, and its outputs are compared to the binary
1018 string on the first line and the block of Verilog code.
1019
1020 The tests do not all contain valid assembler inputs. There are a
1021 series of tests that give invalid inputs to the assembler and expect
1022 the assembler to raise an exception matching a pattern. Example tests:
1023
1024 \begin{lstlisting}
1025 expect_error_like { run_test {}, '((type is a list) 5)'}
1026 'Type of toplevel is not atom';
1027 expect_error_like { run_test {}, '(badtype 5)'}
1028 'No such type';
1029 expect_error_like { run_test {}, '(5 700000)'}
1030 'Addr too large';
1031 \end{lstlisting}
1032
1033 On the compiler side, there are 4 tests for the quoting function,
1034 but the brunt of the tests are pairs of a compiler input (i.e. Lisp
1035 code) and a compiler output (i.e. a sexp that the assembler
1036 understands). For each pair the compiler is run on the first element
1037 and its output is compared with the second. Example such tests:
1038
1039 \begin{lstlisting}
1040 is_toplevel '(quote 5)', '(SYMBOL 3)';
1041 is_toplevel '(reverse-list \'a \'a \'b)',
1042 '(CALL (MORE (MORE (REVERSE-LIST 0) (SYMBOL 4)) (SYMBOL 3)) (SYMBOL 3))';
1043 is_toplevel '(if t \'(2 3) \'x)',
1044 '(IF (LIST (SYMBOL 5) (LIST (LIST (LIST 0)'
1045 . ' (SYMBOL 3)) (SYMBOL 4))) (SYMBOL 2))';
1046 \end{lstlisting}
1047
1048 As the compiler code includes a function to pretty-print
1049 S-expressions, there are a few round-trip tests on this function.
1050 Given a string, it is parsed using the \texttt{Data::SExpression}
1051 parsing library then it is pretty-printed. The result should match the
1052 original string.
1053
1054 There are also tests for handling of invalid input, like in the assembler. These include:
1055
1056 \begin{lstlisting}
1057 expect_error_like { is_toplevel 'x' } 'Variable x not in environment';
1058 expect_error_like { is_toplevel '(car)' } 'Cannot call primitive car with no arguments';
1059 \end{lstlisting}
1060
1061 A test coverage tool, \texttt{Devel::Cover}, was ran on the compiler
1062 and assembler to find statements and branches that were not covered by
1063 existing unit tests, and new tests were written to cover them. Right
1064 now \texttt{Devel::Cover} reports 100\% test coverage of the compiler
1065 and assembler.
1066
1067 \begin{figure}
1068 \centering\includegraphics[height=8cm]{devel-cover.png}
1069 \caption{Screenshot of \texttt{Devel::Cover} results, showing 100\%
1070 test coverage}
1071 \end{figure}
1072
1073 \section{Functional testing}
1074 Besides unit tests, the entire system (hardware and software) has
1075 undergone functional testing. This means that a set of 21 Lisp
1076 expressions has been prepared, alongside their correct values computed
1077 using a software Lisp implementation. These values are given to the
1078 YULE REPL and the test succeeds if the output equals the known correct
1079 value of the expression.
1080
1081 Here are some example tests:
1082 \begin{lstlisting}
1083 '(1 2 3)
1084 5
1085 ((lambda id (x) x) 5)
1086 ((lambda id (x) x) '(1 2 3))
1087 (car '(2 3))
1088 (car (cdr '(4 5 6)))
1089 ((lambda snd (x y) y) 'first 'second)
1090
1091 (cons '(1 2) '(7))
1092 (car (cons '(1 2) '(7)))
1093 (cdr (cons '(1 2) '(7)))
1094
1095 ((lambda rev (xs) ((lambda reverse-acc (xs acc) (if xs (reverse-acc (cdr xs) (cons (car xs) acc)) acc)) xs '())) '(4 5 6 7))
1096 \end{lstlisting}
1097
1098 This test suite tries to cover most kinds of valid expressions. It
1099 includes constants of all kinds, conditionals, lambda abstractions,
1100 calls to all primitives, calls to user-defined functions, recursive
1101 calls.
1102
1103 \section{Timing and performance}
1104 There are three parts to the performance of YULE:
1105
1106 \begin{enumerate}
1107 \item The efficiency of the software. This refers to the time taken to
1108 compile and assemble input programs, which is negligible.
1109
1110 \item The I/O performance, which refers to the time needed to transfer
1111 the program from the computer to the processor's memory, plus the
1112 time needed to transfer the memory dump from the memory back to the
1113 computer.
1114
1115 \item The speed of the processor itself, that is the time/number of
1116 cycles that the processor needs to evaluate a given expression.
1117 \end{enumerate}
1118
1119 The GC runs at 48MHz, while the EVAL only runs when the GC allows it
1120 to which is at most half of the time.
1121
1122 The UART runs at 4000000 baud, so a bit is sent every 12 cycles of the
1123 48MHz clock. A byte is made of 11 bits: a start bit, 8 data bits, and
1124 one stop bit. A word is two bytes, so 22 bits. Therefore 264 cycles
1125 are needed to transmit one word.
1126
1127 Since the WRITER just sends a full memory dump, the I/O performance is
1128 proportional not to \texttt{program\_len + result\_len} but to
1129 \texttt{program\_len + result\_len + intermediate\_results\_len}. As
1130 evaluation of any expression involves the creation of at least one
1131 value (the result), for every expression or subexpression evaluated at
1132 least 264 cycles will be needed to output that result. The EVAL takes
1133 significantly less than 264 cycles to evaluate a expression (not
1134 counting its subexpressions), in one test each expression took between
1135 30 and 90 cycles to evaluate. The processor performance is therefore
1136 dominated by I/O.
1137
1138 As an example, the program \texttt{((lambda id (x) x) 5)} is 12 bytes
1139 long, and takes 258 cycles to evaluate. The memory dump is 54 bytes
1140 long. Therefore of the roughly 17700 cycles needed from beginning to
1141 end, about 98.5\% are used to do I/O.
1142
1143 \section{A worked example}
1144 \label{sec:example}
1145 Here is an example of what happens when a expression is typed into the
1146 REPL:
1147
1148 \begin{enumerate}
1149 \item A user types into the REPL the string
1150
1151 \begin{lstlisting}
1152 (car '(2 3 4))
1153 \end{lstlisting}
1154
1155 This is a call to the CAR primitive, with the list \texttt{(2 3 4)}
1156 as an argument. We expect the result to be \texttt{2}, the first
1157 element of the list.
1158
1159 \item This string is given to the compiler, producing the sexp
1160
1161 \begin{lstlisting}
1162 (CALL (CAR 0) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5)))
1163 \end{lstlisting}
1164
1165 Observe that in this representation the CDR comes before the CAR. So
1166 for example ``(LIST (LIST 0) (SYMBOL 3))'' represents a list whose
1167 CAR is ``(SYMBOL 3)'' and CDR is ``(LIST 0)'' (that is, nil). This
1168 is a list with one element, ``(SYMBOL 3)''. The sexp shown here
1169 represents a call to the CAR primitive with one argument. The list
1170 has three elements, which are in order \texttt{(SYMBOL 5), (SYMBOL
1171 4), (SYMBOL 3)}.
1172
1173 In this stage the compiler maps each number to a symbol, in this
1174 case \texttt{(SYMBOL 5)} is 2, \texttt{(SYMBOL 4)} is 3, and
1175 \texttt{(SYMBOL 3)} is 2. This mapping is remembered so it can be
1176 used later by the pretty printer.
1177
1178 \item This sexp is given to the assembler, producing the 16 words long
1179 binary string (here displayed in hex, with a space after each 16-bit
1180 word)
1181
1182 \begin{lstlisting}
1183 000F 0000 0000 2000 2000 000F C007 0000 2000 0009 000B 2005 000D 2004 0000 2003
1184 \end{lstlisting}
1185
1186 The first word, \texttt{000F}, indicates the length of the program
1187 in
1188 words (15 words, this does not include this first word).\\
1189 The next four words, \texttt{0000 0000 2000 2000}, represent the CDR
1190 and CAR of NIL and T respectively. These words are always
1191 constant.\\
1192 The next word, \texttt{000F}, represents the first unused position
1193 in memory. This is always equal to the first word transmitted.\\
1194 The next word, \texttt{C007}, represents the expression to be
1195 evaluated.\\
1196 The next word, \texttt{0000}, is ignored as this is where the
1197 processor will place the result of the evaluation.\\
1198 The rest of the words are subexpressions or parts of subexpressions
1199 of the expression being evaluated.\\
1200
1201 The expression that is evaluated is \texttt{C007}, which is a CALL
1202 whose CDR is at position 7 (\texttt{2000}, meaning \texttt{(CAR 0)})
1203 and whose CAR is at
1204 position 8 (\texttt{0009}).\\
1205 \texttt{0009} is a list with CAR at 10 (\texttt{2005}, meaning
1206 \texttt{(SYMBOL 5)}) and CDR at 9 (\texttt{000B}).\\
1207 \texttt{000B} is a list with CAR at 12 (\texttt{2004}, meaning
1208 \texttt{(SYMBOL 4)}) and CDR at 11 (\texttt{000D}).\\
1209 \texttt{000D} is a list with CAR at 14 (\texttt{2003}, meaning
1210 \texttt{(SYMBOL 3)}) and CDR at 13 (\texttt{0000}, meaning NIL).
1211
1212 \item This binary string is sent via UART to the FPGA's READER. The
1213 READER writes the string (except the first word) into memory,
1214 overwriting the previous contents of the memory. It then reports to
1215 the controller that it is done, and the controller advances to
1216 \texttt{STATE\_RUN}.
1217
1218 \item The EVAL and GC evaluate the program in memory, and then place
1219 the result back into memory. The EVAL reports to the controller that
1220 it is done, and the controller advances to \texttt{STATE\_WRITE}.
1221
1222 \item The WRITER sends this memory dump via UART back to the REPL
1223
1224 \begin{lstlisting}
1225 000F C007 2005 2000 0009 000B 2005 000D 2004 0000 2003 0000 4004 2000 0010 0000 0012 0000 0000 0009
1226 \end{lstlisting}
1227
1228 Then the WRITER reports to the controller that it is done, and the
1229 controller advances to \texttt{STATE\_READ}.
1230
1231 Observe that this memory dump is similar to the assembler output.
1232 The first two words are the 6th and 7th words in the assembler
1233 output, the next word is the result of the evaluation, then the next
1234 8 words are words 9th through 16th in the assembler output. The
1235 following words are intermediate results computed by the processor.
1236
1237 \item The REPL's pretty printer interprets this string starting with
1238 \texttt{2005}, which means \texttt{(SYMBOL 5)}. The pretty printer
1239 looks in the compiler's mapping and finds that the 5th symbol was
1240 2, and so the REPL prints
1241
1242 \begin{lstlisting}
1243 2
1244 *
1245 \end{lstlisting}
1246
1247 The evaluation is complete, and a new prompt is printed so that the
1248 user can input another expression.
1249 \end{enumerate}
1250
1251 \begin{figure}
1252 \centering\includegraphics[height=7cm]{repl.png}
1253 \caption{What the user types and sees in this example}
1254 \end{figure}
1255
1256 \chapter{Conclusions}
1257 A Lisp machine processor has been built using the original design and
1258 techniques presented in \cite{lambda}. It has been extended with input
1259 and output capabilities, and using a modern FPGA it has been
1260 synthesised on a physical device slightly larger than a USB flash
1261 drive. This device can be used as a standalone Lisp evaluator, or the
1262 processor inside it can be included in a more complicated circuit.
1263
1264 Software has been written to help interact with the processor. This
1265 includes a compiler, an assembler, and a REPL that together make it
1266 possible for an end-user unfamiliar with hardware design or the
1267 processor's internals to submit Lisp expressions to the processor and
1268 get back their values. The software also allows programmers to convert
1269 Lisp source code to the binary format understood by the processor, so
1270 that they may later send the binaries to the processor for evaluation.
1271
1272 The processor and software have been tested to ensure they work
1273 correctly. This has uncovered bugs and inefficiencies in the original
1274 paper, some of which could not be gleaned from simply reading the
1275 paper.
1276
1277 The source files for the hardware and software, and also the automated
1278 tests have been released under an open-source license to make it easy
1279 for future circuit designers and programmers to use and extend this
1280 project. They can be found at \url{https://git.ieval.ro/?p=yule.git}.
1281
1282 The original paper \cite{lambda} is one of the influential Lambda
1283 Papers that defined the Scheme programming language and is an
1284 important historical artifact for understanding how Lisp machine
1285 processors can be implemented and more generally how tagged
1286 architectures work. But all details of the processor cannot be
1287 understood from simply reading the paper, and most readers would not
1288 follow by attempting to implement what is described there. This
1289 projects brings a tangible implementation into the open, which shows
1290 the limitations of the design to people interested in this design and
1291 makes it possible to solve this limitations and extend the processor
1292 beyond its original scope.
1293
1294 \section{Reflections}
1295 Now that the project is finished, it is worth revisiting ideas I had
1296 in the beginning that were not ultimately useful.
1297
1298 The original concept had the processor run on a typical development
1299 board, with the software running on a Raspberry PI. The board and the
1300 Raspberry PI would be connected through jumper wires, and the UART
1301 circuitry on the Raspberry PI would be used by the software to talk to
1302 the processor.
1303
1304 This idea would have produced a much harder to use YULE. Not only
1305 would the user have to ssh into a Raspberry PI to do anything useful,
1306 but also the equipment would take more space (both boards are larger
1307 than the iCEstick). Once I tried the iCEstick and found that
1308 everything can be done just as well with it, it replaced the two
1309 boards and made the project easier to use (albeit slightly harder to
1310 debug, as the iCEstick only has 5 LEDs).
1311
1312 Some code had to be rewritten, sometimes multiple times. The compiler
1313 was originally written in Common Lisp, as it is the best suited
1314 language for manipulation of S-expressions, especially Lisp code. This
1315 meant a working project was obtained more quickly, but then the
1316 compiler had to be rewritten in another language; if the user has a
1317 software Lisp implementation already, why would they need YULE? I
1318 chose Perl as the reimplementation language, as the assembler was
1319 already written in Perl and it is widely-available, most systems
1320 coming with a Perl interpreter out of the box.
1321
1322 Throughout development of the hardware part, timing problems were a
1323 mainstay. The first ones related to the interactions between the
1324 processor and the memory (the memory was clocked, but the processor
1325 expected a combinational RAM). What fixed this problem was to run the
1326 memory at twice the speed of the rest of the processor. At some point,
1327 a decision was taken to run the GC and EVAL on different clocks (both
1328 ran at one quarter the speed of the master clock, but one of the
1329 clocks was delayed by half a cycle. The positive edge on one clock
1330 would happen midway between a positive edge and a negative edge on the
1331 other one). This proved to be unwise, as sometimes values sent by one
1332 module to the other would be cleared before the destination module
1333 managed to read them.
1334
1335 GTKWave proved to be very useful in debugging these timing issues, as
1336 the exact point where the issue began was easy to identify and I could
1337 see why the issue was happening. A workaround for this was to add a
1338 latch on each side of the common E bus that remembered the value sent
1339 long enough for the receiving module to see it.
1340
1341 All timing problems were fixed more cleanly in the end when an attempt
1342 to simplify the clocks was made. First GC and EVAL were changed to use
1343 the same clock, which removed the need for the latches, and then the
1344 memory was changed to use the negative edge of the clock instead of
1345 the positive edge, which meant no more clock division code had to be
1346 used.
1347
1348 \section{Future work}
1349 There are several ways to extend this project.
1350
1351 One idea is to fix the noted limitations: the ATOM primitive considers
1352 NIL to not be an atom, and there is no EQ primitive that compares
1353 symbols/other objects. The latter problem means that all symbols are
1354 indistinguishable (note: NIL can still be distinguished from T and
1355 other symbols, but NIL is not a symbol). To introduce equality a new
1356 signal needs to be added to the EVAL that compares (for example) the
1357 value on the E bus with the value of a fixed register, possibly a
1358 newly added one. Then (EQ A B) would store A in the register, load B
1359 on the E bus and compare them.
1360
1361 Once equality is implemented, it becomes possible to implement the
1362 EVAL function that evaluates an arbitrary Lisp expression.
1363
1364 Two more involved instruction set improvements are adding arithmetic
1365 operations, and input/output. Without arithmetic primitives one has to
1366 rely on Church numerals, which encodes numbers as lambda abstractions
1367 and is very inefficient. Input/output primitives would allow programs
1368 to receive data at runtime over the UART and send data back before
1369 evaluation is finished. This enables long-running useful programs, for
1370 example with input/output and equality a REPL can be written (which
1371 relies on the EVAl function mentioned above).
1372
1373 On the software side, the assembler can support optimizations and then
1374 the REPL can be extended with a core image. The main optimization
1375 needed in the assembler is common subexpression elimination. The
1376 processor never overwrites a value in memory with another value (other
1377 than overwriting \texttt{mem[6]} with the result of the evaluation),
1378 so it is safe to share any identical subexpressions. For example, if
1379 the user inputs the program \texttt{(cons (car '(1 2 3 4)) (cdr '(0 2
1380 3 4)))}, then when the assembler should share the tails of the two
1381 lists, meaning the second list should compile to \texttt{(LIST (LIST
1382 X) (SYMBOL Y))} where X is a pointer to where the second element of
1383 the first list is stored. This makes long programs significantly
1384 shorter. Note that no compiler changes are required to implement this,
1385 as the assembler can just remember all expressions and subexpressions
1386 it has assembler so far including their locations, and whenever a new
1387 expression needs to be assembler it can be replaced with a pointer to
1388 where the same expression has been placed before.
1389
1390 Once this optimization is implemented, the REPL can gain core image
1391 support. Suppose the user enters the expression \texttt{expr}. Instead
1392 of sending \texttt{assemble(compile(expr))} to the processor, the REPL
1393 can remember the last expression sent (or the last memory dump
1394 received), then append \texttt{assemble(compile(expr))} to that
1395 previous expression. This way \texttt{expr} can refer to previously
1396 defined expressions, as long as the REPL or compiler has a way to name
1397 previously supplied instructions, for example via a \texttt{define}
1398 special form, via the standard REPL variables (\texttt{+, ++, +++}),
1399 or by sequentially numbering all expressions submitted by the user.
1400
1401 Returning to hardware, a potential future project is to add actual
1402 hardware garbage collection to the GC component. At the moment the
1403 design does no garbage collection, the processor only has 4096 words
1404 of RAM available, and evaluation is quite memory-inefficient. Adding
1405 garbage collection would make more complicated programs possible
1406 despite the limited RAM. One way to implement this is to add a small
1407 microprocessor to the design which is preprogrammed with a garbage
1408 collection algorithm (simple choices are a stop-and-copy collector or
1409 Cheney's algorithm). When the free space pointer gets too high, the
1410 EVAL and GC are stopped, and the new microprocessor is started up so
1411 it can perform garbage collection, compact the memory, and update the
1412 free space pointer. The EVAL/GC registers would have to be updated as
1413 well. Then the microprocessor can be stopped, EVAL and GC be started
1414 up again and evaluation can continue. Note that a non-moving collector
1415 is not sufficient: the GC allocates new words only at the very end of
1416 the memory, so it cannot benefit from free ``gaps'' in the middle of
1417 the memory.
1418
1419 \begin{thebibliography}{99}
1420 \bibitem{lambda} Design of LISP-Based Processors (1979), available at
1421 \url{http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-514.pdf}
1422 \bibitem{osdvu} \url{https://opencores.org/project/osdvu}
1423 \bibitem{gtkwave} \url{http://gtkwave.sourceforge.net/}
1424 \bibitem{icestorm} \url{http://www.clifford.at/icestorm/}
1425 \end{thebibliography}
1426 \end{document}
This page took 0.057595 seconds and 4 git commands to generate.