]>
Commit | Line | Data |
---|---|---|
1 | \documentclass[a4paper,12pt]{report} | |
2 | \usepackage{color} | |
3 | \usepackage[cm]{fullpage} | |
4 | \usepackage{graphicx} | |
5 | \usepackage{hyperref} | |
6 | \usepackage{listings} | |
7 | \usepackage{pdfpages} | |
8 | ||
9 | \title{YULE: Yet Unnamed Lisp Evaluator} | |
10 | \author{Marius Gavrilescu} | |
11 | \date{May 2018} | |
12 | ||
13 | \begin{document} | |
14 | % title page does not need anchors | |
15 | \hypersetup{pageanchor=false} | |
16 | \maketitle | |
17 | \hypersetup{pageanchor=true} | |
18 | \lstset{language=Perl, frame=leftline, framexleftmargin=5pt, | |
19 | showstringspaces=false, showspaces=false, breaklines=true, | |
20 | postbreak=\mbox{\textcolor{red}{$\hookrightarrow$}\space}} | |
21 | ||
22 | \chapter*{Abstract} | |
23 | Lisp machines are computers designed to run Lisp programs efficiently. | |
24 | They were first developed in the 1970s, and they have special hardware | |
25 | support for Lisp. This usually comprises a tagged architecture (each | |
26 | pointer has a ``type'' field), garbage collection implemented in | |
27 | hardware, and an instruction set that lends itself well to evaluating | |
28 | Lisp expressions. | |
29 | ||
30 | Between 1975 and 1980, Guy L. Steele and Gerald Jay Sussman wrote a | |
31 | series of memos on lambda calculus, the Lisp language and its | |
32 | implementation known collectively as the ``Lambda Papers''. One of | |
33 | these papers, ``Design of LISP-Based Processors (1979)'' describes how | |
34 | to implement a Lisp machine processor. If this processor is connected | |
35 | to a RAM that contains a program, it executes that program and places | |
36 | the result back into the RAM. | |
37 | ||
38 | The aim of YULE is to take the design from the 1979, implement it | |
39 | today using modern technology and techniques, and bundle it up in a | |
40 | form that is both easy to use and extend. By going through this | |
41 | modernisation process, future developers can easily integrate the Lisp | |
42 | processor into their own design, modifying it as they please. | |
43 | ||
44 | The end result of YULE is a small device that can be plugged into a | |
45 | USB port, and an associated computer program. Running the program | |
46 | presents a simplified Lisp listener. The user can type a Lisp | |
47 | expression into the listener, and the listener will respond with the | |
48 | value of the expression. Under the hood, the listener evaluates Lisp | |
49 | by sending the expression over USB to the small device and getting | |
50 | back the result. | |
51 | ||
52 | YULE comes with the design files for the hardware, and source code for | |
53 | the software. All of these are released under a free software license | |
54 | and published online, which makes it very easy (from both a practical | |
55 | and a legal standpoint) for future developers to use in their own | |
56 | projects and modify to fit their purposes better. | |
57 | \tableofcontents | |
58 | ||
59 | \chapter{Introduction} | |
60 | \section{Motivation} | |
61 | Lisp machine processors are quite unlike modern processors. Modern | |
62 | instruction sets are designed for imperative programming. Each | |
63 | instruction makes the processor execute some operation, such as | |
64 | loading a register or storing a value in memory. In contrast, a Lisp | |
65 | machine processor reads an expression from memory, determines its | |
66 | type, and recursively evaluates it. This operation lends itself to | |
67 | declarative programming, where the programmer declares an arbitrarly | |
68 | complicated expression and asks the computer to evaluate it. | |
69 | ||
70 | Another common feature of Lisp machine processors is the use of tagged | |
71 | pointers. Each memory location is a tagged union of two elements: the | |
72 | value itself, and a tag representing the type of that data. Tagged | |
73 | architectures are not common today, although some programs still | |
74 | use tagged pointers by exploiting pointer alignment (if every pointer | |
75 | is a multiple of 8, then the bottom 3 bits can be used to store a | |
76 | tag). These tags can be used to store an object's reference count (as | |
77 | in Objective-C on certain versions of iOS), or information about a | |
78 | data structure (as in certain implementations of kd-trees). | |
79 | ||
80 | The ``Design of LISP-Based Processors (1979)'' paper by Guy L. Steele | |
81 | and Gerald Jay Sussman \cite{lambda} presents the design of a | |
82 | relatively simple Lisp machine processor, which is made from two | |
83 | read-only memories, two buses, and several registers. If this | |
84 | processor is connected to a RAM containing a Lisp expression in a | |
85 | certain binary format, the processor will evaluate the expression and | |
86 | place the result back in memory. | |
87 | ||
88 | This processor features both a tagged architecture (the top 3 bits of | |
89 | each word in memory represent the value's type) and the declarative | |
90 | nature mentioned above. It is therefore interesting to study and to | |
91 | bring this processor into today's world by using technology such as | |
92 | hardware description languages (that allow us to describe a circuit | |
93 | formally) and FPGAs (that allow us to synthesise any circuit quickly | |
94 | and at low cost). By building a HDL implementation of the processor | |
95 | not only can it be used easily on its own (by synthesising it to a | |
96 | FPGA and connecting it to RAM), but a future circuit designer can take | |
97 | this processor and include it in their more complicated design. | |
98 | ||
99 | The processor presented in the original paper is a somewhat barebones | |
100 | implementation of Lisp. Arithmetic primitives are not available, nor | |
101 | are there any input or output routines. Due to time constraints the | |
102 | paper's writers have not included a equality primitive. Additionally | |
103 | the ATOM primitive has a bug that makes it consider NIL not atomic. | |
104 | All these deficiencies can be rectified given time and effort, but it | |
105 | is difficult for one to fix them given only a paper describing the | |
106 | processor. Having a HDL implementation of the processor with a proper | |
107 | test suite allows a future circuit designer to more easily make | |
108 | changes to the processor while using the test suite to prevent | |
109 | regressions and using a FPGA to quickly obtain a hardware version of | |
110 | the changed processor that can be manually tested and used in any | |
111 | desired application. | |
112 | ||
113 | \section{Challenges} | |
114 | While \cite{lambda} is part of the well-known Lambda Papers series and | |
115 | has been read by many over the years, there are no published | |
116 | implementations of the processor described in the paper. Therefore | |
117 | \cite{lambda} was the only source used in the building of this | |
118 | project, and several of the limitations of the processor were not easy | |
119 | to see before the implementation phase was well underway. Furthermore, | |
120 | the original paper has several shortcomings. | |
121 | ||
122 | For one, while the microcode (the values of the two read-only | |
123 | memories) is provided in both source and compiled form, the microcode | |
124 | compiler is not provided. This makes it difficult to alter the | |
125 | microcode to fix bugs or introduce new instructions. One needs to | |
126 | either work on the already-compiled code (hard and error-prone) or | |
127 | reimplement the microcode compiler, which takes a reasonable amount of | |
128 | effort. | |
129 | ||
130 | Moreover, the paper does not concern itself at all with how the user | |
131 | gets a program into memory or how the user gets the result back. This | |
132 | makes the processor as-described very difficult to use. | |
133 | ||
134 | To implement the processor in Verilog, no source other than the paper | |
135 | was used. There were no previous implementations of the processor that | |
136 | could be consulted to resolve unclarities in the original paper. | |
137 | ||
138 | In addition, there was no software available for interacting with the | |
139 | processor. A compiler and assembler had to be written from scratch, | |
140 | and so the processor could not be easily debugged until after the | |
141 | software was functional. | |
142 | ||
143 | \section{Contribution} | |
144 | YULE is a modern-day implementation of a Lisp machine processor using | |
145 | the ideas and techniques presented in \cite{lambda}. | |
146 | ||
147 | The main contribution of YULE is a Verilog implementation of the | |
148 | processor which allows future circuit designers to use it in their own | |
149 | design and modify it as they please. | |
150 | ||
151 | As the processor as described has no way to input a program or extract | |
152 | a result from memory, YULE includes more circuitry around the | |
153 | processor that lets it communicate with the external word. This | |
154 | consists of a serial UART and two modules, a reader and a writer. When | |
155 | the processor is idle, the reader waits for a program to be sent over | |
156 | the UART. Once a program has been received, it is placed into memory | |
157 | and the processor starts to evaluate it. Once evaluation is done, the | |
158 | writer transmits the new contents of the memory over the UART. | |
159 | ||
160 | Besides the hardware chip, YULE also includes three pieces of | |
161 | software: | |
162 | ||
163 | \begin{itemize} | |
164 | \item a compiler, which takes Lisp source code and produces an | |
165 | intermediate representation of the program. | |
166 | \item an assembler, which takes a program in the intermediate | |
167 | representation and produces the binary representation of the program | |
168 | that the processor can understand. | |
169 | \item a REPL, in which users can type expressions. When an expression | |
170 | is submitted, the REPL compiles it, assembles it, and sends it to | |
171 | the processor. The processor evaluates the expression and sends back | |
172 | a memory dump. The REPL then reads the memory dump, interprets it, | |
173 | and prints the value of the expression in a human-readable format. | |
174 | \end{itemize} | |
175 | ||
176 | The REPL is therefore the entry point of YULE as a whole. An end-user | |
177 | who does not intend to extend YULE can plug in an iCEstick with the | |
178 | processor, and run the REPL. Then the user can write expressions in | |
179 | the REPL and get their values back. Besides being the entry point, the | |
180 | REPL serves as a convenient way to test that the entire system is | |
181 | working. An expression typed in the REPL goes through the compiler and | |
182 | assembler, and then is sent to the processor. It goes through the | |
183 | three stages of the processor (reading the expression into memory, | |
184 | evaluating the contents of the memory, writing a memory dump). | |
185 | ||
186 | The hardware and software components of YULE have been tested to | |
187 | ensure they function correctly. There is an automated test suite | |
188 | containing 48 unit tests for the compiler and assembler, and 21 | |
189 | functional tests that engage all parts of the system (both the | |
190 | software and the hardware). The unit tests ensure that the various | |
191 | small parts of the compiler and assembler work properly, while the | |
192 | functional tests consist of pairs of short Lisp expressions and their | |
193 | correct values. The expressions are written to the REPL, and the | |
194 | REPL's output is compared to the correct values. | |
195 | ||
196 | To make it easier for future circuit designers and programmers to use | |
197 | this project, YULE has been released under an open source license. | |
198 | Prospective users can find all the source files (for both hardware and | |
199 | software) online, and are free to use, modify, and redistribute them | |
200 | under the same terms as Perl itself. The files can be accessed at this | |
201 | URL: \url{https://git.ieval.ro/?p=yule.git}. | |
202 | ||
203 | \section{Layout of this report} | |
204 | This report begins by describing the background of YULE, the | |
205 | technologies that made this project possible and a more in-depth | |
206 | description of the 1979 paper. Then the components that YULE comprises | |
207 | are examined in turn, followed by a chapter on how everything was | |
208 | tested, which includes a worked example of the system in action. The | |
209 | report ends with a list of ideas for future work based on YULE, and a | |
210 | section on reflections. | |
211 | ||
212 | \chapter{Background} | |
213 | ||
214 | \section{Field-programmable gate array} | |
215 | An FPGA is a configurable integrated circuit. A designer can define a | |
216 | circuit using a hardware description language (explained in the next | |
217 | section) and compile it to some binary format called a configuration. | |
218 | This configuration describes how the different components of the FPGA | |
219 | should be connected to each other. The configuration can then be | |
220 | uploaded to the FPGA, and the FPGA will connect its internal | |
221 | components as requested. | |
222 | ||
223 | An FPGA can therefore replace any digital custom circuit (ASIC). While | |
224 | an FPGA consumes more energy, uses more space, and runs at slower | |
225 | clock rates than an ASIC, the FPGA permits rapid prototyping: the time | |
226 | between finishing to write a design and having a working part is mere | |
227 | seconds (compilation time + uploading the configuration to the FPGA), | |
228 | and if a bug is discovered the buggy part need not be thrown away. The | |
229 | design can be modified and the fixed configuration can be uploaded to | |
230 | the FPGA. | |
231 | ||
232 | The hardware aspect of YULE is designed to run on the Lattice | |
233 | iCE40HX-1k FPGA, but with small modifications it can run on other | |
234 | FPGAs as well. | |
235 | ||
236 | \subsection*{iCEstick} | |
237 | \begin{figure} | |
238 | \centering | |
239 | \includegraphics[height=4cm]{icestick.png} | |
240 | \caption{Lattice iCEstick Evaluation Kit} | |
241 | \end{figure} | |
242 | ||
243 | The Lattice iCEstick Evaluation Kit is a development board for the | |
244 | iCE40HX-1k FPGA. It contains the FPGA, a USB connector, a FTDI chip | |
245 | and 5 LEDs. The form factor is very convenient. It is similar to a USB | |
246 | memory stick (albeit bigger) that can be plugged into a computer, | |
247 | avoiding the need for jumper wires and GPIO that traditional | |
248 | development boards require. | |
249 | ||
250 | The FTDI chip implements the USB protocol, allowing the FPGA to be | |
251 | programmed by the computer and letting the FPGA talk to the computer | |
252 | using a very simple serial UART interface on the FPGA side, and a | |
253 | standard USB serial port on the computer side. This makes | |
254 | communication very easy. The FPGA can use any off-the-shelf UART | |
255 | implementation (this project uses \texttt{osdvu} from | |
256 | Opencores \cite{osdvu}), and the computer can use standard serial | |
257 | console tools such as \texttt{minicom}. | |
258 | ||
259 | \section{Verilog} | |
260 | Verilog is a hardware description language. It is used to model | |
261 | digital circuits at register-transfer level of abstraction. In Verilog | |
262 | one defines several modules, with one of them (the ``toplevel'' | |
263 | module) representing the entire design. Each module has several inputs | |
264 | and outputs, can contain internal wires, can contain instances of | |
265 | other modules, and can contain combinational and sequential logic | |
266 | blocks that drive internal wires and outputs. | |
267 | ||
268 | Verilog has many more complicated features that are not used in YULE. | |
269 | These include tristate buffers, delays, functions, tasks, loops, and | |
270 | strength declarations. | |
271 | ||
272 | A subset of Verilog can be \textit{synthesised}; that is converted to | |
273 | a netlist (a list of transistors or FPGA components used and | |
274 | connections between them) that can then be used to build an ASIC | |
275 | (custom integrated circuit), or can be further converted to the | |
276 | internal representation of the FPGA and uploaded to the FPGA. The FPGA | |
277 | workflow and the programs that are part of it are described in the | |
278 | next section. | |
279 | ||
280 | A verilog project can also be run through a simulator, which simulates | |
281 | the design clock edge by clock edge and records the values on each | |
282 | wire in the design. This record can later be read in and visualised | |
283 | with a wave viewer, which allows a circuit designer to better | |
284 | understand why the circuit is not working correctly. One such program, | |
285 | GTKWave \cite{gtkwave}, was used to debug YULE. | |
286 | ||
287 | \section{icestorm} | |
288 | Project IceStorm \cite{icestorm} is a set of open source reverse | |
289 | engineered tools for programming Lattice iCE40 FPGAs. There exists a | |
290 | fully open source Verilog-to-Bitstream flow which comprises: | |
291 | ||
292 | \begin{enumerate} | |
293 | \item Yosys, a Verilog synthetizer. | |
294 | \item Arachne-pnr, a tool that implements placing and routing for | |
295 | iCE40 FPGAs. | |
296 | \item icepack, which converts Arachne-pnr's output to iCE40 bitstream. | |
297 | \item iceprog, which programs the iCEstick with the output of icepack. | |
298 | \end{enumerate} | |
299 | ||
300 | \section{Design of LISP-Based Processors (1979)} | |
301 | Design of LISP-Based Processors (1979) \cite{lambda} is one of the | |
302 | ``Lambda papers''; a series of memos by Sussman and Steele about the | |
303 | Lisp language and its implementation. The paper describes SIMPLE, a | |
304 | microprocessor designed to run Lisp. | |
305 | ||
306 | This processor is made of two read-only memories called EVAL and GC. | |
307 | Each read-only memory is paired with state registers to form a state | |
308 | machine. The contents of the state registers are fed into the | |
309 | read-only memory, and the output of the read-only memory determines | |
310 | the next state and other signals. | |
311 | ||
312 | Each state machine also has an internal bus (called the E and G buses | |
313 | respectively) and several registers connected to that bus. Each of | |
314 | these registers stores a tagged pointer, which is an 11-bit value (the | |
315 | top 3 bits represent the type, and the bottom 8 bytes represent the | |
316 | value). | |
317 | ||
318 | The signals coming out of each state machine can cause a register to | |
319 | be read onto the machine's internal bus, a register to be loaded from | |
320 | the internal bus, or write a constant value onto the internal bus. | |
321 | ||
322 | The GC state machine represents the processor's memory system. It is | |
323 | connected to an external RAM, which it controls through several | |
324 | signals. It knows how to execute certain ``GC operations'', which | |
325 | represent common Lisp memory management actions. Examples of these are | |
326 | \texttt{CAR} and \texttt{CDR} (``given a pointer to a cons cell, | |
327 | return either the CAR or the CDR of that cons cell'') or \texttt{CONS} | |
328 | (``given two pointers, allocate a new cons cell, copy them into it, | |
329 | and return a pointer to the new cons cell''). The existence of the GC | |
330 | isolates the rest of the processor from the particularities of the | |
331 | memory used. | |
332 | ||
333 | The EVAL state machine evaluates Lisp expressions. Signals coming out | |
334 | of it control the GC by selecting what operation the GC should perform | |
335 | and connecting the E and G buses to supply arguments to the GC and | |
336 | read results back. The EVAL reads expressions from memory via the GC | |
337 | and evaluates them, putting results back into memory. | |
338 | ||
339 | The overall operation of the processor is to read an expression from | |
340 | the magic location ``5'' in memory, evaluate it, and place the result | |
341 | into the magic location ``6'' in memory. Other memory locations are | |
342 | used to store subexpressions of the input expression, and to store any | |
343 | intermediate results. Despite its name, GC does not perform any | |
344 | garbage collection, so at the end of the processor's operation the | |
345 | memory will still contain all intermediate results produced during | |
346 | evaluation. | |
347 | ||
348 | The processor is intended as a proof of concept and not a | |
349 | production-ready hardware implementation of Lisp. Therefore it has | |
350 | several important limitations, for example there is no primitive | |
351 | operation that implements equality of objects (making it impossible to | |
352 | distinguish between symbols), there are no primitive arithmetic | |
353 | operations (so one needs to use Church numerals if numbers are | |
354 | needed), and the processor includes a known bug (``the ATOM bug''). | |
355 | The bug is that the ATOM primitive considers NIL to be non-atomic, | |
356 | which is incorrect. | |
357 | ||
358 | The Lisp expressions mentioned here are in textual form (Lisp source | |
359 | code), but instead in a special tagged pointer format. Each memory | |
360 | value is a union of two values: a 8-bit pointer and a 3-bit type. We | |
361 | can use pairs of consecutive memory locations to represent cons cells | |
362 | and build linked lists out of them. If a list is at location X, that | |
363 | means that (X, X+1) is the first cons cell of the list, with X+1 being | |
364 | the CAR (the value of the first element) and X being the CDR (a | |
365 | pointer to the next cons cell). A discussion of the different ways to | |
366 | represent Lisp expressions (source code, this tagged pointer format, | |
367 | and an intermediate representation) can be found in \ref{sec:formats}. | |
368 | ||
369 | The next page shows a diagram of how the processor in the paper is | |
370 | structured. The two read-only memories can be seen, with their | |
371 | registers, state registers, and the interconnection between the E and | |
372 | G buses. | |
373 | ||
374 | \begin{figure} | |
375 | \centering\includegraphics[width=18.5cm]{AIM-514-design.pdf} | |
376 | \caption{Design diagram of processor from \cite{lambda}} | |
377 | \end{figure} | |
378 | ||
379 | \chapter{A Lisp Machine processor on a USB drive} | |
380 | The project comprises a hardware part written in Verilog and a | |
381 | software part written in Perl. | |
382 | ||
383 | The hardware part is a Lisp machine processor with associated | |
384 | circuitry that can read a program from the iCEstick UART and that can | |
385 | write the result of running a program to the UART. The normal | |
386 | operation of this part is to: | |
387 | ||
388 | \begin{enumerate} | |
389 | \item Wait for a program to be supplied via UART | |
390 | \item Run the program on the processor | |
391 | \item Write a memory dump to the UART | |
392 | \item Go back to step 1 | |
393 | \end{enumerate} | |
394 | ||
395 | The software part includes: | |
396 | ||
397 | \begin{itemize} | |
398 | \item A compiler, which takes Lisp source code and puts it in the | |
399 | linked list format understood by the SIMPLE processor | |
400 | \item An assembler, which takes the result of the compiler and | |
401 | converts it to the SIMPLE in-memory binary format. | |
402 | \item A REPL, which repeatedly reads a program from the user, compiles | |
403 | it, assembles it, sends it to YULE, reads the memory dump from YULE, | |
404 | and prints the result for the user to see. | |
405 | \end{itemize} | |
406 | ||
407 | \clearpage | |
408 | \section{Hardware} | |
409 | The hardware part is built from 8 modules: | |
410 | ||
411 | \begin{enumerate} | |
412 | \item The \textbf{READER}, which implements step 1 of the normal operation described above. | |
413 | \item The \textbf{WRITER}, which implements step 3 of the normal operation described above. | |
414 | \item The \textbf{GCRAM}, a single-port RAM. | |
415 | \item The \textbf{GC}, corresponding to the GC PLA in the original implementation. | |
416 | \item The \textbf{EVAL}, corresponding to the EVAL PLA in the original implementation. | |
417 | \item The \textbf{CTRL}, which advances the operation from one step to the next one. | |
418 | \item The \textbf{UART}, which implements transmission and reception of bytes via UART. | |
419 | \item The \textbf{CPU}, which instantiates every module above and wires them together. | |
420 | \end{enumerate} | |
421 | ||
422 | \subsection{Clocks} | |
423 | To avoid timing problems, a single 48 MHz clock is used. The iCEstick | |
424 | has a 12 MHz oscillator, and the phase-locked loop circuitry on the | |
425 | FPGA multiplies this signal by 4. | |
426 | ||
427 | Every flip-flop updates on the positive edge of the clock, except for | |
428 | the RAM which updates on the negative edge. This way the rest of the | |
429 | circuit perceives the RAM to be combinational (it ``updates | |
430 | immediately''): if on a positive edge the RAM's address input changes, | |
431 | the RAM will update its output on the following negative edge and the | |
432 | changed output will be visible on the next positive edge. If the RAM | |
433 | updated on a positive edge, then if the input changed on positive edge | |
434 | $i$ the changed output will only be visible on positive edge $i+2$. | |
435 | ||
436 | The design is intended to be run on a FPGA, which has a fixed clock | |
437 | tree. This typically consists of a small number of special nets called | |
438 | global buffers that can distribute signals with minimal skew to every | |
439 | logic cell. The FPGA on the iCEstick has 8 such global buffers. | |
440 | ||
441 | ||
442 | There are 4 components in the design (GC, EVAL, READER, WRITER) that | |
443 | need to be enabled/disabled at different times. One way to achieve | |
444 | this is to AND the master 48 MHz clock with several signals, thus | |
445 | creating a separate clock for each component. This is known as | |
446 | \textit{clock gating}. However, on a FPGA clock gating involves | |
447 | routing the clock from a global buffer to a logic cell and then back | |
448 | to another global buffer. This can introduce glitches in the clock and | |
449 | would delay each of the clocks by some amount (dependent on the | |
450 | signals it is ANDed with). It is therefore possible that the 4 output | |
451 | clocks would not be aligned, causing problems when data is | |
452 | communicated by one component to another. As such, clock gating is not | |
453 | recommended on FPGAs. | |
454 | ||
455 | \begin{figure} | |
456 | \centering\includegraphics[height=3cm]{gating.eps} | |
457 | \caption{Gated clock on the left, clock enable on the right} | |
458 | \end{figure} | |
459 | ||
460 | To avoid this problem, the design used here does not use clock gating | |
461 | and instead uses 4 clock enable signals supplied by the controller to | |
462 | each of the GC, EVAL, READER and WRITER. At any point the clock is | |
463 | enabled for exactly one of the READER, GC, and WRITER. The EVAL clock | |
464 | is enabled if and only if the GC clock is enabled and the GC's | |
465 | \texttt{step\_eval} output is high. | |
466 | ||
467 | \clearpage | |
468 | \subsection{GC} | |
469 | \begin{figure} | |
470 | \centering\includegraphics[height=4.88cm]{gc.eps} | |
471 | \caption{GC module} | |
472 | \end{figure} | |
473 | ||
474 | The GC module is connected to the GCRAM (via the \texttt{ram\_do, | |
475 | ram\_we, ram\_addr, ram\_di} signals), to the EVAL (via the | |
476 | \texttt{Ein, Eout, gcop, step\_eval, conn\_et, conn\_ea} signals) and | |
477 | to the controller. It also provides the value of its P register (the | |
478 | free storage pointer) to the WRITER, so the writer can know when to | |
479 | stop dumping the memory. | |
480 | ||
481 | It is implemented as a combinational ROM, some combinational logic to | |
482 | drive the internal G bus, sequential logic to advance to the next | |
483 | state respecting dispatch rules, and sequential logic to update | |
484 | registers. The clock enable signal acts as an active low synchronous | |
485 | reset: when the clock is disabled, the state register is set to the | |
486 | initial state. | |
487 | ||
488 | This module represents one half of the original design presented in | |
489 | \cite{lambda}, and the ROM's contents are those from the paper with a | |
490 | bugfi (the original design sets the ADR line low in two states when it | |
491 | should be high). | |
492 | ||
493 | \subsection{EVAL} | |
494 | \begin{figure} | |
495 | \centering\includegraphics[height=3.29cm]{eval.eps} | |
496 | \caption{EVAL module} | |
497 | \end{figure} | |
498 | ||
499 | The EVAL module is connected to the GC (via the \texttt{conn\_et, | |
500 | conn\_ea, Ein, Eout, gcop} signals) and to the controller. | |
501 | ||
502 | The implementation of the EVAL is very similar to that of the GC. | |
503 | There is a combinational ROM, combinational logic to drive the E bus, | |
504 | sequential logic to advance to the next state respecting dispatch | |
505 | rules, and sequential logic to update registers. A synchronous reset | |
506 | is available. When the reset line is held high for one clock cycle, | |
507 | the state register is set to the initial state. It is necessary to | |
508 | separate the reset from the clock enable because the GC often pauses | |
509 | the EVAL to do work and the EVAL should not be reset every time this | |
510 | happens. | |
511 | ||
512 | This module represents the other half of the original design presented | |
513 | in \cite{lambda}, and the ROM's contents are the same as those in the | |
514 | paper except that the ROM is widened by one bit to indicate whether | |
515 | the EVAL is finished evaluating the expression. | |
516 | ||
517 | \clearpage | |
518 | \subsection{Writer} | |
519 | \begin{figure} | |
520 | \centering\includegraphics[height=2.26cm]{writer.eps} | |
521 | \caption{WRITER module} | |
522 | \end{figure} | |
523 | ||
524 | The WRITER module is connected to the UART (via the \texttt{tx\_busy, | |
525 | tx\_signal, tx\_byte} signals), the GC (via the \texttt{P} signal), | |
526 | the RAM (via the \texttt{ram\_addr, ram\_do} signals), and to the | |
527 | controller. | |
528 | ||
529 | Its purpose is to dump the memory to the UART starting with position 4 | |
530 | (since positions 0, 1, 2, 3 are constant) and finishing with position | |
531 | $P - 1$, the last used cell in memory. It is implemented as a state | |
532 | machine with 7 states and an extra register (\texttt{current\_index}) | |
533 | that stores the current index in the memory. | |
534 | ||
535 | The memory is 16-bit wide, but the UART sends and receives bytes. The | |
536 | writer dumps memory word in network byte order (big-endian), meaning | |
537 | the significant byte is sent before the least significant byte. | |
538 | ||
539 | The states are, in order: | |
540 | \begin{description} | |
541 | \item[START] which initializes \texttt{current\_index} to 4 | |
542 | \item[WRITE1\_WAIT] which waits for \texttt{tx\_busy} to become low | |
543 | \item[WRITE1] in which transmission of the upper byte starts | |
544 | \item[WRITE2\_WAIT] which waits for \texttt{tx\_busy} to become low | |
545 | \item[WRITE2] in which transmission of the lower byte starts | |
546 | \item[INCREMENT] in which \texttt{current\_index} is incremented, | |
547 | jumping back to WRITE1\_WAIT if it is still lower than | |
548 | \texttt{freeptr}. | |
549 | \item[FINISHED] in which \texttt{finished} is driven high | |
550 | \end{description} | |
551 | ||
552 | This module is not part of the original design in \cite{lambda}, as | |
553 | that paper did not concern itself with input and output. | |
554 | ||
555 | \clearpage | |
556 | \subsection{Reader} | |
557 | \begin{figure} | |
558 | \centering\includegraphics[height=2.22cm]{reader.eps} | |
559 | \caption{READER module} | |
560 | \end{figure} | |
561 | ||
562 | The READER module is connected to the UART (via the | |
563 | \texttt{rx\_signal, rx\_byte} signals), the RAM (via the | |
564 | \texttt{ram\_we, ram\_addr, ram\_di} signals), and to the controller. | |
565 | ||
566 | Its purpose is to read a program from the UART and overwrite the GCRAM | |
567 | with this program. It is implemented as a state machine with two extra | |
568 | registers (\texttt{total\_words} and \texttt{current\_index}) to store | |
569 | the size of the incoming program and the current position within it. | |
570 | ||
571 | The reader expects to receive a 13-bit \texttt{length} value, followed | |
572 | by \texttt{length} 16-bit values that represent the desired memory | |
573 | contents. Each of these values is sent in network byte order | |
574 | (big-endian), meaning the most significant byte is sent before the | |
575 | least significant byte. | |
576 | ||
577 | The states are, in order: | |
578 | \begin{description} | |
579 | \item[IDLE] which waits for \texttt{rx\_signal}, puts the read byte at the top of \texttt{total\_words}. | |
580 | \item[LENGTH] which waits for \texttt{rx\_signal}, puts the read byte at the bottom of \texttt{total\_words} | |
581 | \item[READ1] which waits for the \texttt{rx\_signal}, puts the read byte at the top of \texttt{ram\_di} and increments \texttt{current\_index} | |
582 | \item[READ2] which waits for the \texttt{rx\_signal}, puts the read byte at the bottom of \texttt{ram\_di} | |
583 | \item[WRITE] in which \texttt{ram\_we} is high, jumping back to READ1 if \texttt{current\_index + 1 < total\_words} | |
584 | \item[FINISHED] in which \texttt{finished} is high | |
585 | \end{description} | |
586 | ||
587 | This module is not part of the original design in \cite{lambda}, as | |
588 | that paper did not concern itself with input and output. | |
589 | ||
590 | \clearpage | |
591 | \subsection{Putting the hardware together} | |
592 | The components described above are managed by the controller, which | |
593 | provides them with clock enable signals. The controller is a state | |
594 | machine with three states: | |
595 | \begin{description} | |
596 | \item[STATE\_READ] in which only the READER clock is enabled | |
597 | \item[STATE\_RUN] in which the GC and EVAL clocks are enabled | |
598 | \item[STATE\_WRITE] in which only the WRITER clock is enabled | |
599 | \end{description} | |
600 | ||
601 | In STATE\_READ and STATE\_WRITE, the EVAL's reset line is high. | |
602 | ||
603 | Whenver the component corresponding to the current state indicates it | |
604 | is finished, the controller advances to the next state. | |
605 | ||
606 | The controller also includes three multiplexers that control which | |
607 | lines are connected to the GCRAM. If the READER is active, all three | |
608 | GCRAM lines are connected to the READER. If the WRITER is active, the | |
609 | address line is connected to the writer (and the other lines are | |
610 | connected to the GC). If the GC is active, all lines are connected to | |
611 | the GC. | |
612 | ||
613 | The controller also includes logic to drive the 5 LEDs on the | |
614 | iCEstick. Three LEDs indicate the current state, while the other two | |
615 | indicate if the UART is currently transmitting or receiving. | |
616 | ||
617 | \section{Representing a Lisp program} | |
618 | \label{sec:formats} | |
619 | The processor described above evaluates a Lisp program received via | |
620 | UART. This section talks about three representations of such a | |
621 | program: the Lisp source code (a S-expression), the intermediate form | |
622 | (also a S-expression), and the in-memory form (a binary string). | |
623 | ||
624 | The first representation is obtained by parsing the user input, the | |
625 | compiler converts this into the second, and the assembler converts the | |
626 | second into the third. | |
627 | ||
628 | \subsection{Lisp source code} | |
629 | Recall that a S-expression is either an atom or a cons cell. | |
630 | ||
631 | An atom is a sequence of letters, digits, and certain symbols. There | |
632 | are two special atoms named \texttt{T} and \texttt{NIL} representing | |
633 | the boolean true and false respectively. | |
634 | ||
635 | A cons cell is a pair of two S-expressions called the ``car'' and the | |
636 | ``cdr''. We write them as \texttt{(car . cdr)}, for example \texttt{(5 | |
637 | . 7)} is a cons cell with car \texttt{5} and cdr \texttt{7}. | |
638 | ||
639 | We can use cons cells to represent linked lists. A linked list is a | |
640 | sequence of cons cells where the cdr of one cell is the next cell, and | |
641 | the cdr of the last cell is \texttt{NIL}. For example the list | |
642 | \texttt{1,2,3} can be written as \texttt{(1 . (2 . (3 . NIL)))}. | |
643 | We can also write this list more concisely as \texttt{(1 2 3)}, and we | |
644 | write \texttt{()} to mean the atom \texttt{NIL}. | |
645 | ||
646 | Using these syntax elements we can encode several types of logical | |
647 | expressions: | |
648 | ||
649 | \begin{itemize} | |
650 | \item Numbers are encoded as numeric atoms, e.g. \texttt{5} | |
651 | \item Variables are encoded as non-numeric atoms, e.g. \texttt{my-var} | |
652 | \item Conditional statements are encoded as 4-element lists, e.g. | |
653 | \texttt{(if is-odd 5 6)} which has value 5 if the variable | |
654 | \texttt{is-odd} is not \texttt{NIL} and value 6 otherwise | |
655 | \item Symbols are encoded as non-numeric atoms preceded by a single | |
656 | quote, e.g. \texttt{'my-symbol}. This can also be written | |
657 | \texttt{(quote my-symbol)} | |
658 | \item Constant lists are encoded as lists preceded by a single quote, | |
659 | e.g. \texttt{'(1 2 (3 4 5))}. This can also be written | |
660 | \texttt{(quote (1 2 (3 4 5)))} | |
661 | \item Functions (lambda abstractions) are encoded as 4-element lists, | |
662 | e.g. \texttt{(lambda first (a b) a)} in which the second element is | |
663 | the function name, the third element is a list of function | |
664 | arguments, and the fourth element is the function body. | |
665 | \item Function calls are encoded as lists, with the function as the | |
666 | first element and the arguments as the other elements, e.g. | |
667 | \texttt{(atom 5)} calls the function \texttt{atom} with argument | |
668 | \texttt{5}. | |
669 | \end{itemize} | |
670 | ||
671 | \subsection{In-memory representation} | |
672 | The memory is an array of 16-bit integer values. We interpret these as | |
673 | tagged pointers: the top 3 bits indicate the ``type'' of this pointer, | |
674 | while the other 13 bits represent another memory location. A pointer | |
675 | therefore has a type and a value. | |
676 | ||
677 | We represent each Lisp expression as a tagged pointer. To evaluate it, | |
678 | the processor looks at the type of the pointer. Here is what a pointer | |
679 | of value $v$ means based on its type. We use the syntax $mem[a]$ to | |
680 | mean ``the contents of the memory location $a$'' (that is, a tagged | |
681 | pointer) and $memval[a]$ to mean ``the value at the memory location | |
682 | $a$'' (that is, the value of the tagged pointer $mem[a]$). | |
683 | ||
684 | \begin{enumerate} | |
685 | \setcounter{enumi}{-1} | |
686 | \item is a constant list, whose cdr is $mem[v]$ and car is $mem[v+1]$. | |
687 | \item is a constant symbol. | |
688 | \item is a variable reference, where $v$ indicates the position in the | |
689 | environment of the referenced variable. | |
690 | \item is a constant closure, that is the result of evaluating a lambda | |
691 | abstraction. Values of this type do not normally appear in the input | |
692 | code, but instead are created during evaluation of other forms. | |
693 | \item is a procedure (lambda abstraction), whose body is $mem[v]$. | |
694 | \item is a conditional, where $mem[v+1]$ is the condition, | |
695 | $mem[memval[v]+1]$ is the ``then'' branch, and $mem[memval[v]]$ is the | |
696 | ``else'' branch. | |
697 | \item is a procedure call, which points to a special kind of linked | |
698 | list. The CDR of each cell (other than the last one) points to the | |
699 | next cell and has type 0, while the CDR of the last cell has value 0 | |
700 | and its type indicates the primitive to be called. The CARs of these | |
701 | cells indicate the arguments for the procedure, in reverse order. | |
702 | ||
703 | To call a user-defined procedure, we use the primitive | |
704 | \texttt{funcall} and put the user-defined procedure as the CAR of | |
705 | the last cell (in other words, the user-defined procedure is the | |
706 | first argument to \texttt{funcall}). | |
707 | \item is a quoted constant, representing the object at $mem[v]$. | |
708 | \end{enumerate} | |
709 | ||
710 | The booleans \texttt{NIL} and \texttt{T} are compiled to a list | |
711 | pointing to 0, and a symbol with value 2 respectively. | |
712 | ||
713 | The memory has a peculiar fixed layout. Locations 0 and 1 are the CDR | |
714 | and CAR of NIL, locations 2 and 3 are the CDR and CAR of T, location 4 | |
715 | holds the first unused location in memory (this tells the processor | |
716 | where it can start to place intermediate expressions), location 5 | |
717 | holds the expression to be evaluated, and location 6 is where the | |
718 | processor will place the value of the expression. | |
719 | ||
720 | \subsection{The intermediate form} | |
721 | To facilitate conversion from the former format to the latter, an | |
722 | intermediate form is used. Here we represent tagged pointers as lists | |
723 | of one of three types: | |
724 | ||
725 | \begin{itemize} | |
726 | \item \texttt{(type value)}, where ``type'' is the pointer's type and | |
727 | ``value'' is the pointer's (numeric) value. | |
728 | \item \texttt{(type ptr)}, where the pointer's type is ``type'' and | |
729 | the pointer's value is $x$, where $x$ is the memory location where | |
730 | ``ptr'' (another tagged pointer) lies. | |
731 | \item \texttt{(type ptr1 ptr2)}, where the pointer's type is ``type'' | |
732 | and the pointer's value is $x$, where $x$ is the memory location | |
733 | where ``ptr1'' lies and $x+1$ is the memory location where ``ptr2'' | |
734 | lies. | |
735 | \end{itemize} | |
736 | ||
737 | These three representations map nicely to the tagged pointers | |
738 | described in the previous section. The first representation is used | |
739 | for constant symbols and variable references, the second for | |
740 | procedures, and the third for constant lists, conditionals, and | |
741 | procedure calls. | |
742 | ||
743 | A visual representation of the intermediate representation (from the | |
744 | AIM-514 paper) can be seen on the next page. The expression in | |
745 | question is | |
746 | ||
747 | \begin{lstlisting} | |
748 | ((lambda append (x y) (if (atom x) y (cons (car x) (append (cdr x) y)))) '(a b c) '(d e f)) | |
749 | \end{lstlisting} | |
750 | ||
751 | which we would represent as this sexp: | |
752 | ||
753 | \begin{lstlisting} | |
754 | (CALL (MORE (MORE (FUNCALL 0) (PROC (IF (LIST (CALL (MORE (CONS 0) (CALL (MORE (MORE (FUNCALL 0) (VAR -1)) (VAR -2)) (CALL (CDR 0) (VAR -3)))) (CALL (CAR 0) (VAR -3))) (VAR -2)) (CALL (ATOM 0) (VAR -3))))) (LIST (LIST (LIST (LIST 0) (SYMBOL 6)) (SYMBOL 7)) (SYMBOL 8))) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5))) | |
755 | \end{lstlisting} | |
756 | ||
757 | As mentioned in the Background chapter, the processor has a bug (the | |
758 | ATOM bug) that causes \texttt{(ATOM NIL)} to be true. As a result, | |
759 | \texttt{(atom x)} in this expression is always true, even when | |
760 | \texttt{x} is the empty list, and the expression loops forever (or at | |
761 | least until memory is exhausted). We can fix this by changing the | |
762 | condition from \texttt{(atom x)} to \texttt{x}, which is false when | |
763 | \texttt{x} is NIL and true otherwise, and swapping the two branches of | |
764 | the conditional. The fixed program is: | |
765 | ||
766 | \begin{lstlisting} | |
767 | ((lambda append (x y) (if x (cons (car x) (append (cdr x) y)) y)) '(a b c) '(d e f)) | |
768 | \end{lstlisting} | |
769 | ||
770 | which is compiled to the sligthly shorter sexp: | |
771 | ||
772 | \begin{lstlisting} | |
773 | (CALL (MORE (MORE (FUNCALL 0) (PROC (IF (LIST (VAR -2) (CALL (MORE (CONS 0) (CALL (MORE (MORE (FUNCALL 0) (VAR -1)) (VAR -2)) (CALL (CDR 0) (VAR -3)))) (CALL (CAR 0) (VAR -3)))) (VAR -3)))) (LIST (LIST (LIST (LIST 0) (SYMBOL 6)) (SYMBOL 7)) (SYMBOL 8))) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5))) | |
774 | \end{lstlisting} | |
775 | ||
776 | ||
777 | \begin{figure} | |
778 | \centering\includegraphics[width=18.5cm]{AIM-514-ir.pdf} | |
779 | \caption{Example of intermediate representation} | |
780 | \end{figure} | |
781 | ||
782 | \section{Software} | |
783 | \subsection{Compiler} | |
784 | The compiler takes in Lisp code and translates it to a sexp-based | |
785 | format understood by the assembler. | |
786 | ||
787 | The compiler has several functions for compiling certain kinds of | |
788 | expressions, and a function (\texttt{process\_toplevel}) that uses the | |
789 | former to compile arbitrary expressions. | |
790 | ||
791 | The output from any of this functions is a sexp in the intermediate | |
792 | form described in the previous section. | |
793 | ||
794 | Symbols are identified by sequences of letters on the source code side | |
795 | and by numbers on the processor side. The compiler therefore keeps a | |
796 | mapping between symbol names and their numbers. Whenever a new symbol | |
797 | is encountered, a new entry is added to this map. | |
798 | ||
799 | The \texttt{process\_quoted} function handles constants. For example | |
800 | if the source code contains \texttt{'(1 2 3)} then | |
801 | \texttt{process\_quoted} is called with \texttt{(1 2 3)}. This | |
802 | function handles the symbol interning described in the previous | |
803 | paragraph, and converts its input to intermediate form. | |
804 | ||
805 | The \texttt{process\_proc} function handles lambda abstractions. For | |
806 | example if the source code contains \texttt{(lambda first (a b) a)} | |
807 | then \texttt{process\_proc} is called with \texttt{first}, | |
808 | \texttt{(a b)}, and \texttt{a}. This function appends the function | |
809 | name and arguments to the environment, then processes the body (using | |
810 | \texttt{process\_toplevel}) with the new environment. | |
811 | ||
812 | The \texttt{process\_funcall} function handles function calls. For | |
813 | example if the source code contains \texttt{(reverse-list 2 1)} then | |
814 | \texttt{process\_funcall} is called with \texttt{reverse-list}, and | |
815 | \texttt{(2 1)}. The arguments are first processed (using | |
816 | \texttt{process\_toplevel}), then (if the function is not a primitive) | |
817 | the function is processed as well. | |
818 | ||
819 | Finally the \texttt{process\_toplevel} handles any expression by | |
820 | figuring out what sort of expression it is and then processing it | |
821 | appropriately. Empty lists and \texttt{NIL} become \texttt{(list 0)}, | |
822 | numbers and \texttt{T} are processed with \texttt{process\_quoted} | |
823 | ||
824 | Examples of compiler operation can be found in the compiler tests. | |
825 | Here are a few of them: | |
826 | ||
827 | \begin{itemize} | |
828 | \item The Lisp expression \texttt{'()} is compiled to \texttt{(LIST 0)} | |
829 | \item The Lisp expression \texttt{'(5 foo)} is compiled to | |
830 | \texttt{(LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4))}. Observe that | |
831 | the CDR comes before the CAR, and symbols are interned: | |
832 | \texttt{'foo} is compiled to \texttt{(SYMBOL 3)} and \texttt{'5} is | |
833 | compiled to \texttt{(SYMBOL 4)} | |
834 | \item The Lisp expression \texttt{(if t '(2 3) 'x)} is compiled to | |
835 | \texttt{(IF (LIST (SYMBOL 5) (LIST (LIST (LIST 0) (SYMBOL 3)) | |
836 | (SYMBOL 4))) (SYMBOL 2))}. Observe that \texttt{t} is always | |
837 | compiled to \texttt{(SYMBOL 2)}, as mentioned in the section about | |
838 | the memory format. | |
839 | \end{itemize} | |
840 | ||
841 | \subsection{Assembler} | |
842 | The assembler takes in an S-expression in the intermediate form, lays | |
843 | out the words in memory, and produces a memory dump that can be | |
844 | supplied to the processor. | |
845 | ||
846 | The core of the assembler is a function called \texttt{process} that | |
847 | takes an expression in intermediate form and an optional location in | |
848 | memory. This function lays out the expression's subexpressions in | |
849 | memory and then lays out the expression itself at the given location | |
850 | (or at the end of the memory if there is no location given). | |
851 | ||
852 | The assembler begins by initializing the first 7 elements of the | |
853 | memory with the values needed by the processor. After initialization, | |
854 | the \texttt{process} function is called on the input expression with | |
855 | no location. | |
856 | ||
857 | Here is how the \texttt{process} function lays out the expression | |
858 | $expr$ at location $l$ (where $l$ can be a given memory location or | |
859 | simply ``the first free location in memory''): | |
860 | ||
861 | \begin{itemize} | |
862 | \item If $expr$ is \texttt{(type value)}, then it is directly laid out | |
863 | at $l$ | |
864 | \item If $expr$ is \texttt{(type ptr)}, then \texttt{process} is | |
865 | called recursively to lay \texttt{ptr} out at the first free | |
866 | location in memory (say this is location $loc$). Then \texttt{(type | |
867 | $loc$)} is laid out at $l$. | |
868 | \item If $expr$ is \texttt{(type ptr1 ptr2)}, then two consecutive | |
869 | free locations in memory $loc$ and $loc+1$ are reserved, | |
870 | \texttt{process} is recursively called twice to lay \texttt{ptr1} at | |
871 | $loc$ and \texttt{ptr2} at $loc+1$, and finally | |
872 | \texttt{(type $loc$)} is laid out at $l$. | |
873 | \end{itemize} | |
874 | ||
875 | Once again, examples of assembler operations can be found in the | |
876 | assembler tests. Two of them: | |
877 | ||
878 | \begin{itemize} | |
879 | \item \texttt{(quoted (symbol 5))} is assembled to \texttt{00 08 00 00 | |
880 | 00 00 20 00 20 00 00 08 E0 07 00 00 20 05}. Observe that each word | |
881 | is made of two bytes, the first word (\texttt{00 08}) indicating the | |
882 | length (in words) of the program. The first 4 words are always | |
883 | fixed: they represent in order the CDR of NIL, the CAR of NIL, the | |
884 | CDR of T, and the CAR of T. The 7th word is also a constant, 0, | |
885 | because that is where the processor puts the result. The 5th word is | |
886 | the first free word in memory (which is equal to the length in words | |
887 | of the program), and the 6th word (\texttt{E0 07}) is the expression | |
888 | to be evaluated. In this case, it has type quote and points to | |
889 | memory location 7 (the 8th word, because pointers are 0-indexed). At | |
890 | memory location 7 we have the value \texttt{20 05} which has type | |
891 | symbol and value 5. | |
892 | \item \texttt{(call (more (funcall 0) (proc (var -2))) (number 5))} is | |
893 | assembled to \texttt{00 0C 00 00 00 00 20 00 20 00 00 0C C0 07 00 00 | |
894 | 00 09 20 05 E0 00 80 0B 5F FE}. Here we want to evaulate the | |
895 | expression \texttt{C0 07} (type ``function call'', address 7). This | |
896 | points to a cons-cell, whose car (at position 8) is \texttt{20 05} | |
897 | (that is, \texttt{(number 5)}) and whose cdr (at position 7) is | |
898 | \texttt{00 09} (type ``more'' and address 9). The ``more'' points to | |
899 | another cons-cell, whose cdr (at position 9) is \texttt{E0 00} (that | |
900 | is, \texttt{(funcall 0)}), and whose car is \texttt{80 0B}: a | |
901 | procedure whose body is at position 11, namely \texttt{5F FE} (that | |
902 | is, \texttt{(var -2)}). | |
903 | ||
904 | So the expression is a call to the primitive \texttt{funcall} with | |
905 | arguments \texttt{(lambda l (x) x)} and \texttt{5}. The original | |
906 | program was probably \texttt{((lambda l (x) x) 5)}. Note that this | |
907 | program was compiled with an old version of the compiler that did | |
908 | not intern symbols, which explains why \texttt{5} was compiled to | |
909 | \texttt{(number 5)} and not \texttt{(symbol 3)}. | |
910 | \end{itemize} | |
911 | ||
912 | \subsection{REPL} | |
913 | The REPL ties everything together: it reads a program from the user, | |
914 | compiles it, assembles it, sends the assembled program to the | |
915 | processor, reads the memory dump from the processor, and finally | |
916 | prints the result to the screen. | |
917 | ||
918 | The REPL includes an S-expression pretty-printer that operates on | |
919 | memory dumps. As symbols are interned by the compiler, when the | |
920 | pretty-printer hits a symbol it only knows its index, and not the | |
921 | original name. To print the original name the pretty-printer searches | |
922 | the symbol map built by the compiler and pulls the name from there. | |
923 | ||
924 | The pretty-printer takes a chunk of memory and index, and prints the | |
925 | expression at that index. For example, if the memory chunk is | |
926 | \texttt{00 00 00 00 20 00 20 00 00 09 00 07 00 07 20 02 00 00 60 00} | |
927 | and the index is 6, then the pretty printer will try to print | |
928 | \texttt{00 07}, which is a list with car \texttt{00 00} and cdr | |
929 | \texttt{20 02}, and the pretty printer prints \texttt{(nil . t)}. | |
930 | ||
931 | \clearpage | |
932 | An example REPL session is: | |
933 | ||
934 | \begin{lstlisting} | |
935 | YULE REPL | |
936 | ||
937 | * (cons 1 2) | |
938 | (1 . 2) | |
939 | * (car '(4 5 6)) | |
940 | 4 | |
941 | * (cdr '(t t t)) | |
942 | (T T) | |
943 | * ((lambda id (x) x) 5) | |
944 | 5 | |
945 | * (lambda first (a b) a) | |
946 | <CLOSURE> | |
947 | * ((lambda first (a b) a) 7 8) | |
948 | 7 | |
949 | * (atom 6) | |
950 | T | |
951 | \end{lstlisting} | |
952 | ||
953 | \chapter{Evaluation and Testing} | |
954 | To make development easier, ensure everything works correctly, and | |
955 | prevent regressions there are unit tests for the compiler and | |
956 | assembler, and functional tests for the entire project. | |
957 | ||
958 | Besides these automated tests, manual evaluation (in the beginning, | |
959 | precompiling simple expressions and supplying them to the processor as | |
960 | the initial value of the memory; later on, manually typing expressions | |
961 | in the REPL and looking at the output). When the processor was not | |
962 | operating as expected, the GTKWave wave simulator was used to assist | |
963 | with debugging. This piece of software takes in a recorded simulation | |
964 | of the circuits and displays the values of all signals inside the | |
965 | circuit against time. This allows the designer to look at every clock | |
966 | cycle in turn and discover where the error occurs. | |
967 | ||
968 | The process of debugging the processor had the side effect of finding | |
969 | inefficiencies in the original design. While \cite{lambda} mentions | |
970 | some limitations of the processors, one important limitation that was | |
971 | only discovered during debugging is that for every operation that the | |
972 | EVAL performs, the result of that operation will be written to new | |
973 | locations in memory. This includes ``read-only'' operations such as | |
974 | CAR and CDR. If the programmer requests the CAR of a list, a copy of | |
975 | the first element will be taken and a pointer to it will be returned. | |
976 | Since YULE only has 4096 words of memory, this severely restricts the | |
977 | complexity of programs that can be evaluated on it. | |
978 | ||
979 | Another avenue to ensure the processor works as it should is formal | |
980 | verification using model checking tools. Formal verification is | |
981 | commonly employed as part of circuit design evaluation because | |
982 | mistakes are more difficult to identify, and if found after the chip | |
983 | has been built they are very costly to fix (often the chips that were | |
984 | built with the bug need to be thrown away). However, building models | |
985 | is a very involved process, and devoting effort to formal verification | |
986 | of this project would not have allowed the implementation of several | |
987 | features. Due to time constraints and knowing that no physical YULE | |
988 | chips are built (the project is so far only synthesised to a FPGA), | |
989 | testing and manual evaluation was considered sufficient for YULE. | |
990 | ||
991 | \section{Unit testing} | |
992 | The software was tested with 48 unit tests written in Perl using the | |
993 | \texttt{Test::More} framework. | |
994 | ||
995 | The assembler takes in compiled Lisp code and outputs a memory dump in | |
996 | one of two formats (binary or Verilog code). Its tests take a | |
997 | configuration for the assembler, an input, two expected outputs, and | |
998 | (for documentation purposes) the source Lisp code. An example test: | |
999 | ||
1000 | \begin{lstlisting} | |
1001 | $expbin = | |
1002 | '00 0C 00 00 00 00 20 00 20 00 00 0C C0 07' | |
1003 | . ' 00 00 00 09 20 05 E0 00 80 0B 5F FE'; | |
1004 | run_test {addr_bits => 13}, | |
1005 | '(call (more (funcall 0) (proc (var -2))) (number 5))', | |
1006 | <<'EOF', $expbin, '((LAMBDA ID (X) X) 5)'; | |
1007 | mem[ 0] <= 0; // (cdr part of NIL) | |
1008 | mem[ 1] <= 0; // (car part of NIL) | |
1009 | mem[ 2] <= 16'b0010000000000000; // (cdr part of T) | |
1010 | mem[ 3] <= 16'b0010000000000000; // (car part of T) | |
1011 | mem[ 4] <= 16'd12; // (free storage pointer) | |
1012 | mem[ 5] <= 16'b1100000000000111; // CALL 7 | |
1013 | mem[ 6] <= 0; // (result of computation) | |
1014 | mem[ 7] <= 16'b0000000000001001; // MORE 9 | |
1015 | mem[ 8] <= 16'b0010000000000101; // NUMBER 5 | |
1016 | mem[ 9] <= 16'b1110000000000000; // FUNCALL 0 | |
1017 | mem[10] <= 16'b1000000000001011; // PROC 11 | |
1018 | mem[11] <= 16'b0101111111111110; // VAR -2 | |
1019 | EOF | |
1020 | \end{lstlisting} | |
1021 | ||
1022 | Here the compiled code \texttt{(call (more (funcall 0) (proc (var -2))) (number 5))} | |
1023 | is given to the assembler, and its outputs are compared to the binary | |
1024 | string on the first line and the block of Verilog code. | |
1025 | ||
1026 | The tests do not all contain valid assembler inputs. There are a | |
1027 | series of tests that give invalid inputs to the assembler and expect | |
1028 | the assembler to raise an exception matching a pattern. Example tests: | |
1029 | ||
1030 | \begin{lstlisting} | |
1031 | expect_error_like { run_test {}, '((type is a list) 5)'} | |
1032 | 'Type of toplevel is not atom'; | |
1033 | expect_error_like { run_test {}, '(badtype 5)'} | |
1034 | 'No such type'; | |
1035 | expect_error_like { run_test {}, '(5 700000)'} | |
1036 | 'Addr too large'; | |
1037 | \end{lstlisting} | |
1038 | ||
1039 | On the compiler side, there are 4 tests for the quoting function, | |
1040 | but the brunt of the tests are pairs of a compiler input (i.e. Lisp | |
1041 | code) and a compiler output (i.e. a sexp that the assembler | |
1042 | understands). For each pair the compiler is run on the first element | |
1043 | and its output is compared with the second. Example such tests: | |
1044 | ||
1045 | \begin{lstlisting} | |
1046 | is_toplevel '(quote 5)', '(SYMBOL 3)'; | |
1047 | is_toplevel '(reverse-list \'a \'a \'b)', | |
1048 | '(CALL (MORE (MORE (REVERSE-LIST 0) (SYMBOL 4)) (SYMBOL 3)) (SYMBOL 3))'; | |
1049 | is_toplevel '(if t \'(2 3) \'x)', | |
1050 | '(IF (LIST (SYMBOL 5) (LIST (LIST (LIST 0)' | |
1051 | . ' (SYMBOL 3)) (SYMBOL 4))) (SYMBOL 2))'; | |
1052 | \end{lstlisting} | |
1053 | ||
1054 | As the compiler code includes a function to pretty-print | |
1055 | S-expressions, there are a few round-trip tests on this function. | |
1056 | Given a string, it is parsed using the \texttt{Data::SExpression} | |
1057 | parsing library then it is pretty-printed. The result should match the | |
1058 | original string. | |
1059 | ||
1060 | There are also tests for handling of invalid input, like in the assembler. These include: | |
1061 | ||
1062 | \begin{lstlisting} | |
1063 | expect_error_like { is_toplevel 'x' } 'Variable x not in environment'; | |
1064 | expect_error_like { is_toplevel '(car)' } 'Cannot call primitive car with no arguments'; | |
1065 | \end{lstlisting} | |
1066 | ||
1067 | A test coverage tool, \texttt{Devel::Cover}, was ran on the compiler | |
1068 | and assembler to find statements and branches that were not covered by | |
1069 | existing unit tests, and new tests were written to cover them. Right | |
1070 | now \texttt{Devel::Cover} reports 100\% test coverage of the compiler | |
1071 | and assembler. | |
1072 | ||
1073 | \begin{figure} | |
1074 | \centering\includegraphics[height=8cm]{devel-cover.png} | |
1075 | \caption{Screenshot of \texttt{Devel::Cover} results, showing 100\% | |
1076 | test coverage} | |
1077 | \end{figure} | |
1078 | ||
1079 | \section{Functional testing} | |
1080 | Besides unit tests, the entire system (hardware and software) has | |
1081 | undergone functional testing. This means that a set of 21 Lisp | |
1082 | expressions has been prepared, alongside their correct values computed | |
1083 | using a software Lisp implementation. These values are given to the | |
1084 | YULE REPL and the test succeeds if the output equals the known correct | |
1085 | value of the expression. | |
1086 | ||
1087 | Here are some example tests: | |
1088 | \begin{lstlisting} | |
1089 | '(1 2 3) | |
1090 | 5 | |
1091 | ((lambda id (x) x) 5) | |
1092 | ((lambda id (x) x) '(1 2 3)) | |
1093 | (car '(2 3)) | |
1094 | (car (cdr '(4 5 6))) | |
1095 | ((lambda snd (x y) y) 'first 'second) | |
1096 | ||
1097 | (cons '(1 2) '(7)) | |
1098 | (car (cons '(1 2) '(7))) | |
1099 | (cdr (cons '(1 2) '(7))) | |
1100 | ||
1101 | ((lambda rev (xs) ((lambda reverse-acc (xs acc) (if xs (reverse-acc (cdr xs) (cons (car xs) acc)) acc)) xs '())) '(4 5 6 7)) | |
1102 | \end{lstlisting} | |
1103 | ||
1104 | This test suite tries to cover most kinds of valid expressions. It | |
1105 | includes constants of all kinds, conditionals, lambda abstractions, | |
1106 | calls to all primitives, calls to user-defined functions, recursive | |
1107 | calls. | |
1108 | ||
1109 | \section{Timing and performance} | |
1110 | There are three parts to the performance of YULE: | |
1111 | ||
1112 | \begin{enumerate} | |
1113 | \item The efficiency of the software. This refers to the time taken to | |
1114 | compile and assemble input programs, which is negligible. | |
1115 | ||
1116 | \item The I/O performance, which refers to the time needed to transfer | |
1117 | the program from the computer to the processor's memory, plus the | |
1118 | time needed to transfer the memory dump from the memory back to the | |
1119 | computer. | |
1120 | ||
1121 | \item The speed of the processor itself, that is the time/number of | |
1122 | cycles that the processor needs to evaluate a given expression. | |
1123 | \end{enumerate} | |
1124 | ||
1125 | The GC runs at 48MHz, while the EVAL only runs when the GC allows it | |
1126 | to which is at most half of the time. | |
1127 | ||
1128 | The UART runs at 4000000 baud, so a bit is sent every 12 cycles of the | |
1129 | 48MHz clock. A byte is made of 11 bits: a start bit, 8 data bits, and | |
1130 | one stop bit. A word is two bytes, so 22 bits. Therefore 264 cycles | |
1131 | are needed to transmit one word. | |
1132 | ||
1133 | Since the WRITER just sends a full memory dump, the I/O performance is | |
1134 | proportional not to \texttt{program\_len + result\_len} but to | |
1135 | \texttt{program\_len + result\_len + intermediate\_results\_len}. As | |
1136 | evaluation of any expression involves the creation of at least one | |
1137 | value (the result), for every expression or subexpression evaluated at | |
1138 | least 264 cycles will be needed to output that result. The EVAL takes | |
1139 | significantly less than 264 cycles to evaluate a expression (not | |
1140 | counting its subexpressions), in one test each expression took between | |
1141 | 30 and 90 cycles to evaluate. The processor performance is therefore | |
1142 | dominated by I/O. | |
1143 | ||
1144 | As an example, the program \texttt{((lambda id (x) x) 5)} is 12 bytes | |
1145 | long, and takes 258 cycles to evaluate. The memory dump is 54 bytes | |
1146 | long. Therefore of the roughly 17700 cycles needed from beginning to | |
1147 | end, about 98.5\% are used to do I/O. | |
1148 | ||
1149 | \section{A worked example} | |
1150 | \label{sec:example} | |
1151 | Here is an example of what happens when a expression is typed into the | |
1152 | REPL: | |
1153 | ||
1154 | \begin{enumerate} | |
1155 | \item A user types into the REPL the string | |
1156 | ||
1157 | \begin{lstlisting} | |
1158 | (car '(2 3 4)) | |
1159 | \end{lstlisting} | |
1160 | ||
1161 | This is a call to the CAR primitive, with the list \texttt{(2 3 4)} | |
1162 | as an argument. We expect the result to be \texttt{2}, the first | |
1163 | element of the list. | |
1164 | ||
1165 | \item This string is given to the compiler, producing the sexp | |
1166 | ||
1167 | \begin{lstlisting} | |
1168 | (CALL (CAR 0) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5))) | |
1169 | \end{lstlisting} | |
1170 | ||
1171 | Observe that in this representation the CDR comes before the CAR. So | |
1172 | for example ``(LIST (LIST 0) (SYMBOL 3))'' represents a list whose | |
1173 | CAR is ``(SYMBOL 3)'' and CDR is ``(LIST 0)'' (that is, nil). This | |
1174 | is a list with one element, ``(SYMBOL 3)''. The sexp shown here | |
1175 | represents a call to the CAR primitive with one argument. The list | |
1176 | has three elements, which are in order \texttt{(SYMBOL 5), (SYMBOL | |
1177 | 4), (SYMBOL 3)}. | |
1178 | ||
1179 | In this stage the compiler maps each number to a symbol, in this | |
1180 | case \texttt{(SYMBOL 5)} is 2, \texttt{(SYMBOL 4)} is 3, and | |
1181 | \texttt{(SYMBOL 3)} is 2. This mapping is remembered so it can be | |
1182 | used later by the pretty printer. | |
1183 | ||
1184 | \item This sexp is given to the assembler, producing the 16 words long | |
1185 | binary string (here displayed in hex, with a space after each 16-bit | |
1186 | word) | |
1187 | ||
1188 | \begin{lstlisting} | |
1189 | 000F 0000 0000 2000 2000 000F C007 0000 2000 0009 000B 2005 000D 2004 0000 2003 | |
1190 | \end{lstlisting} | |
1191 | ||
1192 | The first word, \texttt{000F}, indicates the length of the program | |
1193 | in | |
1194 | words (15 words, this does not include this first word).\\ | |
1195 | The next four words, \texttt{0000 0000 2000 2000}, represent the CDR | |
1196 | and CAR of NIL and T respectively. These words are always | |
1197 | constant.\\ | |
1198 | The next word, \texttt{000F}, represents the first unused position | |
1199 | in memory. This is always equal to the first word transmitted.\\ | |
1200 | The next word, \texttt{C007}, represents the expression to be | |
1201 | evaluated.\\ | |
1202 | The next word, \texttt{0000}, is ignored as this is where the | |
1203 | processor will place the result of the evaluation.\\ | |
1204 | The rest of the words are subexpressions or parts of subexpressions | |
1205 | of the expression being evaluated.\\ | |
1206 | ||
1207 | The expression that is evaluated is \texttt{C007}, which is a CALL | |
1208 | whose CDR is at position 7 (\texttt{2000}, meaning \texttt{(CAR 0)}) | |
1209 | and whose CAR is at | |
1210 | position 8 (\texttt{0009}).\\ | |
1211 | \texttt{0009} is a list with CAR at 10 (\texttt{2005}, meaning | |
1212 | \texttt{(SYMBOL 5)}) and CDR at 9 (\texttt{000B}).\\ | |
1213 | \texttt{000B} is a list with CAR at 12 (\texttt{2004}, meaning | |
1214 | \texttt{(SYMBOL 4)}) and CDR at 11 (\texttt{000D}).\\ | |
1215 | \texttt{000D} is a list with CAR at 14 (\texttt{2003}, meaning | |
1216 | \texttt{(SYMBOL 3)}) and CDR at 13 (\texttt{0000}, meaning NIL). | |
1217 | ||
1218 | \item This binary string is sent via UART to the FPGA's READER. The | |
1219 | READER writes the string (except the first word) into memory, | |
1220 | overwriting the previous contents of the memory. It then reports to | |
1221 | the controller that it is done, and the controller advances to | |
1222 | \texttt{STATE\_RUN}. | |
1223 | ||
1224 | \item The EVAL and GC evaluate the program in memory, and then place | |
1225 | the result back into memory. The EVAL reports to the controller that | |
1226 | it is done, and the controller advances to \texttt{STATE\_WRITE}. | |
1227 | ||
1228 | \item The WRITER sends this memory dump via UART back to the REPL | |
1229 | ||
1230 | \begin{lstlisting} | |
1231 | 000F C007 2005 2000 0009 000B 2005 000D 2004 0000 2003 0000 4004 2000 0010 0000 0012 0000 0000 0009 | |
1232 | \end{lstlisting} | |
1233 | ||
1234 | Then the WRITER reports to the controller that it is done, and the | |
1235 | controller advances to \texttt{STATE\_READ}. | |
1236 | ||
1237 | Observe that this memory dump is similar to the assembler output. | |
1238 | The first two words are the 6th and 7th words in the assembler | |
1239 | output, the next word is the result of the evaluation, then the next | |
1240 | 8 words are words 9th through 16th in the assembler output. The | |
1241 | following words are intermediate results computed by the processor. | |
1242 | ||
1243 | \item The REPL's pretty printer interpets this string starting with | |
1244 | \texttt{2005}, which means \texttt{(SYMBOL 5)}. The pretty printer | |
1245 | looks in the compiler's mapping and finds that the 5th symbol was | |
1246 | 3, and so the REPL prints | |
1247 | ||
1248 | \begin{lstlisting} | |
1249 | 3 | |
1250 | * | |
1251 | \end{lstlisting} | |
1252 | ||
1253 | The evaluation is complete, and a new prompt is printed so that the | |
1254 | user can input another expression. | |
1255 | \end{enumerate} | |
1256 | ||
1257 | \chapter{Conclusions} | |
1258 | A Lisp machine processor has been built using the original design and | |
1259 | techniques presented in \cite{lambda}. It has been extended with input | |
1260 | and output capabilities, and using a modern FPGA it has been | |
1261 | synthesised on a physical device slightly larger than a USB flash | |
1262 | drive. This device can be used as a standalone Lisp evaluator, or the | |
1263 | processor inside it can be included in a more complicated circuit. | |
1264 | ||
1265 | Software has been written to help interact with the processor. This | |
1266 | includes a compiler, an assembler, and a REPL that together make it | |
1267 | possible for an end-user unfamiliar with hardware design or the | |
1268 | processor's internals to submit Lisp expressions to the processor and | |
1269 | get back their values. The software also allows programmers to convert | |
1270 | Lisp source code to the binary format understood by the processor, so | |
1271 | that they may later send the binaries to the processor for evaluation. | |
1272 | ||
1273 | The processor and software have been tested to ensure they work | |
1274 | correctly. This has uncovered bugs and inefficiencies in the original | |
1275 | paper, some of which could not be gleaned from simply reading the | |
1276 | paper. | |
1277 | ||
1278 | The source files for the hardware and software, and also the automated | |
1279 | tests have been released under an open-source license to make it easy | |
1280 | for future circuit designers and programmers to use and extend this | |
1281 | project. They can be found at \url{https://git.ieval.ro/?p=yule.git}. | |
1282 | ||
1283 | The original paper \cite{lambda} is one of the influential Lambda | |
1284 | Papers that defined the Scheme programming language and is an | |
1285 | important historical artifact for understading how Lisp machine | |
1286 | processors can be implemented and more generally how tagged | |
1287 | architectures work. But all details of the processor cannot be | |
1288 | understood from simply reading the paper, and most readers would not | |
1289 | follow by attempting to implement what is described there. This | |
1290 | projects brings a tangible implementation into the open, which shows | |
1291 | the limitations of the design to people interested in this design and | |
1292 | makes it possible to solve this limitations and extend the processor | |
1293 | beyond its original scope. | |
1294 | ||
1295 | \section{Reflections} | |
1296 | Now that the project is finished, it is worth revisiting ideas I had | |
1297 | in the beginning that were not ultimately useful. | |
1298 | ||
1299 | The original concept had the processor run on a typical development | |
1300 | board, with the software running on a Raspberry PI. The board and the | |
1301 | Raspberry PI would be connected through jumper wires, and the UART | |
1302 | circuitry on the Raspberry PI would be used by the software to talk to | |
1303 | the processor. | |
1304 | ||
1305 | This idea would have produced a much harder to use YULE. Not only | |
1306 | would the user have to ssh into a Raspberry PI to do anything useful, | |
1307 | but also the equipment would take more space (both boards are larger | |
1308 | than the iCEstick). Once I tried the iCEstick and found that | |
1309 | everything can be done just as well with it, it replaced the two | |
1310 | boards and made the project easier to use (albeit slightly harder to | |
1311 | debug, as the iCEstick only has 5 LEDs). | |
1312 | ||
1313 | Some code had to be rewritten, sometimes multiple times. The compiler | |
1314 | was originally written in Common Lisp, as it is the best suited | |
1315 | language for manipulation of S-expressions, especially Lisp code. This | |
1316 | meant a working project was obtained more quickly, but then the | |
1317 | compiler had to be rewritten in another language; if the user has a | |
1318 | software Lisp implementation already, why would they need YULE? I | |
1319 | chose Perl as the reimplementation language, as the assembler was | |
1320 | already written in Perl and it is widely-available, most systems | |
1321 | coming with a Perl interpreter out of the box. | |
1322 | ||
1323 | Throughout development of the hardware part, timing problems were a | |
1324 | mainstay. The first ones related to the interactions between the | |
1325 | processor and the memory (the memory was clocked, but the processor | |
1326 | expected a combinational RAM). What fixed this problem was to run the | |
1327 | memory at twice the speed of the rest of the processor. At some point, | |
1328 | a decision was taken to run the GC and EVAL on different clocks (both | |
1329 | ran at one quarter the speed of the master clock, but one of the | |
1330 | clocks was delayed by half a cycle. The positive edge on one clock | |
1331 | would happen midway between a positive edge and a negative edge on the | |
1332 | other one). This proved to be unwise, as sometimes values sent by one | |
1333 | module to the other would be cleared before the destination module | |
1334 | managed to read them. | |
1335 | ||
1336 | GTKWave proved to be very useful in debugging these timing issues, as | |
1337 | the exact point where the issue began was easy to identify and I could | |
1338 | see why the issue was happening. A workaround for this was to add a | |
1339 | latch on each side of the common E bus that remembered the value sent | |
1340 | long enough for the receiving module to see it. | |
1341 | ||
1342 | All timing problems were fixed more cleanly in the end when an attempt | |
1343 | to simplify the clocks was made. First GC and EVAL were changed to use | |
1344 | the same clock, which removed the need for the latches, and then the | |
1345 | memory was changed to use the negative edge of the clock instead of | |
1346 | the positive edge, which meant no more clock division code had to be | |
1347 | used. | |
1348 | ||
1349 | \section{Future work} | |
1350 | There are several ways to extend this project. | |
1351 | ||
1352 | One idea is to fix the noted limitations: the ATOM primitive considers | |
1353 | NIL to not be an atom, and there is no EQ primitive that compares | |
1354 | symbols/other objects. The latter problem means that all symbols are | |
1355 | indistinguishable (note: NIL can still be distinguished from T and | |
1356 | other symbols, but NIL is not a symbol). To introduce equality a new | |
1357 | signal needs to be added to the EVAL that compares (for example) the | |
1358 | value on the E bus with the value of a fixed register, possibly a | |
1359 | newly added one. Then (EQ A B) would store A in the register, load B | |
1360 | on the E bus and compare them. | |
1361 | ||
1362 | Once equality is implemented, it becomes possible to implement the | |
1363 | EVAL function that evaluates an arbitrary Lisp expression. | |
1364 | ||
1365 | Two more involved instruction set improvements are adding arithmetic | |
1366 | operations, and input/output. Without arithmetic primitives one has to | |
1367 | rely on Church numerals, which encodes numbers as lambda abstractions | |
1368 | and is very inefficient. Input/output primitives would allow programs | |
1369 | to receive data at runtime over the UART and send data back before | |
1370 | evaluation is finished. This enables long-running useful programs, for | |
1371 | example with input/output and equality a REPL can be written (which | |
1372 | relies on the EVAl function mentioned above). | |
1373 | ||
1374 | On the software side, the assembler can support optimizations and then | |
1375 | the REPL can be extended with a core image. The main optimization | |
1376 | needed in the assembler is common subexpression elimination. The | |
1377 | processor never overwrites a value in memory with another value (other | |
1378 | than overwriting \texttt{mem[6]} with the result of the evaluation), | |
1379 | so it is safe to share any identical subexpressions. For example, if | |
1380 | the user inputs the program \texttt{(cons (car '(1 2 3 4)) (cdr '(0 2 | |
1381 | 3 4)))}, then when the assembler should share the tails of the two | |
1382 | lists, meaning the second list should compile to \texttt{(LIST (LIST | |
1383 | X) (SYMBOL Y))} where X is a pointer to where the second element of | |
1384 | the first list is stored. This makes long programs significantly | |
1385 | shorter. Note that no compiler changes are required to implement this, | |
1386 | as the assembler can just remember all expressions and subexpressions | |
1387 | it has assembler so far including their locations, and whenever a new | |
1388 | expression needs to be assembler it can be replaced with a pointer to | |
1389 | where the same expression has been placed before. | |
1390 | ||
1391 | Once this optimization is implemented, the REPL can gain core image | |
1392 | support. Suppose the user enters the expression \texttt{expr}. Instead | |
1393 | of sending \texttt{assemble(compile(expr))} to the processor, the REPL | |
1394 | can remember the last expression sent (or the last memory dump | |
1395 | received), then append \texttt{assemble(compile(expr))} to that | |
1396 | previous expression. This way \texttt{expr} can refer to previously | |
1397 | defined expressions, as long as the REPL or compiler has a way to name | |
1398 | previously supplied instructions, for example via a \texttt{define} | |
1399 | special form, via the standard REPL variables (\texttt{+, ++, +++}), | |
1400 | or by sequentially numbering all expressions submitted by the user. | |
1401 | ||
1402 | Returning to hardware, a potential future project is to add actual | |
1403 | hardware garbage collection to the GC component. At the moment the | |
1404 | design does no garbage collection, the processor only has 4096 words | |
1405 | of RAM available, and evaluation is quite memory-inefficient. Adding | |
1406 | garbage collection would make more complicated programs possible | |
1407 | despite the limited RAM. One way to implement this is to add a small | |
1408 | microprocessor to the design which is preprogrammed with a garbage | |
1409 | collection algorithm (simple choices are a stop-and-copy collector or | |
1410 | Cheney's algorithm). When the free space pointer gets too high, the | |
1411 | EVAL and GC are stopped, and the new microprocessor is started up so | |
1412 | it can perform garbage collection, compact the memory, and update the | |
1413 | free space pointer. The EVAL/GC registers would have to be updated as | |
1414 | well. Then the microprocessor can be stopped, EVAL and GC be started | |
1415 | up again and evaluation can continue. Note that a non-moving collector | |
1416 | is not sufficient: the GC allocates new words only at the very end of | |
1417 | the memory, so it cannot benefit from free ``gaps'' in the middle of | |
1418 | the memory. | |
1419 | ||
1420 | \begin{thebibliography}{99} | |
1421 | \bibitem{lambda} Design of LISP-Based Processors (1979), available at | |
1422 | \url{http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-514.pdf} | |
1423 | \bibitem{osdvu} \url{https://opencores.org/project/osdvu} | |
1424 | \bibitem{gtkwave} \url{http://gtkwave.sourceforge.net/} | |
1425 | \bibitem{icestorm} \url{http://www.clifford.at/icestorm/} | |
1426 | \end{thebibliography} | |
1427 | \end{document} |