]>
Commit | Line | Data |
---|---|---|
e7b86bf0 MG |
1 | \documentclass[a4paper,12pt]{report} |
2 | \usepackage{color} | |
3 | \usepackage[cm]{fullpage} | |
4 | \usepackage{graphicx} | |
5 | \usepackage{hyperref} | |
6 | \usepackage{listings} | |
7 | \usepackage{pdfpages} | |
8 | ||
9 | \title{YULE: Yet Unnamed Lisp Evaluator} | |
10 | \author{Marius Gavrilescu} | |
11 | \date{May 2018} | |
12 | ||
13 | \begin{document} | |
14 | % title page does not need anchors | |
15 | \hypersetup{pageanchor=false} | |
16 | \maketitle | |
17 | \hypersetup{pageanchor=true} | |
18 | \lstset{language=Perl, frame=leftline, framexleftmargin=5pt, | |
19 | showstringspaces=false, showspaces=false, breaklines=true, | |
20 | postbreak=\mbox{\textcolor{red}{$\hookrightarrow$}\space}} | |
21 | ||
22 | \chapter*{Abstract} | |
23 | Lisp machines are computers designed to run Lisp programs efficiently. | |
24 | They were first developed in the 1970s, and they have special hardware | |
25 | support for Lisp. This usually comprises a tagged architecture (each | |
26 | pointer has a ``type'' field), garbage collection implemented in | |
27 | hardware, and an instruction set that lends itself well to evaluating | |
28 | Lisp expressions. | |
29 | ||
30 | Between 1975 and 1980, Guy L. Steele and Gerald Jay Sussman wrote a | |
31 | series of memos on lambda calculus, the Lisp language and its | |
32 | implementation known collectively as the ``Lambda Papers''. One of | |
33 | these papers, ``Design of LISP-Based Processors (1979)'' describes how | |
34 | to implement a Lisp machine processor. If this processor is connected | |
35 | to a RAM that contains a program, it executes that program and places | |
36 | the result back into the RAM. | |
37 | ||
38 | The aim of YULE is to take the design from the 1979, implement it | |
39 | today using modern technology and techniques, and bundle it up in a | |
40 | form that is both easy to use and extend. By going through this | |
41 | modernisation process, future developers can easily integrate the Lisp | |
42 | processor into their own design, modifying it as they please. | |
43 | ||
44 | The end result of YULE is a small device that can be plugged into a | |
45 | USB port, and an associated computer program. Running the program | |
46 | presents a simplified Lisp listener. The user can type a Lisp | |
47 | expression into the listener, and the listener will respond with the | |
48 | value of the expression. Under the hood, the listener evaluates Lisp | |
49 | by sending the expression over USB to the small device and getting | |
50 | back the result. | |
51 | ||
52 | YULE comes with the design files for the hardware, and source code for | |
53 | the software. All of these are released under a free software license | |
54 | and published online, which makes it very easy (from both a practical | |
55 | and a legal standpoint) for future developers to use in their own | |
56 | projects and modify to fit their purposes better. | |
57 | \tableofcontents | |
58 | ||
59 | \chapter{Introduction} | |
60 | \section{Motivation} | |
61 | Lisp machine processors are quite unlike modern processors. Modern | |
62 | instruction sets are designed for imperative programming. Each | |
63 | instruction makes the processor execute some operation, such as | |
64 | loading a register or storing a value in memory. In contrast, a Lisp | |
65 | machine processor reads an expression from memory, determines its | |
66 | type, and recursively evaluates it. This operation lends itself to | |
2dbe6ed8 | 67 | declarative programming, where the programmer declares an arbitrarily |
e7b86bf0 MG |
68 | complicated expression and asks the computer to evaluate it. |
69 | ||
70 | Another common feature of Lisp machine processors is the use of tagged | |
71 | pointers. Each memory location is a tagged union of two elements: the | |
72 | value itself, and a tag representing the type of that data. Tagged | |
73 | architectures are not common today, although some programs still | |
74 | use tagged pointers by exploiting pointer alignment (if every pointer | |
75 | is a multiple of 8, then the bottom 3 bits can be used to store a | |
76 | tag). These tags can be used to store an object's reference count (as | |
77 | in Objective-C on certain versions of iOS), or information about a | |
78 | data structure (as in certain implementations of kd-trees). | |
79 | ||
80 | The ``Design of LISP-Based Processors (1979)'' paper by Guy L. Steele | |
81 | and Gerald Jay Sussman \cite{lambda} presents the design of a | |
82 | relatively simple Lisp machine processor, which is made from two | |
83 | read-only memories, two buses, and several registers. If this | |
84 | processor is connected to a RAM containing a Lisp expression in a | |
85 | certain binary format, the processor will evaluate the expression and | |
86 | place the result back in memory. | |
87 | ||
88 | This processor features both a tagged architecture (the top 3 bits of | |
89 | each word in memory represent the value's type) and the declarative | |
90 | nature mentioned above. It is therefore interesting to study and to | |
91 | bring this processor into today's world by using technology such as | |
92 | hardware description languages (that allow us to describe a circuit | |
93 | formally) and FPGAs (that allow us to synthesise any circuit quickly | |
94 | and at low cost). By building a HDL implementation of the processor | |
95 | not only can it be used easily on its own (by synthesising it to a | |
96 | FPGA and connecting it to RAM), but a future circuit designer can take | |
97 | this processor and include it in their more complicated design. | |
98 | ||
99 | The processor presented in the original paper is a somewhat barebones | |
100 | implementation of Lisp. Arithmetic primitives are not available, nor | |
101 | are there any input or output routines. Due to time constraints the | |
102 | paper's writers have not included a equality primitive. Additionally | |
103 | the ATOM primitive has a bug that makes it consider NIL not atomic. | |
104 | All these deficiencies can be rectified given time and effort, but it | |
105 | is difficult for one to fix them given only a paper describing the | |
106 | processor. Having a HDL implementation of the processor with a proper | |
107 | test suite allows a future circuit designer to more easily make | |
108 | changes to the processor while using the test suite to prevent | |
109 | regressions and using a FPGA to quickly obtain a hardware version of | |
110 | the changed processor that can be manually tested and used in any | |
111 | desired application. | |
112 | ||
113 | \section{Challenges} | |
114 | While \cite{lambda} is part of the well-known Lambda Papers series and | |
115 | has been read by many over the years, there are no published | |
116 | implementations of the processor described in the paper. Therefore | |
117 | \cite{lambda} was the only source used in the building of this | |
118 | project, and several of the limitations of the processor were not easy | |
119 | to see before the implementation phase was well underway. Furthermore, | |
120 | the original paper has several shortcomings. | |
121 | ||
122 | For one, while the microcode (the values of the two read-only | |
123 | memories) is provided in both source and compiled form, the microcode | |
124 | compiler is not provided. This makes it difficult to alter the | |
125 | microcode to fix bugs or introduce new instructions. One needs to | |
126 | either work on the already-compiled code (hard and error-prone) or | |
127 | reimplement the microcode compiler, which takes a reasonable amount of | |
128 | effort. | |
129 | ||
130 | Moreover, the paper does not concern itself at all with how the user | |
131 | gets a program into memory or how the user gets the result back. This | |
132 | makes the processor as-described very difficult to use. | |
133 | ||
134 | To implement the processor in Verilog, no source other than the paper | |
135 | was used. There were no previous implementations of the processor that | |
136 | could be consulted to resolve unclarities in the original paper. | |
137 | ||
138 | In addition, there was no software available for interacting with the | |
139 | processor. A compiler and assembler had to be written from scratch, | |
140 | and so the processor could not be easily debugged until after the | |
141 | software was functional. | |
142 | ||
143 | \section{Contribution} | |
144 | YULE is a modern-day implementation of a Lisp machine processor using | |
145 | the ideas and techniques presented in \cite{lambda}. | |
146 | ||
147 | The main contribution of YULE is a Verilog implementation of the | |
148 | processor which allows future circuit designers to use it in their own | |
149 | design and modify it as they please. | |
150 | ||
151 | As the processor as described has no way to input a program or extract | |
152 | a result from memory, YULE includes more circuitry around the | |
153 | processor that lets it communicate with the external word. This | |
154 | consists of a serial UART and two modules, a reader and a writer. When | |
155 | the processor is idle, the reader waits for a program to be sent over | |
156 | the UART. Once a program has been received, it is placed into memory | |
157 | and the processor starts to evaluate it. Once evaluation is done, the | |
158 | writer transmits the new contents of the memory over the UART. | |
159 | ||
160 | Besides the hardware chip, YULE also includes three pieces of | |
161 | software: | |
162 | ||
163 | \begin{itemize} | |
164 | \item a compiler, which takes Lisp source code and produces an | |
165 | intermediate representation of the program. | |
166 | \item an assembler, which takes a program in the intermediate | |
167 | representation and produces the binary representation of the program | |
168 | that the processor can understand. | |
169 | \item a REPL, in which users can type expressions. When an expression | |
170 | is submitted, the REPL compiles it, assembles it, and sends it to | |
171 | the processor. The processor evaluates the expression and sends back | |
172 | a memory dump. The REPL then reads the memory dump, interprets it, | |
173 | and prints the value of the expression in a human-readable format. | |
174 | \end{itemize} | |
175 | ||
176 | The REPL is therefore the entry point of YULE as a whole. An end-user | |
67d929ce MG |
177 | who does not intend to extend YULE can plug in a USB device containing |
178 | the YULE processor, and run the REPL. Then the user can write | |
179 | expressions in the REPL and get their values back. Besides being the | |
180 | entry point, the REPL serves as a convenient way to test that the | |
181 | entire system is working. An expression typed in the REPL goes through | |
182 | the compiler and assembler, and then is sent to the processor. It goes | |
183 | through the three stages of the processor (reading the expression into | |
184 | memory, evaluating the contents of the memory, writing a memory dump). | |
e7b86bf0 MG |
185 | |
186 | The hardware and software components of YULE have been tested to | |
187 | ensure they function correctly. There is an automated test suite | |
188 | containing 48 unit tests for the compiler and assembler, and 21 | |
189 | functional tests that engage all parts of the system (both the | |
190 | software and the hardware). The unit tests ensure that the various | |
191 | small parts of the compiler and assembler work properly, while the | |
192 | functional tests consist of pairs of short Lisp expressions and their | |
193 | correct values. The expressions are written to the REPL, and the | |
194 | REPL's output is compared to the correct values. | |
195 | ||
196 | To make it easier for future circuit designers and programmers to use | |
197 | this project, YULE has been released under an open source license. | |
198 | Prospective users can find all the source files (for both hardware and | |
199 | software) online, and are free to use, modify, and redistribute them | |
200 | under the same terms as Perl itself. The files can be accessed at this | |
201 | URL: \url{https://git.ieval.ro/?p=yule.git}. | |
202 | ||
203 | \section{Layout of this report} | |
204 | This report begins by describing the background of YULE, the | |
205 | technologies that made this project possible and a more in-depth | |
206 | description of the 1979 paper. Then the components that YULE comprises | |
207 | are examined in turn, followed by a chapter on how everything was | |
208 | tested, which includes a worked example of the system in action. The | |
209 | report ends with a list of ideas for future work based on YULE, and a | |
210 | section on reflections. | |
211 | ||
212 | \chapter{Background} | |
213 | ||
214 | \section{Field-programmable gate array} | |
215 | An FPGA is a configurable integrated circuit. A designer can define a | |
216 | circuit using a hardware description language (explained in the next | |
217 | section) and compile it to some binary format called a configuration. | |
218 | This configuration describes how the different components of the FPGA | |
219 | should be connected to each other. The configuration can then be | |
220 | uploaded to the FPGA, and the FPGA will connect its internal | |
221 | components as requested. | |
222 | ||
223 | An FPGA can therefore replace any digital custom circuit (ASIC). While | |
224 | an FPGA consumes more energy, uses more space, and runs at slower | |
225 | clock rates than an ASIC, the FPGA permits rapid prototyping: the time | |
226 | between finishing to write a design and having a working part is mere | |
227 | seconds (compilation time + uploading the configuration to the FPGA), | |
228 | and if a bug is discovered the buggy part need not be thrown away. The | |
229 | design can be modified and the fixed configuration can be uploaded to | |
230 | the FPGA. | |
231 | ||
232 | The hardware aspect of YULE is designed to run on the Lattice | |
233 | iCE40HX-1k FPGA, but with small modifications it can run on other | |
234 | FPGAs as well. | |
235 | ||
236 | \subsection*{iCEstick} | |
237 | \begin{figure} | |
238 | \centering | |
239 | \includegraphics[height=4cm]{icestick.png} | |
240 | \caption{Lattice iCEstick Evaluation Kit} | |
241 | \end{figure} | |
242 | ||
243 | The Lattice iCEstick Evaluation Kit is a development board for the | |
244 | iCE40HX-1k FPGA. It contains the FPGA, a USB connector, a FTDI chip | |
245 | and 5 LEDs. The form factor is very convenient. It is similar to a USB | |
246 | memory stick (albeit bigger) that can be plugged into a computer, | |
247 | avoiding the need for jumper wires and GPIO that traditional | |
248 | development boards require. | |
249 | ||
250 | The FTDI chip implements the USB protocol, allowing the FPGA to be | |
251 | programmed by the computer and letting the FPGA talk to the computer | |
252 | using a very simple serial UART interface on the FPGA side, and a | |
253 | standard USB serial port on the computer side. This makes | |
254 | communication very easy. The FPGA can use any off-the-shelf UART | |
255 | implementation (this project uses \texttt{osdvu} from | |
256 | Opencores \cite{osdvu}), and the computer can use standard serial | |
257 | console tools such as \texttt{minicom}. | |
258 | ||
259 | \section{Verilog} | |
260 | Verilog is a hardware description language. It is used to model | |
261 | digital circuits at register-transfer level of abstraction. In Verilog | |
262 | one defines several modules, with one of them (the ``toplevel'' | |
263 | module) representing the entire design. Each module has several inputs | |
264 | and outputs, can contain internal wires, can contain instances of | |
265 | other modules, and can contain combinational and sequential logic | |
266 | blocks that drive internal wires and outputs. | |
267 | ||
268 | Verilog has many more complicated features that are not used in YULE. | |
269 | These include tristate buffers, delays, functions, tasks, loops, and | |
270 | strength declarations. | |
271 | ||
272 | A subset of Verilog can be \textit{synthesised}; that is converted to | |
273 | a netlist (a list of transistors or FPGA components used and | |
274 | connections between them) that can then be used to build an ASIC | |
275 | (custom integrated circuit), or can be further converted to the | |
276 | internal representation of the FPGA and uploaded to the FPGA. The FPGA | |
277 | workflow and the programs that are part of it are described in the | |
278 | next section. | |
279 | ||
280 | A verilog project can also be run through a simulator, which simulates | |
281 | the design clock edge by clock edge and records the values on each | |
282 | wire in the design. This record can later be read in and visualised | |
283 | with a wave viewer, which allows a circuit designer to better | |
284 | understand why the circuit is not working correctly. One such program, | |
285 | GTKWave \cite{gtkwave}, was used to debug YULE. | |
286 | ||
287 | \section{icestorm} | |
288 | Project IceStorm \cite{icestorm} is a set of open source reverse | |
289 | engineered tools for programming Lattice iCE40 FPGAs. There exists a | |
290 | fully open source Verilog-to-Bitstream flow which comprises: | |
291 | ||
292 | \begin{enumerate} | |
293 | \item Yosys, a Verilog synthetizer. | |
294 | \item Arachne-pnr, a tool that implements placing and routing for | |
295 | iCE40 FPGAs. | |
296 | \item icepack, which converts Arachne-pnr's output to iCE40 bitstream. | |
297 | \item iceprog, which programs the iCEstick with the output of icepack. | |
298 | \end{enumerate} | |
299 | ||
300 | \section{Design of LISP-Based Processors (1979)} | |
301 | Design of LISP-Based Processors (1979) \cite{lambda} is one of the | |
302 | ``Lambda papers''; a series of memos by Sussman and Steele about the | |
303 | Lisp language and its implementation. The paper describes SIMPLE, a | |
304 | microprocessor designed to run Lisp. | |
305 | ||
306 | This processor is made of two read-only memories called EVAL and GC. | |
307 | Each read-only memory is paired with state registers to form a state | |
308 | machine. The contents of the state registers are fed into the | |
309 | read-only memory, and the output of the read-only memory determines | |
310 | the next state and other signals. | |
311 | ||
312 | Each state machine also has an internal bus (called the E and G buses | |
313 | respectively) and several registers connected to that bus. Each of | |
314 | these registers stores a tagged pointer, which is an 11-bit value (the | |
315 | top 3 bits represent the type, and the bottom 8 bytes represent the | |
316 | value). | |
317 | ||
318 | The signals coming out of each state machine can cause a register to | |
319 | be read onto the machine's internal bus, a register to be loaded from | |
320 | the internal bus, or write a constant value onto the internal bus. | |
321 | ||
322 | The GC state machine represents the processor's memory system. It is | |
323 | connected to an external RAM, which it controls through several | |
324 | signals. It knows how to execute certain ``GC operations'', which | |
325 | represent common Lisp memory management actions. Examples of these are | |
326 | \texttt{CAR} and \texttt{CDR} (``given a pointer to a cons cell, | |
327 | return either the CAR or the CDR of that cons cell'') or \texttt{CONS} | |
328 | (``given two pointers, allocate a new cons cell, copy them into it, | |
329 | and return a pointer to the new cons cell''). The existence of the GC | |
330 | isolates the rest of the processor from the particularities of the | |
331 | memory used. | |
332 | ||
333 | The EVAL state machine evaluates Lisp expressions. Signals coming out | |
334 | of it control the GC by selecting what operation the GC should perform | |
335 | and connecting the E and G buses to supply arguments to the GC and | |
336 | read results back. The EVAL reads expressions from memory via the GC | |
337 | and evaluates them, putting results back into memory. | |
338 | ||
339 | The overall operation of the processor is to read an expression from | |
340 | the magic location ``5'' in memory, evaluate it, and place the result | |
341 | into the magic location ``6'' in memory. Other memory locations are | |
342 | used to store subexpressions of the input expression, and to store any | |
343 | intermediate results. Despite its name, GC does not perform any | |
344 | garbage collection, so at the end of the processor's operation the | |
345 | memory will still contain all intermediate results produced during | |
346 | evaluation. | |
347 | ||
348 | The processor is intended as a proof of concept and not a | |
349 | production-ready hardware implementation of Lisp. Therefore it has | |
350 | several important limitations, for example there is no primitive | |
351 | operation that implements equality of objects (making it impossible to | |
352 | distinguish between symbols), there are no primitive arithmetic | |
353 | operations (so one needs to use Church numerals if numbers are | |
354 | needed), and the processor includes a known bug (``the ATOM bug''). | |
355 | The bug is that the ATOM primitive considers NIL to be non-atomic, | |
356 | which is incorrect. | |
357 | ||
358 | The Lisp expressions mentioned here are in textual form (Lisp source | |
359 | code), but instead in a special tagged pointer format. Each memory | |
360 | value is a union of two values: a 8-bit pointer and a 3-bit type. We | |
361 | can use pairs of consecutive memory locations to represent cons cells | |
362 | and build linked lists out of them. If a list is at location X, that | |
363 | means that (X, X+1) is the first cons cell of the list, with X+1 being | |
364 | the CAR (the value of the first element) and X being the CDR (a | |
365 | pointer to the next cons cell). A discussion of the different ways to | |
366 | represent Lisp expressions (source code, this tagged pointer format, | |
367 | and an intermediate representation) can be found in \ref{sec:formats}. | |
368 | ||
369 | The next page shows a diagram of how the processor in the paper is | |
370 | structured. The two read-only memories can be seen, with their | |
371 | registers, state registers, and the interconnection between the E and | |
372 | G buses. | |
373 | ||
374 | \begin{figure} | |
375 | \centering\includegraphics[width=18.5cm]{AIM-514-design.pdf} | |
376 | \caption{Design diagram of processor from \cite{lambda}} | |
377 | \end{figure} | |
378 | ||
379 | \chapter{A Lisp Machine processor on a USB drive} | |
380 | The project comprises a hardware part written in Verilog and a | |
381 | software part written in Perl. | |
382 | ||
383 | The hardware part is a Lisp machine processor with associated | |
384 | circuitry that can read a program from the iCEstick UART and that can | |
385 | write the result of running a program to the UART. The normal | |
386 | operation of this part is to: | |
387 | ||
388 | \begin{enumerate} | |
389 | \item Wait for a program to be supplied via UART | |
390 | \item Run the program on the processor | |
391 | \item Write a memory dump to the UART | |
392 | \item Go back to step 1 | |
393 | \end{enumerate} | |
394 | ||
395 | The software part includes: | |
396 | ||
397 | \begin{itemize} | |
398 | \item A compiler, which takes Lisp source code and puts it in the | |
399 | linked list format understood by the SIMPLE processor | |
400 | \item An assembler, which takes the result of the compiler and | |
401 | converts it to the SIMPLE in-memory binary format. | |
402 | \item A REPL, which repeatedly reads a program from the user, compiles | |
403 | it, assembles it, sends it to YULE, reads the memory dump from YULE, | |
404 | and prints the result for the user to see. | |
405 | \end{itemize} | |
406 | ||
e7b86bf0 MG |
407 | \section{Hardware} |
408 | The hardware part is built from 8 modules: | |
409 | ||
410 | \begin{enumerate} | |
411 | \item The \textbf{READER}, which implements step 1 of the normal operation described above. | |
412 | \item The \textbf{WRITER}, which implements step 3 of the normal operation described above. | |
413 | \item The \textbf{GCRAM}, a single-port RAM. | |
414 | \item The \textbf{GC}, corresponding to the GC PLA in the original implementation. | |
415 | \item The \textbf{EVAL}, corresponding to the EVAL PLA in the original implementation. | |
416 | \item The \textbf{CTRL}, which advances the operation from one step to the next one. | |
417 | \item The \textbf{UART}, which implements transmission and reception of bytes via UART. | |
418 | \item The \textbf{CPU}, which instantiates every module above and wires them together. | |
419 | \end{enumerate} | |
420 | ||
421 | \subsection{Clocks} | |
422 | To avoid timing problems, a single 48 MHz clock is used. The iCEstick | |
423 | has a 12 MHz oscillator, and the phase-locked loop circuitry on the | |
424 | FPGA multiplies this signal by 4. | |
425 | ||
426 | Every flip-flop updates on the positive edge of the clock, except for | |
427 | the RAM which updates on the negative edge. This way the rest of the | |
428 | circuit perceives the RAM to be combinational (it ``updates | |
429 | immediately''): if on a positive edge the RAM's address input changes, | |
430 | the RAM will update its output on the following negative edge and the | |
431 | changed output will be visible on the next positive edge. If the RAM | |
432 | updated on a positive edge, then if the input changed on positive edge | |
433 | $i$ the changed output will only be visible on positive edge $i+2$. | |
434 | ||
435 | The design is intended to be run on a FPGA, which has a fixed clock | |
436 | tree. This typically consists of a small number of special nets called | |
437 | global buffers that can distribute signals with minimal skew to every | |
438 | logic cell. The FPGA on the iCEstick has 8 such global buffers. | |
439 | ||
440 | ||
441 | There are 4 components in the design (GC, EVAL, READER, WRITER) that | |
442 | need to be enabled/disabled at different times. One way to achieve | |
443 | this is to AND the master 48 MHz clock with several signals, thus | |
444 | creating a separate clock for each component. This is known as | |
445 | \textit{clock gating}. However, on a FPGA clock gating involves | |
446 | routing the clock from a global buffer to a logic cell and then back | |
447 | to another global buffer. This can introduce glitches in the clock and | |
448 | would delay each of the clocks by some amount (dependent on the | |
449 | signals it is ANDed with). It is therefore possible that the 4 output | |
450 | clocks would not be aligned, causing problems when data is | |
451 | communicated by one component to another. As such, clock gating is not | |
452 | recommended on FPGAs. | |
453 | ||
454 | \begin{figure} | |
455 | \centering\includegraphics[height=3cm]{gating.eps} | |
456 | \caption{Gated clock on the left, clock enable on the right} | |
457 | \end{figure} | |
458 | ||
459 | To avoid this problem, the design used here does not use clock gating | |
460 | and instead uses 4 clock enable signals supplied by the controller to | |
461 | each of the GC, EVAL, READER and WRITER. At any point the clock is | |
462 | enabled for exactly one of the READER, GC, and WRITER. The EVAL clock | |
463 | is enabled if and only if the GC clock is enabled and the GC's | |
464 | \texttt{step\_eval} output is high. | |
465 | ||
e7b86bf0 MG |
466 | \subsection{GC} |
467 | \begin{figure} | |
468 | \centering\includegraphics[height=4.88cm]{gc.eps} | |
469 | \caption{GC module} | |
470 | \end{figure} | |
471 | ||
472 | The GC module is connected to the GCRAM (via the \texttt{ram\_do, | |
473 | ram\_we, ram\_addr, ram\_di} signals), to the EVAL (via the | |
474 | \texttt{Ein, Eout, gcop, step\_eval, conn\_et, conn\_ea} signals) and | |
475 | to the controller. It also provides the value of its P register (the | |
476 | free storage pointer) to the WRITER, so the writer can know when to | |
477 | stop dumping the memory. | |
478 | ||
479 | It is implemented as a combinational ROM, some combinational logic to | |
480 | drive the internal G bus, sequential logic to advance to the next | |
481 | state respecting dispatch rules, and sequential logic to update | |
482 | registers. The clock enable signal acts as an active low synchronous | |
483 | reset: when the clock is disabled, the state register is set to the | |
484 | initial state. | |
485 | ||
486 | This module represents one half of the original design presented in | |
487 | \cite{lambda}, and the ROM's contents are those from the paper with a | |
2dbe6ed8 | 488 | bug (the original design sets the ADR line low in two states when it |
e7b86bf0 MG |
489 | should be high). |
490 | ||
491 | \subsection{EVAL} | |
492 | \begin{figure} | |
493 | \centering\includegraphics[height=3.29cm]{eval.eps} | |
494 | \caption{EVAL module} | |
495 | \end{figure} | |
496 | ||
497 | The EVAL module is connected to the GC (via the \texttt{conn\_et, | |
498 | conn\_ea, Ein, Eout, gcop} signals) and to the controller. | |
499 | ||
500 | The implementation of the EVAL is very similar to that of the GC. | |
501 | There is a combinational ROM, combinational logic to drive the E bus, | |
502 | sequential logic to advance to the next state respecting dispatch | |
503 | rules, and sequential logic to update registers. A synchronous reset | |
504 | is available. When the reset line is held high for one clock cycle, | |
505 | the state register is set to the initial state. It is necessary to | |
506 | separate the reset from the clock enable because the GC often pauses | |
507 | the EVAL to do work and the EVAL should not be reset every time this | |
508 | happens. | |
509 | ||
510 | This module represents the other half of the original design presented | |
511 | in \cite{lambda}, and the ROM's contents are the same as those in the | |
512 | paper except that the ROM is widened by one bit to indicate whether | |
513 | the EVAL is finished evaluating the expression. | |
514 | ||
e7b86bf0 MG |
515 | \subsection{Writer} |
516 | \begin{figure} | |
517 | \centering\includegraphics[height=2.26cm]{writer.eps} | |
518 | \caption{WRITER module} | |
519 | \end{figure} | |
520 | ||
521 | The WRITER module is connected to the UART (via the \texttt{tx\_busy, | |
522 | tx\_signal, tx\_byte} signals), the GC (via the \texttt{P} signal), | |
523 | the RAM (via the \texttt{ram\_addr, ram\_do} signals), and to the | |
524 | controller. | |
525 | ||
526 | Its purpose is to dump the memory to the UART starting with position 4 | |
527 | (since positions 0, 1, 2, 3 are constant) and finishing with position | |
528 | $P - 1$, the last used cell in memory. It is implemented as a state | |
529 | machine with 7 states and an extra register (\texttt{current\_index}) | |
530 | that stores the current index in the memory. | |
531 | ||
532 | The memory is 16-bit wide, but the UART sends and receives bytes. The | |
533 | writer dumps memory word in network byte order (big-endian), meaning | |
534 | the significant byte is sent before the least significant byte. | |
535 | ||
536 | The states are, in order: | |
537 | \begin{description} | |
538 | \item[START] which initializes \texttt{current\_index} to 4 | |
539 | \item[WRITE1\_WAIT] which waits for \texttt{tx\_busy} to become low | |
540 | \item[WRITE1] in which transmission of the upper byte starts | |
541 | \item[WRITE2\_WAIT] which waits for \texttt{tx\_busy} to become low | |
542 | \item[WRITE2] in which transmission of the lower byte starts | |
543 | \item[INCREMENT] in which \texttt{current\_index} is incremented, | |
544 | jumping back to WRITE1\_WAIT if it is still lower than | |
545 | \texttt{freeptr}. | |
546 | \item[FINISHED] in which \texttt{finished} is driven high | |
547 | \end{description} | |
548 | ||
549 | This module is not part of the original design in \cite{lambda}, as | |
550 | that paper did not concern itself with input and output. | |
551 | ||
e7b86bf0 MG |
552 | \subsection{Reader} |
553 | \begin{figure} | |
554 | \centering\includegraphics[height=2.22cm]{reader.eps} | |
555 | \caption{READER module} | |
556 | \end{figure} | |
557 | ||
558 | The READER module is connected to the UART (via the | |
559 | \texttt{rx\_signal, rx\_byte} signals), the RAM (via the | |
560 | \texttt{ram\_we, ram\_addr, ram\_di} signals), and to the controller. | |
561 | ||
562 | Its purpose is to read a program from the UART and overwrite the GCRAM | |
563 | with this program. It is implemented as a state machine with two extra | |
564 | registers (\texttt{total\_words} and \texttt{current\_index}) to store | |
565 | the size of the incoming program and the current position within it. | |
566 | ||
567 | The reader expects to receive a 13-bit \texttt{length} value, followed | |
568 | by \texttt{length} 16-bit values that represent the desired memory | |
569 | contents. Each of these values is sent in network byte order | |
570 | (big-endian), meaning the most significant byte is sent before the | |
571 | least significant byte. | |
572 | ||
573 | The states are, in order: | |
574 | \begin{description} | |
575 | \item[IDLE] which waits for \texttt{rx\_signal}, puts the read byte at the top of \texttt{total\_words}. | |
576 | \item[LENGTH] which waits for \texttt{rx\_signal}, puts the read byte at the bottom of \texttt{total\_words} | |
577 | \item[READ1] which waits for the \texttt{rx\_signal}, puts the read byte at the top of \texttt{ram\_di} and increments \texttt{current\_index} | |
578 | \item[READ2] which waits for the \texttt{rx\_signal}, puts the read byte at the bottom of \texttt{ram\_di} | |
579 | \item[WRITE] in which \texttt{ram\_we} is high, jumping back to READ1 if \texttt{current\_index + 1 < total\_words} | |
580 | \item[FINISHED] in which \texttt{finished} is high | |
581 | \end{description} | |
582 | ||
583 | This module is not part of the original design in \cite{lambda}, as | |
584 | that paper did not concern itself with input and output. | |
585 | ||
e7b86bf0 MG |
586 | \subsection{Putting the hardware together} |
587 | The components described above are managed by the controller, which | |
588 | provides them with clock enable signals. The controller is a state | |
589 | machine with three states: | |
590 | \begin{description} | |
591 | \item[STATE\_READ] in which only the READER clock is enabled | |
592 | \item[STATE\_RUN] in which the GC and EVAL clocks are enabled | |
593 | \item[STATE\_WRITE] in which only the WRITER clock is enabled | |
594 | \end{description} | |
595 | ||
596 | In STATE\_READ and STATE\_WRITE, the EVAL's reset line is high. | |
597 | ||
2dbe6ed8 | 598 | Whenever the component corresponding to the current state indicates it |
e7b86bf0 MG |
599 | is finished, the controller advances to the next state. |
600 | ||
601 | The controller also includes three multiplexers that control which | |
602 | lines are connected to the GCRAM. If the READER is active, all three | |
603 | GCRAM lines are connected to the READER. If the WRITER is active, the | |
604 | address line is connected to the writer (and the other lines are | |
605 | connected to the GC). If the GC is active, all lines are connected to | |
606 | the GC. | |
607 | ||
608 | The controller also includes logic to drive the 5 LEDs on the | |
609 | iCEstick. Three LEDs indicate the current state, while the other two | |
610 | indicate if the UART is currently transmitting or receiving. | |
611 | ||
612 | \section{Representing a Lisp program} | |
613 | \label{sec:formats} | |
614 | The processor described above evaluates a Lisp program received via | |
615 | UART. This section talks about three representations of such a | |
616 | program: the Lisp source code (a S-expression), the intermediate form | |
617 | (also a S-expression), and the in-memory form (a binary string). | |
618 | ||
619 | The first representation is obtained by parsing the user input, the | |
620 | compiler converts this into the second, and the assembler converts the | |
621 | second into the third. | |
622 | ||
623 | \subsection{Lisp source code} | |
624 | Recall that a S-expression is either an atom or a cons cell. | |
625 | ||
626 | An atom is a sequence of letters, digits, and certain symbols. There | |
627 | are two special atoms named \texttt{T} and \texttt{NIL} representing | |
628 | the boolean true and false respectively. | |
629 | ||
630 | A cons cell is a pair of two S-expressions called the ``car'' and the | |
631 | ``cdr''. We write them as \texttt{(car . cdr)}, for example \texttt{(5 | |
632 | . 7)} is a cons cell with car \texttt{5} and cdr \texttt{7}. | |
633 | ||
634 | We can use cons cells to represent linked lists. A linked list is a | |
635 | sequence of cons cells where the cdr of one cell is the next cell, and | |
636 | the cdr of the last cell is \texttt{NIL}. For example the list | |
637 | \texttt{1,2,3} can be written as \texttt{(1 . (2 . (3 . NIL)))}. | |
638 | We can also write this list more concisely as \texttt{(1 2 3)}, and we | |
639 | write \texttt{()} to mean the atom \texttt{NIL}. | |
640 | ||
641 | Using these syntax elements we can encode several types of logical | |
642 | expressions: | |
643 | ||
644 | \begin{itemize} | |
645 | \item Numbers are encoded as numeric atoms, e.g. \texttt{5} | |
646 | \item Variables are encoded as non-numeric atoms, e.g. \texttt{my-var} | |
647 | \item Conditional statements are encoded as 4-element lists, e.g. | |
648 | \texttt{(if is-odd 5 6)} which has value 5 if the variable | |
649 | \texttt{is-odd} is not \texttt{NIL} and value 6 otherwise | |
650 | \item Symbols are encoded as non-numeric atoms preceded by a single | |
651 | quote, e.g. \texttt{'my-symbol}. This can also be written | |
652 | \texttt{(quote my-symbol)} | |
653 | \item Constant lists are encoded as lists preceded by a single quote, | |
654 | e.g. \texttt{'(1 2 (3 4 5))}. This can also be written | |
655 | \texttt{(quote (1 2 (3 4 5)))} | |
656 | \item Functions (lambda abstractions) are encoded as 4-element lists, | |
657 | e.g. \texttt{(lambda first (a b) a)} in which the second element is | |
658 | the function name, the third element is a list of function | |
659 | arguments, and the fourth element is the function body. | |
660 | \item Function calls are encoded as lists, with the function as the | |
661 | first element and the arguments as the other elements, e.g. | |
662 | \texttt{(atom 5)} calls the function \texttt{atom} with argument | |
663 | \texttt{5}. | |
664 | \end{itemize} | |
665 | ||
666 | \subsection{In-memory representation} | |
667 | The memory is an array of 16-bit integer values. We interpret these as | |
668 | tagged pointers: the top 3 bits indicate the ``type'' of this pointer, | |
669 | while the other 13 bits represent another memory location. A pointer | |
670 | therefore has a type and a value. | |
671 | ||
672 | We represent each Lisp expression as a tagged pointer. To evaluate it, | |
673 | the processor looks at the type of the pointer. Here is what a pointer | |
674 | of value $v$ means based on its type. We use the syntax $mem[a]$ to | |
675 | mean ``the contents of the memory location $a$'' (that is, a tagged | |
676 | pointer) and $memval[a]$ to mean ``the value at the memory location | |
677 | $a$'' (that is, the value of the tagged pointer $mem[a]$). | |
678 | ||
679 | \begin{enumerate} | |
680 | \setcounter{enumi}{-1} | |
681 | \item is a constant list, whose cdr is $mem[v]$ and car is $mem[v+1]$. | |
682 | \item is a constant symbol. | |
683 | \item is a variable reference, where $v$ indicates the position in the | |
684 | environment of the referenced variable. | |
685 | \item is a constant closure, that is the result of evaluating a lambda | |
686 | abstraction. Values of this type do not normally appear in the input | |
687 | code, but instead are created during evaluation of other forms. | |
688 | \item is a procedure (lambda abstraction), whose body is $mem[v]$. | |
689 | \item is a conditional, where $mem[v+1]$ is the condition, | |
690 | $mem[memval[v]+1]$ is the ``then'' branch, and $mem[memval[v]]$ is the | |
691 | ``else'' branch. | |
692 | \item is a procedure call, which points to a special kind of linked | |
693 | list. The CDR of each cell (other than the last one) points to the | |
694 | next cell and has type 0, while the CDR of the last cell has value 0 | |
695 | and its type indicates the primitive to be called. The CARs of these | |
696 | cells indicate the arguments for the procedure, in reverse order. | |
697 | ||
698 | To call a user-defined procedure, we use the primitive | |
699 | \texttt{funcall} and put the user-defined procedure as the CAR of | |
700 | the last cell (in other words, the user-defined procedure is the | |
701 | first argument to \texttt{funcall}). | |
702 | \item is a quoted constant, representing the object at $mem[v]$. | |
703 | \end{enumerate} | |
704 | ||
705 | The booleans \texttt{NIL} and \texttt{T} are compiled to a list | |
706 | pointing to 0, and a symbol with value 2 respectively. | |
707 | ||
708 | The memory has a peculiar fixed layout. Locations 0 and 1 are the CDR | |
709 | and CAR of NIL, locations 2 and 3 are the CDR and CAR of T, location 4 | |
710 | holds the first unused location in memory (this tells the processor | |
711 | where it can start to place intermediate expressions), location 5 | |
712 | holds the expression to be evaluated, and location 6 is where the | |
713 | processor will place the value of the expression. | |
714 | ||
715 | \subsection{The intermediate form} | |
716 | To facilitate conversion from the former format to the latter, an | |
717 | intermediate form is used. Here we represent tagged pointers as lists | |
718 | of one of three types: | |
719 | ||
720 | \begin{itemize} | |
721 | \item \texttt{(type value)}, where ``type'' is the pointer's type and | |
722 | ``value'' is the pointer's (numeric) value. | |
723 | \item \texttt{(type ptr)}, where the pointer's type is ``type'' and | |
724 | the pointer's value is $x$, where $x$ is the memory location where | |
725 | ``ptr'' (another tagged pointer) lies. | |
726 | \item \texttt{(type ptr1 ptr2)}, where the pointer's type is ``type'' | |
727 | and the pointer's value is $x$, where $x$ is the memory location | |
728 | where ``ptr1'' lies and $x+1$ is the memory location where ``ptr2'' | |
729 | lies. | |
730 | \end{itemize} | |
731 | ||
732 | These three representations map nicely to the tagged pointers | |
733 | described in the previous section. The first representation is used | |
734 | for constant symbols and variable references, the second for | |
735 | procedures, and the third for constant lists, conditionals, and | |
736 | procedure calls. | |
737 | ||
738 | A visual representation of the intermediate representation (from the | |
739 | AIM-514 paper) can be seen on the next page. The expression in | |
740 | question is | |
741 | ||
742 | \begin{lstlisting} | |
743 | ((lambda append (x y) (if (atom x) y (cons (car x) (append (cdr x) y)))) '(a b c) '(d e f)) | |
744 | \end{lstlisting} | |
745 | ||
746 | which we would represent as this sexp: | |
747 | ||
748 | \begin{lstlisting} | |
749 | (CALL (MORE (MORE (FUNCALL 0) (PROC (IF (LIST (CALL (MORE (CONS 0) (CALL (MORE (MORE (FUNCALL 0) (VAR -1)) (VAR -2)) (CALL (CDR 0) (VAR -3)))) (CALL (CAR 0) (VAR -3))) (VAR -2)) (CALL (ATOM 0) (VAR -3))))) (LIST (LIST (LIST (LIST 0) (SYMBOL 6)) (SYMBOL 7)) (SYMBOL 8))) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5))) | |
750 | \end{lstlisting} | |
751 | ||
752 | As mentioned in the Background chapter, the processor has a bug (the | |
753 | ATOM bug) that causes \texttt{(ATOM NIL)} to be true. As a result, | |
754 | \texttt{(atom x)} in this expression is always true, even when | |
755 | \texttt{x} is the empty list, and the expression loops forever (or at | |
756 | least until memory is exhausted). We can fix this by changing the | |
757 | condition from \texttt{(atom x)} to \texttt{x}, which is false when | |
758 | \texttt{x} is NIL and true otherwise, and swapping the two branches of | |
759 | the conditional. The fixed program is: | |
760 | ||
761 | \begin{lstlisting} | |
762 | ((lambda append (x y) (if x (cons (car x) (append (cdr x) y)) y)) '(a b c) '(d e f)) | |
763 | \end{lstlisting} | |
764 | ||
2dbe6ed8 | 765 | which is compiled to the slightly shorter sexp: |
e7b86bf0 MG |
766 | |
767 | \begin{lstlisting} | |
768 | (CALL (MORE (MORE (FUNCALL 0) (PROC (IF (LIST (VAR -2) (CALL (MORE (CONS 0) (CALL (MORE (MORE (FUNCALL 0) (VAR -1)) (VAR -2)) (CALL (CDR 0) (VAR -3)))) (CALL (CAR 0) (VAR -3)))) (VAR -3)))) (LIST (LIST (LIST (LIST 0) (SYMBOL 6)) (SYMBOL 7)) (SYMBOL 8))) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5))) | |
769 | \end{lstlisting} | |
770 | ||
771 | ||
772 | \begin{figure} | |
773 | \centering\includegraphics[width=18.5cm]{AIM-514-ir.pdf} | |
774 | \caption{Example of intermediate representation} | |
775 | \end{figure} | |
776 | ||
777 | \section{Software} | |
778 | \subsection{Compiler} | |
779 | The compiler takes in Lisp code and translates it to a sexp-based | |
780 | format understood by the assembler. | |
781 | ||
782 | The compiler has several functions for compiling certain kinds of | |
783 | expressions, and a function (\texttt{process\_toplevel}) that uses the | |
784 | former to compile arbitrary expressions. | |
785 | ||
786 | The output from any of this functions is a sexp in the intermediate | |
787 | form described in the previous section. | |
788 | ||
789 | Symbols are identified by sequences of letters on the source code side | |
790 | and by numbers on the processor side. The compiler therefore keeps a | |
791 | mapping between symbol names and their numbers. Whenever a new symbol | |
792 | is encountered, a new entry is added to this map. | |
793 | ||
794 | The \texttt{process\_quoted} function handles constants. For example | |
795 | if the source code contains \texttt{'(1 2 3)} then | |
796 | \texttt{process\_quoted} is called with \texttt{(1 2 3)}. This | |
797 | function handles the symbol interning described in the previous | |
798 | paragraph, and converts its input to intermediate form. | |
799 | ||
800 | The \texttt{process\_proc} function handles lambda abstractions. For | |
801 | example if the source code contains \texttt{(lambda first (a b) a)} | |
802 | then \texttt{process\_proc} is called with \texttt{first}, | |
803 | \texttt{(a b)}, and \texttt{a}. This function appends the function | |
804 | name and arguments to the environment, then processes the body (using | |
805 | \texttt{process\_toplevel}) with the new environment. | |
806 | ||
807 | The \texttt{process\_funcall} function handles function calls. For | |
808 | example if the source code contains \texttt{(reverse-list 2 1)} then | |
809 | \texttt{process\_funcall} is called with \texttt{reverse-list}, and | |
810 | \texttt{(2 1)}. The arguments are first processed (using | |
811 | \texttt{process\_toplevel}), then (if the function is not a primitive) | |
812 | the function is processed as well. | |
813 | ||
814 | Finally the \texttt{process\_toplevel} handles any expression by | |
815 | figuring out what sort of expression it is and then processing it | |
816 | appropriately. Empty lists and \texttt{NIL} become \texttt{(list 0)}, | |
817 | numbers and \texttt{T} are processed with \texttt{process\_quoted} | |
818 | ||
819 | Examples of compiler operation can be found in the compiler tests. | |
820 | Here are a few of them: | |
821 | ||
822 | \begin{itemize} | |
823 | \item The Lisp expression \texttt{'()} is compiled to \texttt{(LIST 0)} | |
824 | \item The Lisp expression \texttt{'(5 foo)} is compiled to | |
825 | \texttt{(LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4))}. Observe that | |
826 | the CDR comes before the CAR, and symbols are interned: | |
827 | \texttt{'foo} is compiled to \texttt{(SYMBOL 3)} and \texttt{'5} is | |
828 | compiled to \texttt{(SYMBOL 4)} | |
829 | \item The Lisp expression \texttt{(if t '(2 3) 'x)} is compiled to | |
830 | \texttt{(IF (LIST (SYMBOL 5) (LIST (LIST (LIST 0) (SYMBOL 3)) | |
831 | (SYMBOL 4))) (SYMBOL 2))}. Observe that \texttt{t} is always | |
832 | compiled to \texttt{(SYMBOL 2)}, as mentioned in the section about | |
833 | the memory format. | |
834 | \end{itemize} | |
835 | ||
836 | \subsection{Assembler} | |
837 | The assembler takes in an S-expression in the intermediate form, lays | |
838 | out the words in memory, and produces a memory dump that can be | |
839 | supplied to the processor. | |
840 | ||
841 | The core of the assembler is a function called \texttt{process} that | |
842 | takes an expression in intermediate form and an optional location in | |
843 | memory. This function lays out the expression's subexpressions in | |
844 | memory and then lays out the expression itself at the given location | |
845 | (or at the end of the memory if there is no location given). | |
846 | ||
847 | The assembler begins by initializing the first 7 elements of the | |
848 | memory with the values needed by the processor. After initialization, | |
849 | the \texttt{process} function is called on the input expression with | |
850 | no location. | |
851 | ||
852 | Here is how the \texttt{process} function lays out the expression | |
853 | $expr$ at location $l$ (where $l$ can be a given memory location or | |
854 | simply ``the first free location in memory''): | |
855 | ||
856 | \begin{itemize} | |
857 | \item If $expr$ is \texttt{(type value)}, then it is directly laid out | |
858 | at $l$ | |
859 | \item If $expr$ is \texttt{(type ptr)}, then \texttt{process} is | |
860 | called recursively to lay \texttt{ptr} out at the first free | |
861 | location in memory (say this is location $loc$). Then \texttt{(type | |
862 | $loc$)} is laid out at $l$. | |
863 | \item If $expr$ is \texttt{(type ptr1 ptr2)}, then two consecutive | |
864 | free locations in memory $loc$ and $loc+1$ are reserved, | |
865 | \texttt{process} is recursively called twice to lay \texttt{ptr1} at | |
866 | $loc$ and \texttt{ptr2} at $loc+1$, and finally | |
867 | \texttt{(type $loc$)} is laid out at $l$. | |
868 | \end{itemize} | |
869 | ||
870 | Once again, examples of assembler operations can be found in the | |
871 | assembler tests. Two of them: | |
872 | ||
873 | \begin{itemize} | |
874 | \item \texttt{(quoted (symbol 5))} is assembled to \texttt{00 08 00 00 | |
875 | 00 00 20 00 20 00 00 08 E0 07 00 00 20 05}. Observe that each word | |
876 | is made of two bytes, the first word (\texttt{00 08}) indicating the | |
877 | length (in words) of the program. The first 4 words are always | |
878 | fixed: they represent in order the CDR of NIL, the CAR of NIL, the | |
879 | CDR of T, and the CAR of T. The 7th word is also a constant, 0, | |
880 | because that is where the processor puts the result. The 5th word is | |
881 | the first free word in memory (which is equal to the length in words | |
882 | of the program), and the 6th word (\texttt{E0 07}) is the expression | |
883 | to be evaluated. In this case, it has type quote and points to | |
884 | memory location 7 (the 8th word, because pointers are 0-indexed). At | |
885 | memory location 7 we have the value \texttt{20 05} which has type | |
886 | symbol and value 5. | |
887 | \item \texttt{(call (more (funcall 0) (proc (var -2))) (number 5))} is | |
888 | assembled to \texttt{00 0C 00 00 00 00 20 00 20 00 00 0C C0 07 00 00 | |
889 | 00 09 20 05 E0 00 80 0B 5F FE}. Here we want to evaulate the | |
890 | expression \texttt{C0 07} (type ``function call'', address 7). This | |
891 | points to a cons-cell, whose car (at position 8) is \texttt{20 05} | |
892 | (that is, \texttt{(number 5)}) and whose cdr (at position 7) is | |
893 | \texttt{00 09} (type ``more'' and address 9). The ``more'' points to | |
894 | another cons-cell, whose cdr (at position 9) is \texttt{E0 00} (that | |
895 | is, \texttt{(funcall 0)}), and whose car is \texttt{80 0B}: a | |
896 | procedure whose body is at position 11, namely \texttt{5F FE} (that | |
897 | is, \texttt{(var -2)}). | |
898 | ||
899 | So the expression is a call to the primitive \texttt{funcall} with | |
900 | arguments \texttt{(lambda l (x) x)} and \texttt{5}. The original | |
901 | program was probably \texttt{((lambda l (x) x) 5)}. Note that this | |
902 | program was compiled with an old version of the compiler that did | |
903 | not intern symbols, which explains why \texttt{5} was compiled to | |
904 | \texttt{(number 5)} and not \texttt{(symbol 3)}. | |
905 | \end{itemize} | |
906 | ||
907 | \subsection{REPL} | |
908 | The REPL ties everything together: it reads a program from the user, | |
909 | compiles it, assembles it, sends the assembled program to the | |
910 | processor, reads the memory dump from the processor, and finally | |
911 | prints the result to the screen. | |
912 | ||
913 | The REPL includes an S-expression pretty-printer that operates on | |
914 | memory dumps. As symbols are interned by the compiler, when the | |
915 | pretty-printer hits a symbol it only knows its index, and not the | |
916 | original name. To print the original name the pretty-printer searches | |
917 | the symbol map built by the compiler and pulls the name from there. | |
918 | ||
919 | The pretty-printer takes a chunk of memory and index, and prints the | |
920 | expression at that index. For example, if the memory chunk is | |
921 | \texttt{00 00 00 00 20 00 20 00 00 09 00 07 00 07 20 02 00 00 60 00} | |
922 | and the index is 6, then the pretty printer will try to print | |
923 | \texttt{00 07}, which is a list with car \texttt{00 00} and cdr | |
924 | \texttt{20 02}, and the pretty printer prints \texttt{(nil . t)}. | |
925 | ||
e7b86bf0 MG |
926 | An example REPL session is: |
927 | ||
928 | \begin{lstlisting} | |
929 | YULE REPL | |
930 | ||
931 | * (cons 1 2) | |
932 | (1 . 2) | |
933 | * (car '(4 5 6)) | |
934 | 4 | |
935 | * (cdr '(t t t)) | |
936 | (T T) | |
937 | * ((lambda id (x) x) 5) | |
938 | 5 | |
939 | * (lambda first (a b) a) | |
940 | <CLOSURE> | |
941 | * ((lambda first (a b) a) 7 8) | |
942 | 7 | |
943 | * (atom 6) | |
944 | T | |
945 | \end{lstlisting} | |
946 | ||
947 | \chapter{Evaluation and Testing} | |
948 | To make development easier, ensure everything works correctly, and | |
949 | prevent regressions there are unit tests for the compiler and | |
950 | assembler, and functional tests for the entire project. | |
951 | ||
952 | Besides these automated tests, manual evaluation (in the beginning, | |
953 | precompiling simple expressions and supplying them to the processor as | |
954 | the initial value of the memory; later on, manually typing expressions | |
955 | in the REPL and looking at the output). When the processor was not | |
956 | operating as expected, the GTKWave wave simulator was used to assist | |
957 | with debugging. This piece of software takes in a recorded simulation | |
958 | of the circuits and displays the values of all signals inside the | |
959 | circuit against time. This allows the designer to look at every clock | |
960 | cycle in turn and discover where the error occurs. | |
961 | ||
962 | The process of debugging the processor had the side effect of finding | |
963 | inefficiencies in the original design. While \cite{lambda} mentions | |
964 | some limitations of the processors, one important limitation that was | |
965 | only discovered during debugging is that for every operation that the | |
966 | EVAL performs, the result of that operation will be written to new | |
967 | locations in memory. This includes ``read-only'' operations such as | |
968 | CAR and CDR. If the programmer requests the CAR of a list, a copy of | |
969 | the first element will be taken and a pointer to it will be returned. | |
970 | Since YULE only has 4096 words of memory, this severely restricts the | |
971 | complexity of programs that can be evaluated on it. | |
972 | ||
973 | Another avenue to ensure the processor works as it should is formal | |
974 | verification using model checking tools. Formal verification is | |
975 | commonly employed as part of circuit design evaluation because | |
976 | mistakes are more difficult to identify, and if found after the chip | |
977 | has been built they are very costly to fix (often the chips that were | |
978 | built with the bug need to be thrown away). However, building models | |
979 | is a very involved process, and devoting effort to formal verification | |
980 | of this project would not have allowed the implementation of several | |
981 | features. Due to time constraints and knowing that no physical YULE | |
982 | chips are built (the project is so far only synthesised to a FPGA), | |
983 | testing and manual evaluation was considered sufficient for YULE. | |
984 | ||
985 | \section{Unit testing} | |
986 | The software was tested with 48 unit tests written in Perl using the | |
987 | \texttt{Test::More} framework. | |
988 | ||
989 | The assembler takes in compiled Lisp code and outputs a memory dump in | |
990 | one of two formats (binary or Verilog code). Its tests take a | |
991 | configuration for the assembler, an input, two expected outputs, and | |
992 | (for documentation purposes) the source Lisp code. An example test: | |
993 | ||
994 | \begin{lstlisting} | |
995 | $expbin = | |
996 | '00 0C 00 00 00 00 20 00 20 00 00 0C C0 07' | |
997 | . ' 00 00 00 09 20 05 E0 00 80 0B 5F FE'; | |
998 | run_test {addr_bits => 13}, | |
999 | '(call (more (funcall 0) (proc (var -2))) (number 5))', | |
1000 | <<'EOF', $expbin, '((LAMBDA ID (X) X) 5)'; | |
1001 | mem[ 0] <= 0; // (cdr part of NIL) | |
1002 | mem[ 1] <= 0; // (car part of NIL) | |
1003 | mem[ 2] <= 16'b0010000000000000; // (cdr part of T) | |
1004 | mem[ 3] <= 16'b0010000000000000; // (car part of T) | |
1005 | mem[ 4] <= 16'd12; // (free storage pointer) | |
1006 | mem[ 5] <= 16'b1100000000000111; // CALL 7 | |
1007 | mem[ 6] <= 0; // (result of computation) | |
1008 | mem[ 7] <= 16'b0000000000001001; // MORE 9 | |
1009 | mem[ 8] <= 16'b0010000000000101; // NUMBER 5 | |
1010 | mem[ 9] <= 16'b1110000000000000; // FUNCALL 0 | |
1011 | mem[10] <= 16'b1000000000001011; // PROC 11 | |
1012 | mem[11] <= 16'b0101111111111110; // VAR -2 | |
1013 | EOF | |
1014 | \end{lstlisting} | |
1015 | ||
1016 | Here the compiled code \texttt{(call (more (funcall 0) (proc (var -2))) (number 5))} | |
1017 | is given to the assembler, and its outputs are compared to the binary | |
1018 | string on the first line and the block of Verilog code. | |
1019 | ||
1020 | The tests do not all contain valid assembler inputs. There are a | |
1021 | series of tests that give invalid inputs to the assembler and expect | |
1022 | the assembler to raise an exception matching a pattern. Example tests: | |
1023 | ||
1024 | \begin{lstlisting} | |
1025 | expect_error_like { run_test {}, '((type is a list) 5)'} | |
1026 | 'Type of toplevel is not atom'; | |
1027 | expect_error_like { run_test {}, '(badtype 5)'} | |
1028 | 'No such type'; | |
1029 | expect_error_like { run_test {}, '(5 700000)'} | |
1030 | 'Addr too large'; | |
1031 | \end{lstlisting} | |
1032 | ||
1033 | On the compiler side, there are 4 tests for the quoting function, | |
1034 | but the brunt of the tests are pairs of a compiler input (i.e. Lisp | |
1035 | code) and a compiler output (i.e. a sexp that the assembler | |
1036 | understands). For each pair the compiler is run on the first element | |
1037 | and its output is compared with the second. Example such tests: | |
1038 | ||
1039 | \begin{lstlisting} | |
1040 | is_toplevel '(quote 5)', '(SYMBOL 3)'; | |
1041 | is_toplevel '(reverse-list \'a \'a \'b)', | |
1042 | '(CALL (MORE (MORE (REVERSE-LIST 0) (SYMBOL 4)) (SYMBOL 3)) (SYMBOL 3))'; | |
1043 | is_toplevel '(if t \'(2 3) \'x)', | |
1044 | '(IF (LIST (SYMBOL 5) (LIST (LIST (LIST 0)' | |
1045 | . ' (SYMBOL 3)) (SYMBOL 4))) (SYMBOL 2))'; | |
1046 | \end{lstlisting} | |
1047 | ||
1048 | As the compiler code includes a function to pretty-print | |
1049 | S-expressions, there are a few round-trip tests on this function. | |
1050 | Given a string, it is parsed using the \texttt{Data::SExpression} | |
1051 | parsing library then it is pretty-printed. The result should match the | |
1052 | original string. | |
1053 | ||
1054 | There are also tests for handling of invalid input, like in the assembler. These include: | |
1055 | ||
1056 | \begin{lstlisting} | |
1057 | expect_error_like { is_toplevel 'x' } 'Variable x not in environment'; | |
1058 | expect_error_like { is_toplevel '(car)' } 'Cannot call primitive car with no arguments'; | |
1059 | \end{lstlisting} | |
1060 | ||
1061 | A test coverage tool, \texttt{Devel::Cover}, was ran on the compiler | |
1062 | and assembler to find statements and branches that were not covered by | |
1063 | existing unit tests, and new tests were written to cover them. Right | |
1064 | now \texttt{Devel::Cover} reports 100\% test coverage of the compiler | |
1065 | and assembler. | |
1066 | ||
1067 | \begin{figure} | |
1068 | \centering\includegraphics[height=8cm]{devel-cover.png} | |
1069 | \caption{Screenshot of \texttt{Devel::Cover} results, showing 100\% | |
1070 | test coverage} | |
1071 | \end{figure} | |
1072 | ||
1073 | \section{Functional testing} | |
1074 | Besides unit tests, the entire system (hardware and software) has | |
1075 | undergone functional testing. This means that a set of 21 Lisp | |
1076 | expressions has been prepared, alongside their correct values computed | |
1077 | using a software Lisp implementation. These values are given to the | |
1078 | YULE REPL and the test succeeds if the output equals the known correct | |
1079 | value of the expression. | |
1080 | ||
1081 | Here are some example tests: | |
1082 | \begin{lstlisting} | |
1083 | '(1 2 3) | |
1084 | 5 | |
1085 | ((lambda id (x) x) 5) | |
1086 | ((lambda id (x) x) '(1 2 3)) | |
1087 | (car '(2 3)) | |
1088 | (car (cdr '(4 5 6))) | |
1089 | ((lambda snd (x y) y) 'first 'second) | |
1090 | ||
1091 | (cons '(1 2) '(7)) | |
1092 | (car (cons '(1 2) '(7))) | |
1093 | (cdr (cons '(1 2) '(7))) | |
1094 | ||
1095 | ((lambda rev (xs) ((lambda reverse-acc (xs acc) (if xs (reverse-acc (cdr xs) (cons (car xs) acc)) acc)) xs '())) '(4 5 6 7)) | |
1096 | \end{lstlisting} | |
1097 | ||
1098 | This test suite tries to cover most kinds of valid expressions. It | |
1099 | includes constants of all kinds, conditionals, lambda abstractions, | |
1100 | calls to all primitives, calls to user-defined functions, recursive | |
1101 | calls. | |
1102 | ||
1103 | \section{Timing and performance} | |
1104 | There are three parts to the performance of YULE: | |
1105 | ||
1106 | \begin{enumerate} | |
1107 | \item The efficiency of the software. This refers to the time taken to | |
1108 | compile and assemble input programs, which is negligible. | |
1109 | ||
1110 | \item The I/O performance, which refers to the time needed to transfer | |
1111 | the program from the computer to the processor's memory, plus the | |
1112 | time needed to transfer the memory dump from the memory back to the | |
1113 | computer. | |
1114 | ||
1115 | \item The speed of the processor itself, that is the time/number of | |
1116 | cycles that the processor needs to evaluate a given expression. | |
1117 | \end{enumerate} | |
1118 | ||
1119 | The GC runs at 48MHz, while the EVAL only runs when the GC allows it | |
1120 | to which is at most half of the time. | |
1121 | ||
1122 | The UART runs at 4000000 baud, so a bit is sent every 12 cycles of the | |
1123 | 48MHz clock. A byte is made of 11 bits: a start bit, 8 data bits, and | |
1124 | one stop bit. A word is two bytes, so 22 bits. Therefore 264 cycles | |
1125 | are needed to transmit one word. | |
1126 | ||
1127 | Since the WRITER just sends a full memory dump, the I/O performance is | |
1128 | proportional not to \texttt{program\_len + result\_len} but to | |
1129 | \texttt{program\_len + result\_len + intermediate\_results\_len}. As | |
1130 | evaluation of any expression involves the creation of at least one | |
1131 | value (the result), for every expression or subexpression evaluated at | |
1132 | least 264 cycles will be needed to output that result. The EVAL takes | |
1133 | significantly less than 264 cycles to evaluate a expression (not | |
1134 | counting its subexpressions), in one test each expression took between | |
1135 | 30 and 90 cycles to evaluate. The processor performance is therefore | |
1136 | dominated by I/O. | |
1137 | ||
1138 | As an example, the program \texttt{((lambda id (x) x) 5)} is 12 bytes | |
1139 | long, and takes 258 cycles to evaluate. The memory dump is 54 bytes | |
1140 | long. Therefore of the roughly 17700 cycles needed from beginning to | |
1141 | end, about 98.5\% are used to do I/O. | |
1142 | ||
1143 | \section{A worked example} | |
1144 | \label{sec:example} | |
1145 | Here is an example of what happens when a expression is typed into the | |
1146 | REPL: | |
1147 | ||
1148 | \begin{enumerate} | |
1149 | \item A user types into the REPL the string | |
1150 | ||
1151 | \begin{lstlisting} | |
1152 | (car '(2 3 4)) | |
1153 | \end{lstlisting} | |
1154 | ||
1155 | This is a call to the CAR primitive, with the list \texttt{(2 3 4)} | |
1156 | as an argument. We expect the result to be \texttt{2}, the first | |
1157 | element of the list. | |
1158 | ||
1159 | \item This string is given to the compiler, producing the sexp | |
1160 | ||
1161 | \begin{lstlisting} | |
1162 | (CALL (CAR 0) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5))) | |
1163 | \end{lstlisting} | |
1164 | ||
1165 | Observe that in this representation the CDR comes before the CAR. So | |
1166 | for example ``(LIST (LIST 0) (SYMBOL 3))'' represents a list whose | |
1167 | CAR is ``(SYMBOL 3)'' and CDR is ``(LIST 0)'' (that is, nil). This | |
1168 | is a list with one element, ``(SYMBOL 3)''. The sexp shown here | |
1169 | represents a call to the CAR primitive with one argument. The list | |
1170 | has three elements, which are in order \texttt{(SYMBOL 5), (SYMBOL | |
1171 | 4), (SYMBOL 3)}. | |
1172 | ||
1173 | In this stage the compiler maps each number to a symbol, in this | |
1174 | case \texttt{(SYMBOL 5)} is 2, \texttt{(SYMBOL 4)} is 3, and | |
1175 | \texttt{(SYMBOL 3)} is 2. This mapping is remembered so it can be | |
1176 | used later by the pretty printer. | |
1177 | ||
1178 | \item This sexp is given to the assembler, producing the 16 words long | |
1179 | binary string (here displayed in hex, with a space after each 16-bit | |
1180 | word) | |
1181 | ||
1182 | \begin{lstlisting} | |
1183 | 000F 0000 0000 2000 2000 000F C007 0000 2000 0009 000B 2005 000D 2004 0000 2003 | |
1184 | \end{lstlisting} | |
1185 | ||
1186 | The first word, \texttt{000F}, indicates the length of the program | |
1187 | in | |
1188 | words (15 words, this does not include this first word).\\ | |
1189 | The next four words, \texttt{0000 0000 2000 2000}, represent the CDR | |
1190 | and CAR of NIL and T respectively. These words are always | |
1191 | constant.\\ | |
1192 | The next word, \texttt{000F}, represents the first unused position | |
1193 | in memory. This is always equal to the first word transmitted.\\ | |
1194 | The next word, \texttt{C007}, represents the expression to be | |
1195 | evaluated.\\ | |
1196 | The next word, \texttt{0000}, is ignored as this is where the | |
1197 | processor will place the result of the evaluation.\\ | |
1198 | The rest of the words are subexpressions or parts of subexpressions | |
1199 | of the expression being evaluated.\\ | |
1200 | ||
1201 | The expression that is evaluated is \texttt{C007}, which is a CALL | |
1202 | whose CDR is at position 7 (\texttt{2000}, meaning \texttt{(CAR 0)}) | |
1203 | and whose CAR is at | |
1204 | position 8 (\texttt{0009}).\\ | |
1205 | \texttt{0009} is a list with CAR at 10 (\texttt{2005}, meaning | |
1206 | \texttt{(SYMBOL 5)}) and CDR at 9 (\texttt{000B}).\\ | |
1207 | \texttt{000B} is a list with CAR at 12 (\texttt{2004}, meaning | |
1208 | \texttt{(SYMBOL 4)}) and CDR at 11 (\texttt{000D}).\\ | |
1209 | \texttt{000D} is a list with CAR at 14 (\texttt{2003}, meaning | |
1210 | \texttt{(SYMBOL 3)}) and CDR at 13 (\texttt{0000}, meaning NIL). | |
1211 | ||
1212 | \item This binary string is sent via UART to the FPGA's READER. The | |
1213 | READER writes the string (except the first word) into memory, | |
1214 | overwriting the previous contents of the memory. It then reports to | |
1215 | the controller that it is done, and the controller advances to | |
1216 | \texttt{STATE\_RUN}. | |
1217 | ||
1218 | \item The EVAL and GC evaluate the program in memory, and then place | |
1219 | the result back into memory. The EVAL reports to the controller that | |
1220 | it is done, and the controller advances to \texttt{STATE\_WRITE}. | |
1221 | ||
1222 | \item The WRITER sends this memory dump via UART back to the REPL | |
1223 | ||
1224 | \begin{lstlisting} | |
1225 | 000F C007 2005 2000 0009 000B 2005 000D 2004 0000 2003 0000 4004 2000 0010 0000 0012 0000 0000 0009 | |
1226 | \end{lstlisting} | |
1227 | ||
1228 | Then the WRITER reports to the controller that it is done, and the | |
1229 | controller advances to \texttt{STATE\_READ}. | |
1230 | ||
1231 | Observe that this memory dump is similar to the assembler output. | |
1232 | The first two words are the 6th and 7th words in the assembler | |
1233 | output, the next word is the result of the evaluation, then the next | |
1234 | 8 words are words 9th through 16th in the assembler output. The | |
1235 | following words are intermediate results computed by the processor. | |
1236 | ||
2dbe6ed8 | 1237 | \item The REPL's pretty printer interprets this string starting with |
e7b86bf0 MG |
1238 | \texttt{2005}, which means \texttt{(SYMBOL 5)}. The pretty printer |
1239 | looks in the compiler's mapping and finds that the 5th symbol was | |
67d929ce | 1240 | 2, and so the REPL prints |
e7b86bf0 MG |
1241 | |
1242 | \begin{lstlisting} | |
67d929ce | 1243 | 2 |
e7b86bf0 MG |
1244 | * |
1245 | \end{lstlisting} | |
1246 | ||
1247 | The evaluation is complete, and a new prompt is printed so that the | |
1248 | user can input another expression. | |
1249 | \end{enumerate} | |
1250 | ||
67d929ce MG |
1251 | \begin{figure} |
1252 | \centering\includegraphics[height=7cm]{repl.png} | |
1253 | \caption{What the user types and sees in this example} | |
1254 | \end{figure} | |
1255 | ||
e7b86bf0 MG |
1256 | \chapter{Conclusions} |
1257 | A Lisp machine processor has been built using the original design and | |
1258 | techniques presented in \cite{lambda}. It has been extended with input | |
1259 | and output capabilities, and using a modern FPGA it has been | |
1260 | synthesised on a physical device slightly larger than a USB flash | |
1261 | drive. This device can be used as a standalone Lisp evaluator, or the | |
1262 | processor inside it can be included in a more complicated circuit. | |
1263 | ||
1264 | Software has been written to help interact with the processor. This | |
1265 | includes a compiler, an assembler, and a REPL that together make it | |
1266 | possible for an end-user unfamiliar with hardware design or the | |
1267 | processor's internals to submit Lisp expressions to the processor and | |
1268 | get back their values. The software also allows programmers to convert | |
1269 | Lisp source code to the binary format understood by the processor, so | |
1270 | that they may later send the binaries to the processor for evaluation. | |
1271 | ||
1272 | The processor and software have been tested to ensure they work | |
1273 | correctly. This has uncovered bugs and inefficiencies in the original | |
1274 | paper, some of which could not be gleaned from simply reading the | |
1275 | paper. | |
1276 | ||
1277 | The source files for the hardware and software, and also the automated | |
1278 | tests have been released under an open-source license to make it easy | |
1279 | for future circuit designers and programmers to use and extend this | |
1280 | project. They can be found at \url{https://git.ieval.ro/?p=yule.git}. | |
1281 | ||
1282 | The original paper \cite{lambda} is one of the influential Lambda | |
1283 | Papers that defined the Scheme programming language and is an | |
2dbe6ed8 | 1284 | important historical artifact for understanding how Lisp machine |
e7b86bf0 MG |
1285 | processors can be implemented and more generally how tagged |
1286 | architectures work. But all details of the processor cannot be | |
1287 | understood from simply reading the paper, and most readers would not | |
1288 | follow by attempting to implement what is described there. This | |
1289 | projects brings a tangible implementation into the open, which shows | |
1290 | the limitations of the design to people interested in this design and | |
1291 | makes it possible to solve this limitations and extend the processor | |
1292 | beyond its original scope. | |
1293 | ||
1294 | \section{Reflections} | |
1295 | Now that the project is finished, it is worth revisiting ideas I had | |
1296 | in the beginning that were not ultimately useful. | |
1297 | ||
1298 | The original concept had the processor run on a typical development | |
1299 | board, with the software running on a Raspberry PI. The board and the | |
1300 | Raspberry PI would be connected through jumper wires, and the UART | |
1301 | circuitry on the Raspberry PI would be used by the software to talk to | |
1302 | the processor. | |
1303 | ||
1304 | This idea would have produced a much harder to use YULE. Not only | |
1305 | would the user have to ssh into a Raspberry PI to do anything useful, | |
1306 | but also the equipment would take more space (both boards are larger | |
1307 | than the iCEstick). Once I tried the iCEstick and found that | |
1308 | everything can be done just as well with it, it replaced the two | |
1309 | boards and made the project easier to use (albeit slightly harder to | |
1310 | debug, as the iCEstick only has 5 LEDs). | |
1311 | ||
1312 | Some code had to be rewritten, sometimes multiple times. The compiler | |
1313 | was originally written in Common Lisp, as it is the best suited | |
1314 | language for manipulation of S-expressions, especially Lisp code. This | |
1315 | meant a working project was obtained more quickly, but then the | |
1316 | compiler had to be rewritten in another language; if the user has a | |
1317 | software Lisp implementation already, why would they need YULE? I | |
1318 | chose Perl as the reimplementation language, as the assembler was | |
1319 | already written in Perl and it is widely-available, most systems | |
1320 | coming with a Perl interpreter out of the box. | |
1321 | ||
1322 | Throughout development of the hardware part, timing problems were a | |
1323 | mainstay. The first ones related to the interactions between the | |
1324 | processor and the memory (the memory was clocked, but the processor | |
1325 | expected a combinational RAM). What fixed this problem was to run the | |
1326 | memory at twice the speed of the rest of the processor. At some point, | |
1327 | a decision was taken to run the GC and EVAL on different clocks (both | |
1328 | ran at one quarter the speed of the master clock, but one of the | |
1329 | clocks was delayed by half a cycle. The positive edge on one clock | |
1330 | would happen midway between a positive edge and a negative edge on the | |
1331 | other one). This proved to be unwise, as sometimes values sent by one | |
1332 | module to the other would be cleared before the destination module | |
1333 | managed to read them. | |
1334 | ||
1335 | GTKWave proved to be very useful in debugging these timing issues, as | |
1336 | the exact point where the issue began was easy to identify and I could | |
1337 | see why the issue was happening. A workaround for this was to add a | |
1338 | latch on each side of the common E bus that remembered the value sent | |
1339 | long enough for the receiving module to see it. | |
1340 | ||
1341 | All timing problems were fixed more cleanly in the end when an attempt | |
1342 | to simplify the clocks was made. First GC and EVAL were changed to use | |
1343 | the same clock, which removed the need for the latches, and then the | |
1344 | memory was changed to use the negative edge of the clock instead of | |
1345 | the positive edge, which meant no more clock division code had to be | |
1346 | used. | |
1347 | ||
1348 | \section{Future work} | |
1349 | There are several ways to extend this project. | |
1350 | ||
1351 | One idea is to fix the noted limitations: the ATOM primitive considers | |
1352 | NIL to not be an atom, and there is no EQ primitive that compares | |
1353 | symbols/other objects. The latter problem means that all symbols are | |
1354 | indistinguishable (note: NIL can still be distinguished from T and | |
1355 | other symbols, but NIL is not a symbol). To introduce equality a new | |
1356 | signal needs to be added to the EVAL that compares (for example) the | |
1357 | value on the E bus with the value of a fixed register, possibly a | |
1358 | newly added one. Then (EQ A B) would store A in the register, load B | |
1359 | on the E bus and compare them. | |
1360 | ||
1361 | Once equality is implemented, it becomes possible to implement the | |
1362 | EVAL function that evaluates an arbitrary Lisp expression. | |
1363 | ||
1364 | Two more involved instruction set improvements are adding arithmetic | |
1365 | operations, and input/output. Without arithmetic primitives one has to | |
1366 | rely on Church numerals, which encodes numbers as lambda abstractions | |
1367 | and is very inefficient. Input/output primitives would allow programs | |
1368 | to receive data at runtime over the UART and send data back before | |
1369 | evaluation is finished. This enables long-running useful programs, for | |
1370 | example with input/output and equality a REPL can be written (which | |
1371 | relies on the EVAl function mentioned above). | |
1372 | ||
1373 | On the software side, the assembler can support optimizations and then | |
1374 | the REPL can be extended with a core image. The main optimization | |
1375 | needed in the assembler is common subexpression elimination. The | |
1376 | processor never overwrites a value in memory with another value (other | |
1377 | than overwriting \texttt{mem[6]} with the result of the evaluation), | |
1378 | so it is safe to share any identical subexpressions. For example, if | |
1379 | the user inputs the program \texttt{(cons (car '(1 2 3 4)) (cdr '(0 2 | |
1380 | 3 4)))}, then when the assembler should share the tails of the two | |
1381 | lists, meaning the second list should compile to \texttt{(LIST (LIST | |
1382 | X) (SYMBOL Y))} where X is a pointer to where the second element of | |
1383 | the first list is stored. This makes long programs significantly | |
1384 | shorter. Note that no compiler changes are required to implement this, | |
1385 | as the assembler can just remember all expressions and subexpressions | |
1386 | it has assembler so far including their locations, and whenever a new | |
1387 | expression needs to be assembler it can be replaced with a pointer to | |
1388 | where the same expression has been placed before. | |
1389 | ||
1390 | Once this optimization is implemented, the REPL can gain core image | |
1391 | support. Suppose the user enters the expression \texttt{expr}. Instead | |
1392 | of sending \texttt{assemble(compile(expr))} to the processor, the REPL | |
1393 | can remember the last expression sent (or the last memory dump | |
1394 | received), then append \texttt{assemble(compile(expr))} to that | |
1395 | previous expression. This way \texttt{expr} can refer to previously | |
1396 | defined expressions, as long as the REPL or compiler has a way to name | |
1397 | previously supplied instructions, for example via a \texttt{define} | |
1398 | special form, via the standard REPL variables (\texttt{+, ++, +++}), | |
1399 | or by sequentially numbering all expressions submitted by the user. | |
1400 | ||
1401 | Returning to hardware, a potential future project is to add actual | |
1402 | hardware garbage collection to the GC component. At the moment the | |
1403 | design does no garbage collection, the processor only has 4096 words | |
1404 | of RAM available, and evaluation is quite memory-inefficient. Adding | |
1405 | garbage collection would make more complicated programs possible | |
1406 | despite the limited RAM. One way to implement this is to add a small | |
1407 | microprocessor to the design which is preprogrammed with a garbage | |
1408 | collection algorithm (simple choices are a stop-and-copy collector or | |
1409 | Cheney's algorithm). When the free space pointer gets too high, the | |
1410 | EVAL and GC are stopped, and the new microprocessor is started up so | |
1411 | it can perform garbage collection, compact the memory, and update the | |
1412 | free space pointer. The EVAL/GC registers would have to be updated as | |
1413 | well. Then the microprocessor can be stopped, EVAL and GC be started | |
1414 | up again and evaluation can continue. Note that a non-moving collector | |
1415 | is not sufficient: the GC allocates new words only at the very end of | |
1416 | the memory, so it cannot benefit from free ``gaps'' in the middle of | |
1417 | the memory. | |
1418 | ||
1419 | \begin{thebibliography}{99} | |
1420 | \bibitem{lambda} Design of LISP-Based Processors (1979), available at | |
1421 | \url{http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-514.pdf} | |
1422 | \bibitem{osdvu} \url{https://opencores.org/project/osdvu} | |
1423 | \bibitem{gtkwave} \url{http://gtkwave.sourceforge.net/} | |
1424 | \bibitem{icestorm} \url{http://www.clifford.at/icestorm/} | |
1425 | \end{thebibliography} | |
1426 | \end{document} |