| 1 | \documentclass[a4paper,12pt]{report} |
| 2 | \usepackage{color} |
| 3 | \usepackage[cm]{fullpage} |
| 4 | \usepackage{graphicx} |
| 5 | \usepackage{hyperref} |
| 6 | \usepackage{listings} |
| 7 | \usepackage{pdfpages} |
| 8 | |
| 9 | \title{YULE: Yet Unnamed Lisp Evaluator} |
| 10 | \author{Marius Gavrilescu} |
| 11 | \date{May 2018} |
| 12 | |
| 13 | \begin{document} |
| 14 | % title page does not need anchors |
| 15 | \hypersetup{pageanchor=false} |
| 16 | \maketitle |
| 17 | \hypersetup{pageanchor=true} |
| 18 | \lstset{language=Perl, frame=leftline, framexleftmargin=5pt, |
| 19 | showstringspaces=false, showspaces=false, breaklines=true, |
| 20 | postbreak=\mbox{\textcolor{red}{$\hookrightarrow$}\space}} |
| 21 | |
| 22 | \chapter*{Abstract} |
| 23 | Lisp machines are computers designed to run Lisp programs efficiently. |
| 24 | They were first developed in the 1970s, and they have special hardware |
| 25 | support for Lisp. This usually comprises a tagged architecture (each |
| 26 | pointer has a ``type'' field), garbage collection implemented in |
| 27 | hardware, and an instruction set that lends itself well to evaluating |
| 28 | Lisp expressions. |
| 29 | |
| 30 | Between 1975 and 1980, Guy L. Steele and Gerald Jay Sussman wrote a |
| 31 | series of memos on lambda calculus, the Lisp language and its |
| 32 | implementation known collectively as the ``Lambda Papers''. One of |
| 33 | these papers, ``Design of LISP-Based Processors (1979)'' describes how |
| 34 | to implement a Lisp machine processor. If this processor is connected |
| 35 | to a RAM that contains a program, it executes that program and places |
| 36 | the result back into the RAM. |
| 37 | |
| 38 | The aim of YULE is to take the design from the 1979, implement it |
| 39 | today using modern technology and techniques, and bundle it up in a |
| 40 | form that is both easy to use and extend. By going through this |
| 41 | modernisation process, future developers can easily integrate the Lisp |
| 42 | processor into their own design, modifying it as they please. |
| 43 | |
| 44 | The end result of YULE is a small device that can be plugged into a |
| 45 | USB port, and an associated computer program. Running the program |
| 46 | presents a simplified Lisp listener. The user can type a Lisp |
| 47 | expression into the listener, and the listener will respond with the |
| 48 | value of the expression. Under the hood, the listener evaluates Lisp |
| 49 | by sending the expression over USB to the small device and getting |
| 50 | back the result. |
| 51 | |
| 52 | YULE comes with the design files for the hardware, and source code for |
| 53 | the software. All of these are released under a free software license |
| 54 | and published online, which makes it very easy (from both a practical |
| 55 | and a legal standpoint) for future developers to use in their own |
| 56 | projects and modify to fit their purposes better. |
| 57 | \tableofcontents |
| 58 | |
| 59 | \chapter{Introduction} |
| 60 | \section{Motivation} |
| 61 | Lisp machine processors are quite unlike modern processors. Modern |
| 62 | instruction sets are designed for imperative programming. Each |
| 63 | instruction makes the processor execute some operation, such as |
| 64 | loading a register or storing a value in memory. In contrast, a Lisp |
| 65 | machine processor reads an expression from memory, determines its |
| 66 | type, and recursively evaluates it. This operation lends itself to |
| 67 | declarative programming, where the programmer declares an arbitrarily |
| 68 | complicated expression and asks the computer to evaluate it. |
| 69 | |
| 70 | Another common feature of Lisp machine processors is the use of tagged |
| 71 | pointers. Each memory location is a tagged union of two elements: the |
| 72 | value itself, and a tag representing the type of that data. Tagged |
| 73 | architectures are not common today, although some programs still |
| 74 | use tagged pointers by exploiting pointer alignment (if every pointer |
| 75 | is a multiple of 8, then the bottom 3 bits can be used to store a |
| 76 | tag). These tags can be used to store an object's reference count (as |
| 77 | in Objective-C on certain versions of iOS), or information about a |
| 78 | data structure (as in certain implementations of kd-trees). |
| 79 | |
| 80 | The ``Design of LISP-Based Processors (1979)'' paper by Guy L. Steele |
| 81 | and Gerald Jay Sussman \cite{lambda} presents the design of a |
| 82 | relatively simple Lisp machine processor, which is made from two |
| 83 | read-only memories, two buses, and several registers. If this |
| 84 | processor is connected to a RAM containing a Lisp expression in a |
| 85 | certain binary format, the processor will evaluate the expression and |
| 86 | place the result back in memory. |
| 87 | |
| 88 | This processor features both a tagged architecture (the top 3 bits of |
| 89 | each word in memory represent the value's type) and the declarative |
| 90 | nature mentioned above. It is therefore interesting to study and to |
| 91 | bring this processor into today's world by using technology such as |
| 92 | hardware description languages (that allow us to describe a circuit |
| 93 | formally) and FPGAs (that allow us to synthesise any circuit quickly |
| 94 | and at low cost). By building a HDL implementation of the processor |
| 95 | not only can it be used easily on its own (by synthesising it to a |
| 96 | FPGA and connecting it to RAM), but a future circuit designer can take |
| 97 | this processor and include it in their more complicated design. |
| 98 | |
| 99 | The processor presented in the original paper is a somewhat barebones |
| 100 | implementation of Lisp. Arithmetic primitives are not available, nor |
| 101 | are there any input or output routines. Due to time constraints the |
| 102 | paper's writers have not included a equality primitive. Additionally |
| 103 | the ATOM primitive has a bug that makes it consider NIL not atomic. |
| 104 | All these deficiencies can be rectified given time and effort, but it |
| 105 | is difficult for one to fix them given only a paper describing the |
| 106 | processor. Having a HDL implementation of the processor with a proper |
| 107 | test suite allows a future circuit designer to more easily make |
| 108 | changes to the processor while using the test suite to prevent |
| 109 | regressions and using a FPGA to quickly obtain a hardware version of |
| 110 | the changed processor that can be manually tested and used in any |
| 111 | desired application. |
| 112 | |
| 113 | \section{Challenges} |
| 114 | While \cite{lambda} is part of the well-known Lambda Papers series and |
| 115 | has been read by many over the years, there are no published |
| 116 | implementations of the processor described in the paper. Therefore |
| 117 | \cite{lambda} was the only source used in the building of this |
| 118 | project, and several of the limitations of the processor were not easy |
| 119 | to see before the implementation phase was well underway. Furthermore, |
| 120 | the original paper has several shortcomings. |
| 121 | |
| 122 | For one, while the microcode (the values of the two read-only |
| 123 | memories) is provided in both source and compiled form, the microcode |
| 124 | compiler is not provided. This makes it difficult to alter the |
| 125 | microcode to fix bugs or introduce new instructions. One needs to |
| 126 | either work on the already-compiled code (hard and error-prone) or |
| 127 | reimplement the microcode compiler, which takes a reasonable amount of |
| 128 | effort. |
| 129 | |
| 130 | Moreover, the paper does not concern itself at all with how the user |
| 131 | gets a program into memory or how the user gets the result back. This |
| 132 | makes the processor as-described very difficult to use. |
| 133 | |
| 134 | To implement the processor in Verilog, no source other than the paper |
| 135 | was used. There were no previous implementations of the processor that |
| 136 | could be consulted to resolve unclarities in the original paper. |
| 137 | |
| 138 | In addition, there was no software available for interacting with the |
| 139 | processor. A compiler and assembler had to be written from scratch, |
| 140 | and so the processor could not be easily debugged until after the |
| 141 | software was functional. |
| 142 | |
| 143 | \section{Contribution} |
| 144 | YULE is a modern-day implementation of a Lisp machine processor using |
| 145 | the ideas and techniques presented in \cite{lambda}. |
| 146 | |
| 147 | The main contribution of YULE is a Verilog implementation of the |
| 148 | processor which allows future circuit designers to use it in their own |
| 149 | design and modify it as they please. |
| 150 | |
| 151 | As the processor as described has no way to input a program or extract |
| 152 | a result from memory, YULE includes more circuitry around the |
| 153 | processor that lets it communicate with the external word. This |
| 154 | consists of a serial UART and two modules, a reader and a writer. When |
| 155 | the processor is idle, the reader waits for a program to be sent over |
| 156 | the UART. Once a program has been received, it is placed into memory |
| 157 | and the processor starts to evaluate it. Once evaluation is done, the |
| 158 | writer transmits the new contents of the memory over the UART. |
| 159 | |
| 160 | Besides the hardware chip, YULE also includes three pieces of |
| 161 | software: |
| 162 | |
| 163 | \begin{itemize} |
| 164 | \item a compiler, which takes Lisp source code and produces an |
| 165 | intermediate representation of the program. |
| 166 | \item an assembler, which takes a program in the intermediate |
| 167 | representation and produces the binary representation of the program |
| 168 | that the processor can understand. |
| 169 | \item a REPL, in which users can type expressions. When an expression |
| 170 | is submitted, the REPL compiles it, assembles it, and sends it to |
| 171 | the processor. The processor evaluates the expression and sends back |
| 172 | a memory dump. The REPL then reads the memory dump, interprets it, |
| 173 | and prints the value of the expression in a human-readable format. |
| 174 | \end{itemize} |
| 175 | |
| 176 | The REPL is therefore the entry point of YULE as a whole. An end-user |
| 177 | who does not intend to extend YULE can plug in a USB device containing |
| 178 | the YULE processor, and run the REPL. Then the user can write |
| 179 | expressions in the REPL and get their values back. Besides being the |
| 180 | entry point, the REPL serves as a convenient way to test that the |
| 181 | entire system is working. An expression typed in the REPL goes through |
| 182 | the compiler and assembler, and then is sent to the processor. It goes |
| 183 | through the three stages of the processor (reading the expression into |
| 184 | memory, evaluating the contents of the memory, writing a memory dump). |
| 185 | |
| 186 | The hardware and software components of YULE have been tested to |
| 187 | ensure they function correctly. There is an automated test suite |
| 188 | containing 48 unit tests for the compiler and assembler, and 21 |
| 189 | functional tests that engage all parts of the system (both the |
| 190 | software and the hardware). The unit tests ensure that the various |
| 191 | small parts of the compiler and assembler work properly, while the |
| 192 | functional tests consist of pairs of short Lisp expressions and their |
| 193 | correct values. The expressions are written to the REPL, and the |
| 194 | REPL's output is compared to the correct values. |
| 195 | |
| 196 | To make it easier for future circuit designers and programmers to use |
| 197 | this project, YULE has been released under an open source license. |
| 198 | Prospective users can find all the source files (for both hardware and |
| 199 | software) online, and are free to use, modify, and redistribute them |
| 200 | under the same terms as Perl itself. The files can be accessed at this |
| 201 | URL: \url{https://git.ieval.ro/?p=yule.git}. |
| 202 | |
| 203 | \section{Layout of this report} |
| 204 | This report begins by describing the background of YULE, the |
| 205 | technologies that made this project possible and a more in-depth |
| 206 | description of the 1979 paper. Then the components that YULE comprises |
| 207 | are examined in turn, followed by a chapter on how everything was |
| 208 | tested, which includes a worked example of the system in action. The |
| 209 | report ends with a list of ideas for future work based on YULE, and a |
| 210 | section on reflections. |
| 211 | |
| 212 | \chapter{Background} |
| 213 | |
| 214 | \section{Field-programmable gate array} |
| 215 | An FPGA is a configurable integrated circuit. A designer can define a |
| 216 | circuit using a hardware description language (explained in the next |
| 217 | section) and compile it to some binary format called a configuration. |
| 218 | This configuration describes how the different components of the FPGA |
| 219 | should be connected to each other. The configuration can then be |
| 220 | uploaded to the FPGA, and the FPGA will connect its internal |
| 221 | components as requested. |
| 222 | |
| 223 | An FPGA can therefore replace any digital custom circuit (ASIC). While |
| 224 | an FPGA consumes more energy, uses more space, and runs at slower |
| 225 | clock rates than an ASIC, the FPGA permits rapid prototyping: the time |
| 226 | between finishing to write a design and having a working part is mere |
| 227 | seconds (compilation time + uploading the configuration to the FPGA), |
| 228 | and if a bug is discovered the buggy part need not be thrown away. The |
| 229 | design can be modified and the fixed configuration can be uploaded to |
| 230 | the FPGA. |
| 231 | |
| 232 | The hardware aspect of YULE is designed to run on the Lattice |
| 233 | iCE40HX-1k FPGA, but with small modifications it can run on other |
| 234 | FPGAs as well. |
| 235 | |
| 236 | \subsection*{iCEstick} |
| 237 | \begin{figure} |
| 238 | \centering |
| 239 | \includegraphics[height=4cm]{icestick.png} |
| 240 | \caption{Lattice iCEstick Evaluation Kit} |
| 241 | \end{figure} |
| 242 | |
| 243 | The Lattice iCEstick Evaluation Kit is a development board for the |
| 244 | iCE40HX-1k FPGA. It contains the FPGA, a USB connector, a FTDI chip |
| 245 | and 5 LEDs. The form factor is very convenient. It is similar to a USB |
| 246 | memory stick (albeit bigger) that can be plugged into a computer, |
| 247 | avoiding the need for jumper wires and GPIO that traditional |
| 248 | development boards require. |
| 249 | |
| 250 | The FTDI chip implements the USB protocol, allowing the FPGA to be |
| 251 | programmed by the computer and letting the FPGA talk to the computer |
| 252 | using a very simple serial UART interface on the FPGA side, and a |
| 253 | standard USB serial port on the computer side. This makes |
| 254 | communication very easy. The FPGA can use any off-the-shelf UART |
| 255 | implementation (this project uses \texttt{osdvu} from |
| 256 | Opencores \cite{osdvu}), and the computer can use standard serial |
| 257 | console tools such as \texttt{minicom}. |
| 258 | |
| 259 | \section{Verilog} |
| 260 | Verilog is a hardware description language. It is used to model |
| 261 | digital circuits at register-transfer level of abstraction. In Verilog |
| 262 | one defines several modules, with one of them (the ``toplevel'' |
| 263 | module) representing the entire design. Each module has several inputs |
| 264 | and outputs, can contain internal wires, can contain instances of |
| 265 | other modules, and can contain combinational and sequential logic |
| 266 | blocks that drive internal wires and outputs. |
| 267 | |
| 268 | Verilog has many more complicated features that are not used in YULE. |
| 269 | These include tristate buffers, delays, functions, tasks, loops, and |
| 270 | strength declarations. |
| 271 | |
| 272 | A subset of Verilog can be \textit{synthesised}; that is converted to |
| 273 | a netlist (a list of transistors or FPGA components used and |
| 274 | connections between them) that can then be used to build an ASIC |
| 275 | (custom integrated circuit), or can be further converted to the |
| 276 | internal representation of the FPGA and uploaded to the FPGA. The FPGA |
| 277 | workflow and the programs that are part of it are described in the |
| 278 | next section. |
| 279 | |
| 280 | A verilog project can also be run through a simulator, which simulates |
| 281 | the design clock edge by clock edge and records the values on each |
| 282 | wire in the design. This record can later be read in and visualised |
| 283 | with a wave viewer, which allows a circuit designer to better |
| 284 | understand why the circuit is not working correctly. One such program, |
| 285 | GTKWave \cite{gtkwave}, was used to debug YULE. |
| 286 | |
| 287 | \section{icestorm} |
| 288 | Project IceStorm \cite{icestorm} is a set of open source reverse |
| 289 | engineered tools for programming Lattice iCE40 FPGAs. There exists a |
| 290 | fully open source Verilog-to-Bitstream flow which comprises: |
| 291 | |
| 292 | \begin{enumerate} |
| 293 | \item Yosys, a Verilog synthetizer. |
| 294 | \item Arachne-pnr, a tool that implements placing and routing for |
| 295 | iCE40 FPGAs. |
| 296 | \item icepack, which converts Arachne-pnr's output to iCE40 bitstream. |
| 297 | \item iceprog, which programs the iCEstick with the output of icepack. |
| 298 | \end{enumerate} |
| 299 | |
| 300 | \section{Design of LISP-Based Processors (1979)} |
| 301 | Design of LISP-Based Processors (1979) \cite{lambda} is one of the |
| 302 | ``Lambda papers''; a series of memos by Sussman and Steele about the |
| 303 | Lisp language and its implementation. The paper describes SIMPLE, a |
| 304 | microprocessor designed to run Lisp. |
| 305 | |
| 306 | This processor is made of two read-only memories called EVAL and GC. |
| 307 | Each read-only memory is paired with state registers to form a state |
| 308 | machine. The contents of the state registers are fed into the |
| 309 | read-only memory, and the output of the read-only memory determines |
| 310 | the next state and other signals. |
| 311 | |
| 312 | Each state machine also has an internal bus (called the E and G buses |
| 313 | respectively) and several registers connected to that bus. Each of |
| 314 | these registers stores a tagged pointer, which is an 11-bit value (the |
| 315 | top 3 bits represent the type, and the bottom 8 bytes represent the |
| 316 | value). |
| 317 | |
| 318 | The signals coming out of each state machine can cause a register to |
| 319 | be read onto the machine's internal bus, a register to be loaded from |
| 320 | the internal bus, or write a constant value onto the internal bus. |
| 321 | |
| 322 | The GC state machine represents the processor's memory system. It is |
| 323 | connected to an external RAM, which it controls through several |
| 324 | signals. It knows how to execute certain ``GC operations'', which |
| 325 | represent common Lisp memory management actions. Examples of these are |
| 326 | \texttt{CAR} and \texttt{CDR} (``given a pointer to a cons cell, |
| 327 | return either the CAR or the CDR of that cons cell'') or \texttt{CONS} |
| 328 | (``given two pointers, allocate a new cons cell, copy them into it, |
| 329 | and return a pointer to the new cons cell''). The existence of the GC |
| 330 | isolates the rest of the processor from the particularities of the |
| 331 | memory used. |
| 332 | |
| 333 | The EVAL state machine evaluates Lisp expressions. Signals coming out |
| 334 | of it control the GC by selecting what operation the GC should perform |
| 335 | and connecting the E and G buses to supply arguments to the GC and |
| 336 | read results back. The EVAL reads expressions from memory via the GC |
| 337 | and evaluates them, putting results back into memory. |
| 338 | |
| 339 | The overall operation of the processor is to read an expression from |
| 340 | the magic location ``5'' in memory, evaluate it, and place the result |
| 341 | into the magic location ``6'' in memory. Other memory locations are |
| 342 | used to store subexpressions of the input expression, and to store any |
| 343 | intermediate results. Despite its name, GC does not perform any |
| 344 | garbage collection, so at the end of the processor's operation the |
| 345 | memory will still contain all intermediate results produced during |
| 346 | evaluation. |
| 347 | |
| 348 | The processor is intended as a proof of concept and not a |
| 349 | production-ready hardware implementation of Lisp. Therefore it has |
| 350 | several important limitations, for example there is no primitive |
| 351 | operation that implements equality of objects (making it impossible to |
| 352 | distinguish between symbols), there are no primitive arithmetic |
| 353 | operations (so one needs to use Church numerals if numbers are |
| 354 | needed), and the processor includes a known bug (``the ATOM bug''). |
| 355 | The bug is that the ATOM primitive considers NIL to be non-atomic, |
| 356 | which is incorrect. |
| 357 | |
| 358 | The Lisp expressions mentioned here are in textual form (Lisp source |
| 359 | code), but instead in a special tagged pointer format. Each memory |
| 360 | value is a union of two values: a 8-bit pointer and a 3-bit type. We |
| 361 | can use pairs of consecutive memory locations to represent cons cells |
| 362 | and build linked lists out of them. If a list is at location X, that |
| 363 | means that (X, X+1) is the first cons cell of the list, with X+1 being |
| 364 | the CAR (the value of the first element) and X being the CDR (a |
| 365 | pointer to the next cons cell). A discussion of the different ways to |
| 366 | represent Lisp expressions (source code, this tagged pointer format, |
| 367 | and an intermediate representation) can be found in \ref{sec:formats}. |
| 368 | |
| 369 | The next page shows a diagram of how the processor in the paper is |
| 370 | structured. The two read-only memories can be seen, with their |
| 371 | registers, state registers, and the interconnection between the E and |
| 372 | G buses. |
| 373 | |
| 374 | \begin{figure} |
| 375 | \centering\includegraphics[width=18.5cm]{AIM-514-design.pdf} |
| 376 | \caption{Design diagram of processor from \cite{lambda}} |
| 377 | \end{figure} |
| 378 | |
| 379 | \chapter{A Lisp Machine processor on a USB drive} |
| 380 | The project comprises a hardware part written in Verilog and a |
| 381 | software part written in Perl. |
| 382 | |
| 383 | The hardware part is a Lisp machine processor with associated |
| 384 | circuitry that can read a program from the iCEstick UART and that can |
| 385 | write the result of running a program to the UART. The normal |
| 386 | operation of this part is to: |
| 387 | |
| 388 | \begin{enumerate} |
| 389 | \item Wait for a program to be supplied via UART |
| 390 | \item Run the program on the processor |
| 391 | \item Write a memory dump to the UART |
| 392 | \item Go back to step 1 |
| 393 | \end{enumerate} |
| 394 | |
| 395 | The software part includes: |
| 396 | |
| 397 | \begin{itemize} |
| 398 | \item A compiler, which takes Lisp source code and puts it in the |
| 399 | linked list format understood by the SIMPLE processor |
| 400 | \item An assembler, which takes the result of the compiler and |
| 401 | converts it to the SIMPLE in-memory binary format. |
| 402 | \item A REPL, which repeatedly reads a program from the user, compiles |
| 403 | it, assembles it, sends it to YULE, reads the memory dump from YULE, |
| 404 | and prints the result for the user to see. |
| 405 | \end{itemize} |
| 406 | |
| 407 | \section{Hardware} |
| 408 | The hardware part is built from 8 modules: |
| 409 | |
| 410 | \begin{enumerate} |
| 411 | \item The \textbf{READER}, which implements step 1 of the normal operation described above. |
| 412 | \item The \textbf{WRITER}, which implements step 3 of the normal operation described above. |
| 413 | \item The \textbf{GCRAM}, a single-port RAM. |
| 414 | \item The \textbf{GC}, corresponding to the GC PLA in the original implementation. |
| 415 | \item The \textbf{EVAL}, corresponding to the EVAL PLA in the original implementation. |
| 416 | \item The \textbf{CTRL}, which advances the operation from one step to the next one. |
| 417 | \item The \textbf{UART}, which implements transmission and reception of bytes via UART. |
| 418 | \item The \textbf{CPU}, which instantiates every module above and wires them together. |
| 419 | \end{enumerate} |
| 420 | |
| 421 | \subsection{Clocks} |
| 422 | To avoid timing problems, a single 48 MHz clock is used. The iCEstick |
| 423 | has a 12 MHz oscillator, and the phase-locked loop circuitry on the |
| 424 | FPGA multiplies this signal by 4. |
| 425 | |
| 426 | Every flip-flop updates on the positive edge of the clock, except for |
| 427 | the RAM which updates on the negative edge. This way the rest of the |
| 428 | circuit perceives the RAM to be combinational (it ``updates |
| 429 | immediately''): if on a positive edge the RAM's address input changes, |
| 430 | the RAM will update its output on the following negative edge and the |
| 431 | changed output will be visible on the next positive edge. If the RAM |
| 432 | updated on a positive edge, then if the input changed on positive edge |
| 433 | $i$ the changed output will only be visible on positive edge $i+2$. |
| 434 | |
| 435 | The design is intended to be run on a FPGA, which has a fixed clock |
| 436 | tree. This typically consists of a small number of special nets called |
| 437 | global buffers that can distribute signals with minimal skew to every |
| 438 | logic cell. The FPGA on the iCEstick has 8 such global buffers. |
| 439 | |
| 440 | |
| 441 | There are 4 components in the design (GC, EVAL, READER, WRITER) that |
| 442 | need to be enabled/disabled at different times. One way to achieve |
| 443 | this is to AND the master 48 MHz clock with several signals, thus |
| 444 | creating a separate clock for each component. This is known as |
| 445 | \textit{clock gating}. However, on a FPGA clock gating involves |
| 446 | routing the clock from a global buffer to a logic cell and then back |
| 447 | to another global buffer. This can introduce glitches in the clock and |
| 448 | would delay each of the clocks by some amount (dependent on the |
| 449 | signals it is ANDed with). It is therefore possible that the 4 output |
| 450 | clocks would not be aligned, causing problems when data is |
| 451 | communicated by one component to another. As such, clock gating is not |
| 452 | recommended on FPGAs. |
| 453 | |
| 454 | \begin{figure} |
| 455 | \centering\includegraphics[height=3cm]{gating.eps} |
| 456 | \caption{Gated clock on the left, clock enable on the right} |
| 457 | \end{figure} |
| 458 | |
| 459 | To avoid this problem, the design used here does not use clock gating |
| 460 | and instead uses 4 clock enable signals supplied by the controller to |
| 461 | each of the GC, EVAL, READER and WRITER. At any point the clock is |
| 462 | enabled for exactly one of the READER, GC, and WRITER. The EVAL clock |
| 463 | is enabled if and only if the GC clock is enabled and the GC's |
| 464 | \texttt{step\_eval} output is high. |
| 465 | |
| 466 | \subsection{GC} |
| 467 | \begin{figure} |
| 468 | \centering\includegraphics[height=4.88cm]{gc.eps} |
| 469 | \caption{GC module} |
| 470 | \end{figure} |
| 471 | |
| 472 | The GC module is connected to the GCRAM (via the \texttt{ram\_do, |
| 473 | ram\_we, ram\_addr, ram\_di} signals), to the EVAL (via the |
| 474 | \texttt{Ein, Eout, gcop, step\_eval, conn\_et, conn\_ea} signals) and |
| 475 | to the controller. It also provides the value of its P register (the |
| 476 | free storage pointer) to the WRITER, so the writer can know when to |
| 477 | stop dumping the memory. |
| 478 | |
| 479 | It is implemented as a combinational ROM, some combinational logic to |
| 480 | drive the internal G bus, sequential logic to advance to the next |
| 481 | state respecting dispatch rules, and sequential logic to update |
| 482 | registers. The clock enable signal acts as an active low synchronous |
| 483 | reset: when the clock is disabled, the state register is set to the |
| 484 | initial state. |
| 485 | |
| 486 | This module represents one half of the original design presented in |
| 487 | \cite{lambda}, and the ROM's contents are those from the paper with a |
| 488 | bug (the original design sets the ADR line low in two states when it |
| 489 | should be high). |
| 490 | |
| 491 | \subsection{EVAL} |
| 492 | \begin{figure} |
| 493 | \centering\includegraphics[height=3.29cm]{eval.eps} |
| 494 | \caption{EVAL module} |
| 495 | \end{figure} |
| 496 | |
| 497 | The EVAL module is connected to the GC (via the \texttt{conn\_et, |
| 498 | conn\_ea, Ein, Eout, gcop} signals) and to the controller. |
| 499 | |
| 500 | The implementation of the EVAL is very similar to that of the GC. |
| 501 | There is a combinational ROM, combinational logic to drive the E bus, |
| 502 | sequential logic to advance to the next state respecting dispatch |
| 503 | rules, and sequential logic to update registers. A synchronous reset |
| 504 | is available. When the reset line is held high for one clock cycle, |
| 505 | the state register is set to the initial state. It is necessary to |
| 506 | separate the reset from the clock enable because the GC often pauses |
| 507 | the EVAL to do work and the EVAL should not be reset every time this |
| 508 | happens. |
| 509 | |
| 510 | This module represents the other half of the original design presented |
| 511 | in \cite{lambda}, and the ROM's contents are the same as those in the |
| 512 | paper except that the ROM is widened by one bit to indicate whether |
| 513 | the EVAL is finished evaluating the expression. |
| 514 | |
| 515 | \subsection{Writer} |
| 516 | \begin{figure} |
| 517 | \centering\includegraphics[height=2.26cm]{writer.eps} |
| 518 | \caption{WRITER module} |
| 519 | \end{figure} |
| 520 | |
| 521 | The WRITER module is connected to the UART (via the \texttt{tx\_busy, |
| 522 | tx\_signal, tx\_byte} signals), the GC (via the \texttt{P} signal), |
| 523 | the RAM (via the \texttt{ram\_addr, ram\_do} signals), and to the |
| 524 | controller. |
| 525 | |
| 526 | Its purpose is to dump the memory to the UART starting with position 4 |
| 527 | (since positions 0, 1, 2, 3 are constant) and finishing with position |
| 528 | $P - 1$, the last used cell in memory. It is implemented as a state |
| 529 | machine with 7 states and an extra register (\texttt{current\_index}) |
| 530 | that stores the current index in the memory. |
| 531 | |
| 532 | The memory is 16-bit wide, but the UART sends and receives bytes. The |
| 533 | writer dumps memory word in network byte order (big-endian), meaning |
| 534 | the significant byte is sent before the least significant byte. |
| 535 | |
| 536 | The states are, in order: |
| 537 | \begin{description} |
| 538 | \item[START] which initializes \texttt{current\_index} to 4 |
| 539 | \item[WRITE1\_WAIT] which waits for \texttt{tx\_busy} to become low |
| 540 | \item[WRITE1] in which transmission of the upper byte starts |
| 541 | \item[WRITE2\_WAIT] which waits for \texttt{tx\_busy} to become low |
| 542 | \item[WRITE2] in which transmission of the lower byte starts |
| 543 | \item[INCREMENT] in which \texttt{current\_index} is incremented, |
| 544 | jumping back to WRITE1\_WAIT if it is still lower than |
| 545 | \texttt{freeptr}. |
| 546 | \item[FINISHED] in which \texttt{finished} is driven high |
| 547 | \end{description} |
| 548 | |
| 549 | This module is not part of the original design in \cite{lambda}, as |
| 550 | that paper did not concern itself with input and output. |
| 551 | |
| 552 | \subsection{Reader} |
| 553 | \begin{figure} |
| 554 | \centering\includegraphics[height=2.22cm]{reader.eps} |
| 555 | \caption{READER module} |
| 556 | \end{figure} |
| 557 | |
| 558 | The READER module is connected to the UART (via the |
| 559 | \texttt{rx\_signal, rx\_byte} signals), the RAM (via the |
| 560 | \texttt{ram\_we, ram\_addr, ram\_di} signals), and to the controller. |
| 561 | |
| 562 | Its purpose is to read a program from the UART and overwrite the GCRAM |
| 563 | with this program. It is implemented as a state machine with two extra |
| 564 | registers (\texttt{total\_words} and \texttt{current\_index}) to store |
| 565 | the size of the incoming program and the current position within it. |
| 566 | |
| 567 | The reader expects to receive a 13-bit \texttt{length} value, followed |
| 568 | by \texttt{length} 16-bit values that represent the desired memory |
| 569 | contents. Each of these values is sent in network byte order |
| 570 | (big-endian), meaning the most significant byte is sent before the |
| 571 | least significant byte. |
| 572 | |
| 573 | The states are, in order: |
| 574 | \begin{description} |
| 575 | \item[IDLE] which waits for \texttt{rx\_signal}, puts the read byte at the top of \texttt{total\_words}. |
| 576 | \item[LENGTH] which waits for \texttt{rx\_signal}, puts the read byte at the bottom of \texttt{total\_words} |
| 577 | \item[READ1] which waits for the \texttt{rx\_signal}, puts the read byte at the top of \texttt{ram\_di} and increments \texttt{current\_index} |
| 578 | \item[READ2] which waits for the \texttt{rx\_signal}, puts the read byte at the bottom of \texttt{ram\_di} |
| 579 | \item[WRITE] in which \texttt{ram\_we} is high, jumping back to READ1 if \texttt{current\_index + 1 < total\_words} |
| 580 | \item[FINISHED] in which \texttt{finished} is high |
| 581 | \end{description} |
| 582 | |
| 583 | This module is not part of the original design in \cite{lambda}, as |
| 584 | that paper did not concern itself with input and output. |
| 585 | |
| 586 | \subsection{Putting the hardware together} |
| 587 | The components described above are managed by the controller, which |
| 588 | provides them with clock enable signals. The controller is a state |
| 589 | machine with three states: |
| 590 | \begin{description} |
| 591 | \item[STATE\_READ] in which only the READER clock is enabled |
| 592 | \item[STATE\_RUN] in which the GC and EVAL clocks are enabled |
| 593 | \item[STATE\_WRITE] in which only the WRITER clock is enabled |
| 594 | \end{description} |
| 595 | |
| 596 | In STATE\_READ and STATE\_WRITE, the EVAL's reset line is high. |
| 597 | |
| 598 | Whenever the component corresponding to the current state indicates it |
| 599 | is finished, the controller advances to the next state. |
| 600 | |
| 601 | The controller also includes three multiplexers that control which |
| 602 | lines are connected to the GCRAM. If the READER is active, all three |
| 603 | GCRAM lines are connected to the READER. If the WRITER is active, the |
| 604 | address line is connected to the writer (and the other lines are |
| 605 | connected to the GC). If the GC is active, all lines are connected to |
| 606 | the GC. |
| 607 | |
| 608 | The controller also includes logic to drive the 5 LEDs on the |
| 609 | iCEstick. Three LEDs indicate the current state, while the other two |
| 610 | indicate if the UART is currently transmitting or receiving. |
| 611 | |
| 612 | \section{Representing a Lisp program} |
| 613 | \label{sec:formats} |
| 614 | The processor described above evaluates a Lisp program received via |
| 615 | UART. This section talks about three representations of such a |
| 616 | program: the Lisp source code (a S-expression), the intermediate form |
| 617 | (also a S-expression), and the in-memory form (a binary string). |
| 618 | |
| 619 | The first representation is obtained by parsing the user input, the |
| 620 | compiler converts this into the second, and the assembler converts the |
| 621 | second into the third. |
| 622 | |
| 623 | \subsection{Lisp source code} |
| 624 | Recall that a S-expression is either an atom or a cons cell. |
| 625 | |
| 626 | An atom is a sequence of letters, digits, and certain symbols. There |
| 627 | are two special atoms named \texttt{T} and \texttt{NIL} representing |
| 628 | the boolean true and false respectively. |
| 629 | |
| 630 | A cons cell is a pair of two S-expressions called the ``car'' and the |
| 631 | ``cdr''. We write them as \texttt{(car . cdr)}, for example \texttt{(5 |
| 632 | . 7)} is a cons cell with car \texttt{5} and cdr \texttt{7}. |
| 633 | |
| 634 | We can use cons cells to represent linked lists. A linked list is a |
| 635 | sequence of cons cells where the cdr of one cell is the next cell, and |
| 636 | the cdr of the last cell is \texttt{NIL}. For example the list |
| 637 | \texttt{1,2,3} can be written as \texttt{(1 . (2 . (3 . NIL)))}. |
| 638 | We can also write this list more concisely as \texttt{(1 2 3)}, and we |
| 639 | write \texttt{()} to mean the atom \texttt{NIL}. |
| 640 | |
| 641 | Using these syntax elements we can encode several types of logical |
| 642 | expressions: |
| 643 | |
| 644 | \begin{itemize} |
| 645 | \item Numbers are encoded as numeric atoms, e.g. \texttt{5} |
| 646 | \item Variables are encoded as non-numeric atoms, e.g. \texttt{my-var} |
| 647 | \item Conditional statements are encoded as 4-element lists, e.g. |
| 648 | \texttt{(if is-odd 5 6)} which has value 5 if the variable |
| 649 | \texttt{is-odd} is not \texttt{NIL} and value 6 otherwise |
| 650 | \item Symbols are encoded as non-numeric atoms preceded by a single |
| 651 | quote, e.g. \texttt{'my-symbol}. This can also be written |
| 652 | \texttt{(quote my-symbol)} |
| 653 | \item Constant lists are encoded as lists preceded by a single quote, |
| 654 | e.g. \texttt{'(1 2 (3 4 5))}. This can also be written |
| 655 | \texttt{(quote (1 2 (3 4 5)))} |
| 656 | \item Functions (lambda abstractions) are encoded as 4-element lists, |
| 657 | e.g. \texttt{(lambda first (a b) a)} in which the second element is |
| 658 | the function name, the third element is a list of function |
| 659 | arguments, and the fourth element is the function body. |
| 660 | \item Function calls are encoded as lists, with the function as the |
| 661 | first element and the arguments as the other elements, e.g. |
| 662 | \texttt{(atom 5)} calls the function \texttt{atom} with argument |
| 663 | \texttt{5}. |
| 664 | \end{itemize} |
| 665 | |
| 666 | \subsection{In-memory representation} |
| 667 | The memory is an array of 16-bit integer values. We interpret these as |
| 668 | tagged pointers: the top 3 bits indicate the ``type'' of this pointer, |
| 669 | while the other 13 bits represent another memory location. A pointer |
| 670 | therefore has a type and a value. |
| 671 | |
| 672 | We represent each Lisp expression as a tagged pointer. To evaluate it, |
| 673 | the processor looks at the type of the pointer. Here is what a pointer |
| 674 | of value $v$ means based on its type. We use the syntax $mem[a]$ to |
| 675 | mean ``the contents of the memory location $a$'' (that is, a tagged |
| 676 | pointer) and $memval[a]$ to mean ``the value at the memory location |
| 677 | $a$'' (that is, the value of the tagged pointer $mem[a]$). |
| 678 | |
| 679 | \begin{enumerate} |
| 680 | \setcounter{enumi}{-1} |
| 681 | \item is a constant list, whose cdr is $mem[v]$ and car is $mem[v+1]$. |
| 682 | \item is a constant symbol. |
| 683 | \item is a variable reference, where $v$ indicates the position in the |
| 684 | environment of the referenced variable. |
| 685 | \item is a constant closure, that is the result of evaluating a lambda |
| 686 | abstraction. Values of this type do not normally appear in the input |
| 687 | code, but instead are created during evaluation of other forms. |
| 688 | \item is a procedure (lambda abstraction), whose body is $mem[v]$. |
| 689 | \item is a conditional, where $mem[v+1]$ is the condition, |
| 690 | $mem[memval[v]+1]$ is the ``then'' branch, and $mem[memval[v]]$ is the |
| 691 | ``else'' branch. |
| 692 | \item is a procedure call, which points to a special kind of linked |
| 693 | list. The CDR of each cell (other than the last one) points to the |
| 694 | next cell and has type 0, while the CDR of the last cell has value 0 |
| 695 | and its type indicates the primitive to be called. The CARs of these |
| 696 | cells indicate the arguments for the procedure, in reverse order. |
| 697 | |
| 698 | To call a user-defined procedure, we use the primitive |
| 699 | \texttt{funcall} and put the user-defined procedure as the CAR of |
| 700 | the last cell (in other words, the user-defined procedure is the |
| 701 | first argument to \texttt{funcall}). |
| 702 | \item is a quoted constant, representing the object at $mem[v]$. |
| 703 | \end{enumerate} |
| 704 | |
| 705 | The booleans \texttt{NIL} and \texttt{T} are compiled to a list |
| 706 | pointing to 0, and a symbol with value 2 respectively. |
| 707 | |
| 708 | The memory has a peculiar fixed layout. Locations 0 and 1 are the CDR |
| 709 | and CAR of NIL, locations 2 and 3 are the CDR and CAR of T, location 4 |
| 710 | holds the first unused location in memory (this tells the processor |
| 711 | where it can start to place intermediate expressions), location 5 |
| 712 | holds the expression to be evaluated, and location 6 is where the |
| 713 | processor will place the value of the expression. |
| 714 | |
| 715 | \subsection{The intermediate form} |
| 716 | To facilitate conversion from the former format to the latter, an |
| 717 | intermediate form is used. Here we represent tagged pointers as lists |
| 718 | of one of three types: |
| 719 | |
| 720 | \begin{itemize} |
| 721 | \item \texttt{(type value)}, where ``type'' is the pointer's type and |
| 722 | ``value'' is the pointer's (numeric) value. |
| 723 | \item \texttt{(type ptr)}, where the pointer's type is ``type'' and |
| 724 | the pointer's value is $x$, where $x$ is the memory location where |
| 725 | ``ptr'' (another tagged pointer) lies. |
| 726 | \item \texttt{(type ptr1 ptr2)}, where the pointer's type is ``type'' |
| 727 | and the pointer's value is $x$, where $x$ is the memory location |
| 728 | where ``ptr1'' lies and $x+1$ is the memory location where ``ptr2'' |
| 729 | lies. |
| 730 | \end{itemize} |
| 731 | |
| 732 | These three representations map nicely to the tagged pointers |
| 733 | described in the previous section. The first representation is used |
| 734 | for constant symbols and variable references, the second for |
| 735 | procedures, and the third for constant lists, conditionals, and |
| 736 | procedure calls. |
| 737 | |
| 738 | A visual representation of the intermediate representation (from the |
| 739 | AIM-514 paper) can be seen on the next page. The expression in |
| 740 | question is |
| 741 | |
| 742 | \begin{lstlisting} |
| 743 | ((lambda append (x y) (if (atom x) y (cons (car x) (append (cdr x) y)))) '(a b c) '(d e f)) |
| 744 | \end{lstlisting} |
| 745 | |
| 746 | which we would represent as this sexp: |
| 747 | |
| 748 | \begin{lstlisting} |
| 749 | (CALL (MORE (MORE (FUNCALL 0) (PROC (IF (LIST (CALL (MORE (CONS 0) (CALL (MORE (MORE (FUNCALL 0) (VAR -1)) (VAR -2)) (CALL (CDR 0) (VAR -3)))) (CALL (CAR 0) (VAR -3))) (VAR -2)) (CALL (ATOM 0) (VAR -3))))) (LIST (LIST (LIST (LIST 0) (SYMBOL 6)) (SYMBOL 7)) (SYMBOL 8))) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5))) |
| 750 | \end{lstlisting} |
| 751 | |
| 752 | As mentioned in the Background chapter, the processor has a bug (the |
| 753 | ATOM bug) that causes \texttt{(ATOM NIL)} to be true. As a result, |
| 754 | \texttt{(atom x)} in this expression is always true, even when |
| 755 | \texttt{x} is the empty list, and the expression loops forever (or at |
| 756 | least until memory is exhausted). We can fix this by changing the |
| 757 | condition from \texttt{(atom x)} to \texttt{x}, which is false when |
| 758 | \texttt{x} is NIL and true otherwise, and swapping the two branches of |
| 759 | the conditional. The fixed program is: |
| 760 | |
| 761 | \begin{lstlisting} |
| 762 | ((lambda append (x y) (if x (cons (car x) (append (cdr x) y)) y)) '(a b c) '(d e f)) |
| 763 | \end{lstlisting} |
| 764 | |
| 765 | which is compiled to the slightly shorter sexp: |
| 766 | |
| 767 | \begin{lstlisting} |
| 768 | (CALL (MORE (MORE (FUNCALL 0) (PROC (IF (LIST (VAR -2) (CALL (MORE (CONS 0) (CALL (MORE (MORE (FUNCALL 0) (VAR -1)) (VAR -2)) (CALL (CDR 0) (VAR -3)))) (CALL (CAR 0) (VAR -3)))) (VAR -3)))) (LIST (LIST (LIST (LIST 0) (SYMBOL 6)) (SYMBOL 7)) (SYMBOL 8))) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5))) |
| 769 | \end{lstlisting} |
| 770 | |
| 771 | |
| 772 | \begin{figure} |
| 773 | \centering\includegraphics[width=18.5cm]{AIM-514-ir.pdf} |
| 774 | \caption{Example of intermediate representation} |
| 775 | \end{figure} |
| 776 | |
| 777 | \section{Software} |
| 778 | \subsection{Compiler} |
| 779 | The compiler takes in Lisp code and translates it to a sexp-based |
| 780 | format understood by the assembler. |
| 781 | |
| 782 | The compiler has several functions for compiling certain kinds of |
| 783 | expressions, and a function (\texttt{process\_toplevel}) that uses the |
| 784 | former to compile arbitrary expressions. |
| 785 | |
| 786 | The output from any of this functions is a sexp in the intermediate |
| 787 | form described in the previous section. |
| 788 | |
| 789 | Symbols are identified by sequences of letters on the source code side |
| 790 | and by numbers on the processor side. The compiler therefore keeps a |
| 791 | mapping between symbol names and their numbers. Whenever a new symbol |
| 792 | is encountered, a new entry is added to this map. |
| 793 | |
| 794 | The \texttt{process\_quoted} function handles constants. For example |
| 795 | if the source code contains \texttt{'(1 2 3)} then |
| 796 | \texttt{process\_quoted} is called with \texttt{(1 2 3)}. This |
| 797 | function handles the symbol interning described in the previous |
| 798 | paragraph, and converts its input to intermediate form. |
| 799 | |
| 800 | The \texttt{process\_proc} function handles lambda abstractions. For |
| 801 | example if the source code contains \texttt{(lambda first (a b) a)} |
| 802 | then \texttt{process\_proc} is called with \texttt{first}, |
| 803 | \texttt{(a b)}, and \texttt{a}. This function appends the function |
| 804 | name and arguments to the environment, then processes the body (using |
| 805 | \texttt{process\_toplevel}) with the new environment. |
| 806 | |
| 807 | The \texttt{process\_funcall} function handles function calls. For |
| 808 | example if the source code contains \texttt{(reverse-list 2 1)} then |
| 809 | \texttt{process\_funcall} is called with \texttt{reverse-list}, and |
| 810 | \texttt{(2 1)}. The arguments are first processed (using |
| 811 | \texttt{process\_toplevel}), then (if the function is not a primitive) |
| 812 | the function is processed as well. |
| 813 | |
| 814 | Finally the \texttt{process\_toplevel} handles any expression by |
| 815 | figuring out what sort of expression it is and then processing it |
| 816 | appropriately. Empty lists and \texttt{NIL} become \texttt{(list 0)}, |
| 817 | numbers and \texttt{T} are processed with \texttt{process\_quoted} |
| 818 | |
| 819 | Examples of compiler operation can be found in the compiler tests. |
| 820 | Here are a few of them: |
| 821 | |
| 822 | \begin{itemize} |
| 823 | \item The Lisp expression \texttt{'()} is compiled to \texttt{(LIST 0)} |
| 824 | \item The Lisp expression \texttt{'(5 foo)} is compiled to |
| 825 | \texttt{(LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4))}. Observe that |
| 826 | the CDR comes before the CAR, and symbols are interned: |
| 827 | \texttt{'foo} is compiled to \texttt{(SYMBOL 3)} and \texttt{'5} is |
| 828 | compiled to \texttt{(SYMBOL 4)} |
| 829 | \item The Lisp expression \texttt{(if t '(2 3) 'x)} is compiled to |
| 830 | \texttt{(IF (LIST (SYMBOL 5) (LIST (LIST (LIST 0) (SYMBOL 3)) |
| 831 | (SYMBOL 4))) (SYMBOL 2))}. Observe that \texttt{t} is always |
| 832 | compiled to \texttt{(SYMBOL 2)}, as mentioned in the section about |
| 833 | the memory format. |
| 834 | \end{itemize} |
| 835 | |
| 836 | \subsection{Assembler} |
| 837 | The assembler takes in an S-expression in the intermediate form, lays |
| 838 | out the words in memory, and produces a memory dump that can be |
| 839 | supplied to the processor. |
| 840 | |
| 841 | The core of the assembler is a function called \texttt{process} that |
| 842 | takes an expression in intermediate form and an optional location in |
| 843 | memory. This function lays out the expression's subexpressions in |
| 844 | memory and then lays out the expression itself at the given location |
| 845 | (or at the end of the memory if there is no location given). |
| 846 | |
| 847 | The assembler begins by initializing the first 7 elements of the |
| 848 | memory with the values needed by the processor. After initialization, |
| 849 | the \texttt{process} function is called on the input expression with |
| 850 | no location. |
| 851 | |
| 852 | Here is how the \texttt{process} function lays out the expression |
| 853 | $expr$ at location $l$ (where $l$ can be a given memory location or |
| 854 | simply ``the first free location in memory''): |
| 855 | |
| 856 | \begin{itemize} |
| 857 | \item If $expr$ is \texttt{(type value)}, then it is directly laid out |
| 858 | at $l$ |
| 859 | \item If $expr$ is \texttt{(type ptr)}, then \texttt{process} is |
| 860 | called recursively to lay \texttt{ptr} out at the first free |
| 861 | location in memory (say this is location $loc$). Then \texttt{(type |
| 862 | $loc$)} is laid out at $l$. |
| 863 | \item If $expr$ is \texttt{(type ptr1 ptr2)}, then two consecutive |
| 864 | free locations in memory $loc$ and $loc+1$ are reserved, |
| 865 | \texttt{process} is recursively called twice to lay \texttt{ptr1} at |
| 866 | $loc$ and \texttt{ptr2} at $loc+1$, and finally |
| 867 | \texttt{(type $loc$)} is laid out at $l$. |
| 868 | \end{itemize} |
| 869 | |
| 870 | Once again, examples of assembler operations can be found in the |
| 871 | assembler tests. Two of them: |
| 872 | |
| 873 | \begin{itemize} |
| 874 | \item \texttt{(quoted (symbol 5))} is assembled to \texttt{00 08 00 00 |
| 875 | 00 00 20 00 20 00 00 08 E0 07 00 00 20 05}. Observe that each word |
| 876 | is made of two bytes, the first word (\texttt{00 08}) indicating the |
| 877 | length (in words) of the program. The first 4 words are always |
| 878 | fixed: they represent in order the CDR of NIL, the CAR of NIL, the |
| 879 | CDR of T, and the CAR of T. The 7th word is also a constant, 0, |
| 880 | because that is where the processor puts the result. The 5th word is |
| 881 | the first free word in memory (which is equal to the length in words |
| 882 | of the program), and the 6th word (\texttt{E0 07}) is the expression |
| 883 | to be evaluated. In this case, it has type quote and points to |
| 884 | memory location 7 (the 8th word, because pointers are 0-indexed). At |
| 885 | memory location 7 we have the value \texttt{20 05} which has type |
| 886 | symbol and value 5. |
| 887 | \item \texttt{(call (more (funcall 0) (proc (var -2))) (number 5))} is |
| 888 | assembled to \texttt{00 0C 00 00 00 00 20 00 20 00 00 0C C0 07 00 00 |
| 889 | 00 09 20 05 E0 00 80 0B 5F FE}. Here we want to evaulate the |
| 890 | expression \texttt{C0 07} (type ``function call'', address 7). This |
| 891 | points to a cons-cell, whose car (at position 8) is \texttt{20 05} |
| 892 | (that is, \texttt{(number 5)}) and whose cdr (at position 7) is |
| 893 | \texttt{00 09} (type ``more'' and address 9). The ``more'' points to |
| 894 | another cons-cell, whose cdr (at position 9) is \texttt{E0 00} (that |
| 895 | is, \texttt{(funcall 0)}), and whose car is \texttt{80 0B}: a |
| 896 | procedure whose body is at position 11, namely \texttt{5F FE} (that |
| 897 | is, \texttt{(var -2)}). |
| 898 | |
| 899 | So the expression is a call to the primitive \texttt{funcall} with |
| 900 | arguments \texttt{(lambda l (x) x)} and \texttt{5}. The original |
| 901 | program was probably \texttt{((lambda l (x) x) 5)}. Note that this |
| 902 | program was compiled with an old version of the compiler that did |
| 903 | not intern symbols, which explains why \texttt{5} was compiled to |
| 904 | \texttt{(number 5)} and not \texttt{(symbol 3)}. |
| 905 | \end{itemize} |
| 906 | |
| 907 | \subsection{REPL} |
| 908 | The REPL ties everything together: it reads a program from the user, |
| 909 | compiles it, assembles it, sends the assembled program to the |
| 910 | processor, reads the memory dump from the processor, and finally |
| 911 | prints the result to the screen. |
| 912 | |
| 913 | The REPL includes an S-expression pretty-printer that operates on |
| 914 | memory dumps. As symbols are interned by the compiler, when the |
| 915 | pretty-printer hits a symbol it only knows its index, and not the |
| 916 | original name. To print the original name the pretty-printer searches |
| 917 | the symbol map built by the compiler and pulls the name from there. |
| 918 | |
| 919 | The pretty-printer takes a chunk of memory and index, and prints the |
| 920 | expression at that index. For example, if the memory chunk is |
| 921 | \texttt{00 00 00 00 20 00 20 00 00 09 00 07 00 07 20 02 00 00 60 00} |
| 922 | and the index is 6, then the pretty printer will try to print |
| 923 | \texttt{00 07}, which is a list with car \texttt{00 00} and cdr |
| 924 | \texttt{20 02}, and the pretty printer prints \texttt{(nil . t)}. |
| 925 | |
| 926 | An example REPL session is: |
| 927 | |
| 928 | \begin{lstlisting} |
| 929 | YULE REPL |
| 930 | |
| 931 | * (cons 1 2) |
| 932 | (1 . 2) |
| 933 | * (car '(4 5 6)) |
| 934 | 4 |
| 935 | * (cdr '(t t t)) |
| 936 | (T T) |
| 937 | * ((lambda id (x) x) 5) |
| 938 | 5 |
| 939 | * (lambda first (a b) a) |
| 940 | <CLOSURE> |
| 941 | * ((lambda first (a b) a) 7 8) |
| 942 | 7 |
| 943 | * (atom 6) |
| 944 | T |
| 945 | \end{lstlisting} |
| 946 | |
| 947 | \chapter{Evaluation and Testing} |
| 948 | To make development easier, ensure everything works correctly, and |
| 949 | prevent regressions there are unit tests for the compiler and |
| 950 | assembler, and functional tests for the entire project. |
| 951 | |
| 952 | Besides these automated tests, manual evaluation (in the beginning, |
| 953 | precompiling simple expressions and supplying them to the processor as |
| 954 | the initial value of the memory; later on, manually typing expressions |
| 955 | in the REPL and looking at the output). When the processor was not |
| 956 | operating as expected, the GTKWave wave simulator was used to assist |
| 957 | with debugging. This piece of software takes in a recorded simulation |
| 958 | of the circuits and displays the values of all signals inside the |
| 959 | circuit against time. This allows the designer to look at every clock |
| 960 | cycle in turn and discover where the error occurs. |
| 961 | |
| 962 | The process of debugging the processor had the side effect of finding |
| 963 | inefficiencies in the original design. While \cite{lambda} mentions |
| 964 | some limitations of the processors, one important limitation that was |
| 965 | only discovered during debugging is that for every operation that the |
| 966 | EVAL performs, the result of that operation will be written to new |
| 967 | locations in memory. This includes ``read-only'' operations such as |
| 968 | CAR and CDR. If the programmer requests the CAR of a list, a copy of |
| 969 | the first element will be taken and a pointer to it will be returned. |
| 970 | Since YULE only has 4096 words of memory, this severely restricts the |
| 971 | complexity of programs that can be evaluated on it. |
| 972 | |
| 973 | Another avenue to ensure the processor works as it should is formal |
| 974 | verification using model checking tools. Formal verification is |
| 975 | commonly employed as part of circuit design evaluation because |
| 976 | mistakes are more difficult to identify, and if found after the chip |
| 977 | has been built they are very costly to fix (often the chips that were |
| 978 | built with the bug need to be thrown away). However, building models |
| 979 | is a very involved process, and devoting effort to formal verification |
| 980 | of this project would not have allowed the implementation of several |
| 981 | features. Due to time constraints and knowing that no physical YULE |
| 982 | chips are built (the project is so far only synthesised to a FPGA), |
| 983 | testing and manual evaluation was considered sufficient for YULE. |
| 984 | |
| 985 | \section{Unit testing} |
| 986 | The software was tested with 48 unit tests written in Perl using the |
| 987 | \texttt{Test::More} framework. |
| 988 | |
| 989 | The assembler takes in compiled Lisp code and outputs a memory dump in |
| 990 | one of two formats (binary or Verilog code). Its tests take a |
| 991 | configuration for the assembler, an input, two expected outputs, and |
| 992 | (for documentation purposes) the source Lisp code. An example test: |
| 993 | |
| 994 | \begin{lstlisting} |
| 995 | $expbin = |
| 996 | '00 0C 00 00 00 00 20 00 20 00 00 0C C0 07' |
| 997 | . ' 00 00 00 09 20 05 E0 00 80 0B 5F FE'; |
| 998 | run_test {addr_bits => 13}, |
| 999 | '(call (more (funcall 0) (proc (var -2))) (number 5))', |
| 1000 | <<'EOF', $expbin, '((LAMBDA ID (X) X) 5)'; |
| 1001 | mem[ 0] <= 0; // (cdr part of NIL) |
| 1002 | mem[ 1] <= 0; // (car part of NIL) |
| 1003 | mem[ 2] <= 16'b0010000000000000; // (cdr part of T) |
| 1004 | mem[ 3] <= 16'b0010000000000000; // (car part of T) |
| 1005 | mem[ 4] <= 16'd12; // (free storage pointer) |
| 1006 | mem[ 5] <= 16'b1100000000000111; // CALL 7 |
| 1007 | mem[ 6] <= 0; // (result of computation) |
| 1008 | mem[ 7] <= 16'b0000000000001001; // MORE 9 |
| 1009 | mem[ 8] <= 16'b0010000000000101; // NUMBER 5 |
| 1010 | mem[ 9] <= 16'b1110000000000000; // FUNCALL 0 |
| 1011 | mem[10] <= 16'b1000000000001011; // PROC 11 |
| 1012 | mem[11] <= 16'b0101111111111110; // VAR -2 |
| 1013 | EOF |
| 1014 | \end{lstlisting} |
| 1015 | |
| 1016 | Here the compiled code \texttt{(call (more (funcall 0) (proc (var -2))) (number 5))} |
| 1017 | is given to the assembler, and its outputs are compared to the binary |
| 1018 | string on the first line and the block of Verilog code. |
| 1019 | |
| 1020 | The tests do not all contain valid assembler inputs. There are a |
| 1021 | series of tests that give invalid inputs to the assembler and expect |
| 1022 | the assembler to raise an exception matching a pattern. Example tests: |
| 1023 | |
| 1024 | \begin{lstlisting} |
| 1025 | expect_error_like { run_test {}, '((type is a list) 5)'} |
| 1026 | 'Type of toplevel is not atom'; |
| 1027 | expect_error_like { run_test {}, '(badtype 5)'} |
| 1028 | 'No such type'; |
| 1029 | expect_error_like { run_test {}, '(5 700000)'} |
| 1030 | 'Addr too large'; |
| 1031 | \end{lstlisting} |
| 1032 | |
| 1033 | On the compiler side, there are 4 tests for the quoting function, |
| 1034 | but the brunt of the tests are pairs of a compiler input (i.e. Lisp |
| 1035 | code) and a compiler output (i.e. a sexp that the assembler |
| 1036 | understands). For each pair the compiler is run on the first element |
| 1037 | and its output is compared with the second. Example such tests: |
| 1038 | |
| 1039 | \begin{lstlisting} |
| 1040 | is_toplevel '(quote 5)', '(SYMBOL 3)'; |
| 1041 | is_toplevel '(reverse-list \'a \'a \'b)', |
| 1042 | '(CALL (MORE (MORE (REVERSE-LIST 0) (SYMBOL 4)) (SYMBOL 3)) (SYMBOL 3))'; |
| 1043 | is_toplevel '(if t \'(2 3) \'x)', |
| 1044 | '(IF (LIST (SYMBOL 5) (LIST (LIST (LIST 0)' |
| 1045 | . ' (SYMBOL 3)) (SYMBOL 4))) (SYMBOL 2))'; |
| 1046 | \end{lstlisting} |
| 1047 | |
| 1048 | As the compiler code includes a function to pretty-print |
| 1049 | S-expressions, there are a few round-trip tests on this function. |
| 1050 | Given a string, it is parsed using the \texttt{Data::SExpression} |
| 1051 | parsing library then it is pretty-printed. The result should match the |
| 1052 | original string. |
| 1053 | |
| 1054 | There are also tests for handling of invalid input, like in the assembler. These include: |
| 1055 | |
| 1056 | \begin{lstlisting} |
| 1057 | expect_error_like { is_toplevel 'x' } 'Variable x not in environment'; |
| 1058 | expect_error_like { is_toplevel '(car)' } 'Cannot call primitive car with no arguments'; |
| 1059 | \end{lstlisting} |
| 1060 | |
| 1061 | A test coverage tool, \texttt{Devel::Cover}, was ran on the compiler |
| 1062 | and assembler to find statements and branches that were not covered by |
| 1063 | existing unit tests, and new tests were written to cover them. Right |
| 1064 | now \texttt{Devel::Cover} reports 100\% test coverage of the compiler |
| 1065 | and assembler. |
| 1066 | |
| 1067 | \begin{figure} |
| 1068 | \centering\includegraphics[height=8cm]{devel-cover.png} |
| 1069 | \caption{Screenshot of \texttt{Devel::Cover} results, showing 100\% |
| 1070 | test coverage} |
| 1071 | \end{figure} |
| 1072 | |
| 1073 | \section{Functional testing} |
| 1074 | Besides unit tests, the entire system (hardware and software) has |
| 1075 | undergone functional testing. This means that a set of 21 Lisp |
| 1076 | expressions has been prepared, alongside their correct values computed |
| 1077 | using a software Lisp implementation. These values are given to the |
| 1078 | YULE REPL and the test succeeds if the output equals the known correct |
| 1079 | value of the expression. |
| 1080 | |
| 1081 | Here are some example tests: |
| 1082 | \begin{lstlisting} |
| 1083 | '(1 2 3) |
| 1084 | 5 |
| 1085 | ((lambda id (x) x) 5) |
| 1086 | ((lambda id (x) x) '(1 2 3)) |
| 1087 | (car '(2 3)) |
| 1088 | (car (cdr '(4 5 6))) |
| 1089 | ((lambda snd (x y) y) 'first 'second) |
| 1090 | |
| 1091 | (cons '(1 2) '(7)) |
| 1092 | (car (cons '(1 2) '(7))) |
| 1093 | (cdr (cons '(1 2) '(7))) |
| 1094 | |
| 1095 | ((lambda rev (xs) ((lambda reverse-acc (xs acc) (if xs (reverse-acc (cdr xs) (cons (car xs) acc)) acc)) xs '())) '(4 5 6 7)) |
| 1096 | \end{lstlisting} |
| 1097 | |
| 1098 | This test suite tries to cover most kinds of valid expressions. It |
| 1099 | includes constants of all kinds, conditionals, lambda abstractions, |
| 1100 | calls to all primitives, calls to user-defined functions, recursive |
| 1101 | calls. |
| 1102 | |
| 1103 | \section{Timing and performance} |
| 1104 | There are three parts to the performance of YULE: |
| 1105 | |
| 1106 | \begin{enumerate} |
| 1107 | \item The efficiency of the software. This refers to the time taken to |
| 1108 | compile and assemble input programs, which is negligible. |
| 1109 | |
| 1110 | \item The I/O performance, which refers to the time needed to transfer |
| 1111 | the program from the computer to the processor's memory, plus the |
| 1112 | time needed to transfer the memory dump from the memory back to the |
| 1113 | computer. |
| 1114 | |
| 1115 | \item The speed of the processor itself, that is the time/number of |
| 1116 | cycles that the processor needs to evaluate a given expression. |
| 1117 | \end{enumerate} |
| 1118 | |
| 1119 | The GC runs at 48MHz, while the EVAL only runs when the GC allows it |
| 1120 | to which is at most half of the time. |
| 1121 | |
| 1122 | The UART runs at 4000000 baud, so a bit is sent every 12 cycles of the |
| 1123 | 48MHz clock. A byte is made of 11 bits: a start bit, 8 data bits, and |
| 1124 | one stop bit. A word is two bytes, so 22 bits. Therefore 264 cycles |
| 1125 | are needed to transmit one word. |
| 1126 | |
| 1127 | Since the WRITER just sends a full memory dump, the I/O performance is |
| 1128 | proportional not to \texttt{program\_len + result\_len} but to |
| 1129 | \texttt{program\_len + result\_len + intermediate\_results\_len}. As |
| 1130 | evaluation of any expression involves the creation of at least one |
| 1131 | value (the result), for every expression or subexpression evaluated at |
| 1132 | least 264 cycles will be needed to output that result. The EVAL takes |
| 1133 | significantly less than 264 cycles to evaluate a expression (not |
| 1134 | counting its subexpressions), in one test each expression took between |
| 1135 | 30 and 90 cycles to evaluate. The processor performance is therefore |
| 1136 | dominated by I/O. |
| 1137 | |
| 1138 | As an example, the program \texttt{((lambda id (x) x) 5)} is 12 bytes |
| 1139 | long, and takes 258 cycles to evaluate. The memory dump is 54 bytes |
| 1140 | long. Therefore of the roughly 17700 cycles needed from beginning to |
| 1141 | end, about 98.5\% are used to do I/O. |
| 1142 | |
| 1143 | \section{A worked example} |
| 1144 | \label{sec:example} |
| 1145 | Here is an example of what happens when a expression is typed into the |
| 1146 | REPL: |
| 1147 | |
| 1148 | \begin{enumerate} |
| 1149 | \item A user types into the REPL the string |
| 1150 | |
| 1151 | \begin{lstlisting} |
| 1152 | (car '(2 3 4)) |
| 1153 | \end{lstlisting} |
| 1154 | |
| 1155 | This is a call to the CAR primitive, with the list \texttt{(2 3 4)} |
| 1156 | as an argument. We expect the result to be \texttt{2}, the first |
| 1157 | element of the list. |
| 1158 | |
| 1159 | \item This string is given to the compiler, producing the sexp |
| 1160 | |
| 1161 | \begin{lstlisting} |
| 1162 | (CALL (CAR 0) (LIST (LIST (LIST (LIST 0) (SYMBOL 3)) (SYMBOL 4)) (SYMBOL 5))) |
| 1163 | \end{lstlisting} |
| 1164 | |
| 1165 | Observe that in this representation the CDR comes before the CAR. So |
| 1166 | for example ``(LIST (LIST 0) (SYMBOL 3))'' represents a list whose |
| 1167 | CAR is ``(SYMBOL 3)'' and CDR is ``(LIST 0)'' (that is, nil). This |
| 1168 | is a list with one element, ``(SYMBOL 3)''. The sexp shown here |
| 1169 | represents a call to the CAR primitive with one argument. The list |
| 1170 | has three elements, which are in order \texttt{(SYMBOL 5), (SYMBOL |
| 1171 | 4), (SYMBOL 3)}. |
| 1172 | |
| 1173 | In this stage the compiler maps each number to a symbol, in this |
| 1174 | case \texttt{(SYMBOL 5)} is 2, \texttt{(SYMBOL 4)} is 3, and |
| 1175 | \texttt{(SYMBOL 3)} is 2. This mapping is remembered so it can be |
| 1176 | used later by the pretty printer. |
| 1177 | |
| 1178 | \item This sexp is given to the assembler, producing the 16 words long |
| 1179 | binary string (here displayed in hex, with a space after each 16-bit |
| 1180 | word) |
| 1181 | |
| 1182 | \begin{lstlisting} |
| 1183 | 000F 0000 0000 2000 2000 000F C007 0000 2000 0009 000B 2005 000D 2004 0000 2003 |
| 1184 | \end{lstlisting} |
| 1185 | |
| 1186 | The first word, \texttt{000F}, indicates the length of the program |
| 1187 | in |
| 1188 | words (15 words, this does not include this first word).\\ |
| 1189 | The next four words, \texttt{0000 0000 2000 2000}, represent the CDR |
| 1190 | and CAR of NIL and T respectively. These words are always |
| 1191 | constant.\\ |
| 1192 | The next word, \texttt{000F}, represents the first unused position |
| 1193 | in memory. This is always equal to the first word transmitted.\\ |
| 1194 | The next word, \texttt{C007}, represents the expression to be |
| 1195 | evaluated.\\ |
| 1196 | The next word, \texttt{0000}, is ignored as this is where the |
| 1197 | processor will place the result of the evaluation.\\ |
| 1198 | The rest of the words are subexpressions or parts of subexpressions |
| 1199 | of the expression being evaluated.\\ |
| 1200 | |
| 1201 | The expression that is evaluated is \texttt{C007}, which is a CALL |
| 1202 | whose CDR is at position 7 (\texttt{2000}, meaning \texttt{(CAR 0)}) |
| 1203 | and whose CAR is at |
| 1204 | position 8 (\texttt{0009}).\\ |
| 1205 | \texttt{0009} is a list with CAR at 10 (\texttt{2005}, meaning |
| 1206 | \texttt{(SYMBOL 5)}) and CDR at 9 (\texttt{000B}).\\ |
| 1207 | \texttt{000B} is a list with CAR at 12 (\texttt{2004}, meaning |
| 1208 | \texttt{(SYMBOL 4)}) and CDR at 11 (\texttt{000D}).\\ |
| 1209 | \texttt{000D} is a list with CAR at 14 (\texttt{2003}, meaning |
| 1210 | \texttt{(SYMBOL 3)}) and CDR at 13 (\texttt{0000}, meaning NIL). |
| 1211 | |
| 1212 | \item This binary string is sent via UART to the FPGA's READER. The |
| 1213 | READER writes the string (except the first word) into memory, |
| 1214 | overwriting the previous contents of the memory. It then reports to |
| 1215 | the controller that it is done, and the controller advances to |
| 1216 | \texttt{STATE\_RUN}. |
| 1217 | |
| 1218 | \item The EVAL and GC evaluate the program in memory, and then place |
| 1219 | the result back into memory. The EVAL reports to the controller that |
| 1220 | it is done, and the controller advances to \texttt{STATE\_WRITE}. |
| 1221 | |
| 1222 | \item The WRITER sends this memory dump via UART back to the REPL |
| 1223 | |
| 1224 | \begin{lstlisting} |
| 1225 | 000F C007 2005 2000 0009 000B 2005 000D 2004 0000 2003 0000 4004 2000 0010 0000 0012 0000 0000 0009 |
| 1226 | \end{lstlisting} |
| 1227 | |
| 1228 | Then the WRITER reports to the controller that it is done, and the |
| 1229 | controller advances to \texttt{STATE\_READ}. |
| 1230 | |
| 1231 | Observe that this memory dump is similar to the assembler output. |
| 1232 | The first two words are the 6th and 7th words in the assembler |
| 1233 | output, the next word is the result of the evaluation, then the next |
| 1234 | 8 words are words 9th through 16th in the assembler output. The |
| 1235 | following words are intermediate results computed by the processor. |
| 1236 | |
| 1237 | \item The REPL's pretty printer interprets this string starting with |
| 1238 | \texttt{2005}, which means \texttt{(SYMBOL 5)}. The pretty printer |
| 1239 | looks in the compiler's mapping and finds that the 5th symbol was |
| 1240 | 2, and so the REPL prints |
| 1241 | |
| 1242 | \begin{lstlisting} |
| 1243 | 2 |
| 1244 | * |
| 1245 | \end{lstlisting} |
| 1246 | |
| 1247 | The evaluation is complete, and a new prompt is printed so that the |
| 1248 | user can input another expression. |
| 1249 | \end{enumerate} |
| 1250 | |
| 1251 | \begin{figure} |
| 1252 | \centering\includegraphics[height=7cm]{repl.png} |
| 1253 | \caption{What the user types and sees in this example} |
| 1254 | \end{figure} |
| 1255 | |
| 1256 | \chapter{Conclusions} |
| 1257 | A Lisp machine processor has been built using the original design and |
| 1258 | techniques presented in \cite{lambda}. It has been extended with input |
| 1259 | and output capabilities, and using a modern FPGA it has been |
| 1260 | synthesised on a physical device slightly larger than a USB flash |
| 1261 | drive. This device can be used as a standalone Lisp evaluator, or the |
| 1262 | processor inside it can be included in a more complicated circuit. |
| 1263 | |
| 1264 | Software has been written to help interact with the processor. This |
| 1265 | includes a compiler, an assembler, and a REPL that together make it |
| 1266 | possible for an end-user unfamiliar with hardware design or the |
| 1267 | processor's internals to submit Lisp expressions to the processor and |
| 1268 | get back their values. The software also allows programmers to convert |
| 1269 | Lisp source code to the binary format understood by the processor, so |
| 1270 | that they may later send the binaries to the processor for evaluation. |
| 1271 | |
| 1272 | The processor and software have been tested to ensure they work |
| 1273 | correctly. This has uncovered bugs and inefficiencies in the original |
| 1274 | paper, some of which could not be gleaned from simply reading the |
| 1275 | paper. |
| 1276 | |
| 1277 | The source files for the hardware and software, and also the automated |
| 1278 | tests have been released under an open-source license to make it easy |
| 1279 | for future circuit designers and programmers to use and extend this |
| 1280 | project. They can be found at \url{https://git.ieval.ro/?p=yule.git}. |
| 1281 | |
| 1282 | The original paper \cite{lambda} is one of the influential Lambda |
| 1283 | Papers that defined the Scheme programming language and is an |
| 1284 | important historical artifact for understanding how Lisp machine |
| 1285 | processors can be implemented and more generally how tagged |
| 1286 | architectures work. But all details of the processor cannot be |
| 1287 | understood from simply reading the paper, and most readers would not |
| 1288 | follow by attempting to implement what is described there. This |
| 1289 | projects brings a tangible implementation into the open, which shows |
| 1290 | the limitations of the design to people interested in this design and |
| 1291 | makes it possible to solve this limitations and extend the processor |
| 1292 | beyond its original scope. |
| 1293 | |
| 1294 | \section{Reflections} |
| 1295 | Now that the project is finished, it is worth revisiting ideas I had |
| 1296 | in the beginning that were not ultimately useful. |
| 1297 | |
| 1298 | The original concept had the processor run on a typical development |
| 1299 | board, with the software running on a Raspberry PI. The board and the |
| 1300 | Raspberry PI would be connected through jumper wires, and the UART |
| 1301 | circuitry on the Raspberry PI would be used by the software to talk to |
| 1302 | the processor. |
| 1303 | |
| 1304 | This idea would have produced a much harder to use YULE. Not only |
| 1305 | would the user have to ssh into a Raspberry PI to do anything useful, |
| 1306 | but also the equipment would take more space (both boards are larger |
| 1307 | than the iCEstick). Once I tried the iCEstick and found that |
| 1308 | everything can be done just as well with it, it replaced the two |
| 1309 | boards and made the project easier to use (albeit slightly harder to |
| 1310 | debug, as the iCEstick only has 5 LEDs). |
| 1311 | |
| 1312 | Some code had to be rewritten, sometimes multiple times. The compiler |
| 1313 | was originally written in Common Lisp, as it is the best suited |
| 1314 | language for manipulation of S-expressions, especially Lisp code. This |
| 1315 | meant a working project was obtained more quickly, but then the |
| 1316 | compiler had to be rewritten in another language; if the user has a |
| 1317 | software Lisp implementation already, why would they need YULE? I |
| 1318 | chose Perl as the reimplementation language, as the assembler was |
| 1319 | already written in Perl and it is widely-available, most systems |
| 1320 | coming with a Perl interpreter out of the box. |
| 1321 | |
| 1322 | Throughout development of the hardware part, timing problems were a |
| 1323 | mainstay. The first ones related to the interactions between the |
| 1324 | processor and the memory (the memory was clocked, but the processor |
| 1325 | expected a combinational RAM). What fixed this problem was to run the |
| 1326 | memory at twice the speed of the rest of the processor. At some point, |
| 1327 | a decision was taken to run the GC and EVAL on different clocks (both |
| 1328 | ran at one quarter the speed of the master clock, but one of the |
| 1329 | clocks was delayed by half a cycle. The positive edge on one clock |
| 1330 | would happen midway between a positive edge and a negative edge on the |
| 1331 | other one). This proved to be unwise, as sometimes values sent by one |
| 1332 | module to the other would be cleared before the destination module |
| 1333 | managed to read them. |
| 1334 | |
| 1335 | GTKWave proved to be very useful in debugging these timing issues, as |
| 1336 | the exact point where the issue began was easy to identify and I could |
| 1337 | see why the issue was happening. A workaround for this was to add a |
| 1338 | latch on each side of the common E bus that remembered the value sent |
| 1339 | long enough for the receiving module to see it. |
| 1340 | |
| 1341 | All timing problems were fixed more cleanly in the end when an attempt |
| 1342 | to simplify the clocks was made. First GC and EVAL were changed to use |
| 1343 | the same clock, which removed the need for the latches, and then the |
| 1344 | memory was changed to use the negative edge of the clock instead of |
| 1345 | the positive edge, which meant no more clock division code had to be |
| 1346 | used. |
| 1347 | |
| 1348 | \section{Future work} |
| 1349 | There are several ways to extend this project. |
| 1350 | |
| 1351 | One idea is to fix the noted limitations: the ATOM primitive considers |
| 1352 | NIL to not be an atom, and there is no EQ primitive that compares |
| 1353 | symbols/other objects. The latter problem means that all symbols are |
| 1354 | indistinguishable (note: NIL can still be distinguished from T and |
| 1355 | other symbols, but NIL is not a symbol). To introduce equality a new |
| 1356 | signal needs to be added to the EVAL that compares (for example) the |
| 1357 | value on the E bus with the value of a fixed register, possibly a |
| 1358 | newly added one. Then (EQ A B) would store A in the register, load B |
| 1359 | on the E bus and compare them. |
| 1360 | |
| 1361 | Once equality is implemented, it becomes possible to implement the |
| 1362 | EVAL function that evaluates an arbitrary Lisp expression. |
| 1363 | |
| 1364 | Two more involved instruction set improvements are adding arithmetic |
| 1365 | operations, and input/output. Without arithmetic primitives one has to |
| 1366 | rely on Church numerals, which encodes numbers as lambda abstractions |
| 1367 | and is very inefficient. Input/output primitives would allow programs |
| 1368 | to receive data at runtime over the UART and send data back before |
| 1369 | evaluation is finished. This enables long-running useful programs, for |
| 1370 | example with input/output and equality a REPL can be written (which |
| 1371 | relies on the EVAl function mentioned above). |
| 1372 | |
| 1373 | On the software side, the assembler can support optimizations and then |
| 1374 | the REPL can be extended with a core image. The main optimization |
| 1375 | needed in the assembler is common subexpression elimination. The |
| 1376 | processor never overwrites a value in memory with another value (other |
| 1377 | than overwriting \texttt{mem[6]} with the result of the evaluation), |
| 1378 | so it is safe to share any identical subexpressions. For example, if |
| 1379 | the user inputs the program \texttt{(cons (car '(1 2 3 4)) (cdr '(0 2 |
| 1380 | 3 4)))}, then when the assembler should share the tails of the two |
| 1381 | lists, meaning the second list should compile to \texttt{(LIST (LIST |
| 1382 | X) (SYMBOL Y))} where X is a pointer to where the second element of |
| 1383 | the first list is stored. This makes long programs significantly |
| 1384 | shorter. Note that no compiler changes are required to implement this, |
| 1385 | as the assembler can just remember all expressions and subexpressions |
| 1386 | it has assembler so far including their locations, and whenever a new |
| 1387 | expression needs to be assembler it can be replaced with a pointer to |
| 1388 | where the same expression has been placed before. |
| 1389 | |
| 1390 | Once this optimization is implemented, the REPL can gain core image |
| 1391 | support. Suppose the user enters the expression \texttt{expr}. Instead |
| 1392 | of sending \texttt{assemble(compile(expr))} to the processor, the REPL |
| 1393 | can remember the last expression sent (or the last memory dump |
| 1394 | received), then append \texttt{assemble(compile(expr))} to that |
| 1395 | previous expression. This way \texttt{expr} can refer to previously |
| 1396 | defined expressions, as long as the REPL or compiler has a way to name |
| 1397 | previously supplied instructions, for example via a \texttt{define} |
| 1398 | special form, via the standard REPL variables (\texttt{+, ++, +++}), |
| 1399 | or by sequentially numbering all expressions submitted by the user. |
| 1400 | |
| 1401 | Returning to hardware, a potential future project is to add actual |
| 1402 | hardware garbage collection to the GC component. At the moment the |
| 1403 | design does no garbage collection, the processor only has 4096 words |
| 1404 | of RAM available, and evaluation is quite memory-inefficient. Adding |
| 1405 | garbage collection would make more complicated programs possible |
| 1406 | despite the limited RAM. One way to implement this is to add a small |
| 1407 | microprocessor to the design which is preprogrammed with a garbage |
| 1408 | collection algorithm (simple choices are a stop-and-copy collector or |
| 1409 | Cheney's algorithm). When the free space pointer gets too high, the |
| 1410 | EVAL and GC are stopped, and the new microprocessor is started up so |
| 1411 | it can perform garbage collection, compact the memory, and update the |
| 1412 | free space pointer. The EVAL/GC registers would have to be updated as |
| 1413 | well. Then the microprocessor can be stopped, EVAL and GC be started |
| 1414 | up again and evaluation can continue. Note that a non-moving collector |
| 1415 | is not sufficient: the GC allocates new words only at the very end of |
| 1416 | the memory, so it cannot benefit from free ``gaps'' in the middle of |
| 1417 | the memory. |
| 1418 | |
| 1419 | \begin{thebibliography}{99} |
| 1420 | \bibitem{lambda} Design of LISP-Based Processors (1979), available at |
| 1421 | \url{http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-514.pdf} |
| 1422 | \bibitem{osdvu} \url{https://opencores.org/project/osdvu} |
| 1423 | \bibitem{gtkwave} \url{http://gtkwave.sourceforge.net/} |
| 1424 | \bibitem{icestorm} \url{http://www.clifford.at/icestorm/} |
| 1425 | \end{thebibliography} |
| 1426 | \end{document} |