All Levels
- The Nand Game - Hardware Levels
- The Nand Game - Software Levels
- The Nand Game - Optional Levels
Assembler Quick Reference
Syntax: destination = calculation ; jump-condition
. Destination and jump-condition are optional.
D and A are the two registers. *A indicate the value in RAM at the address contained in A.
Examples:
D = D + 1
: Calculate D + 1 and store the result in D.D - 1 ; JGE
: Calculate D - 1. Jump if the result is greater than 0. (Result is not stored.)A = 1234
: Store the number 1234 in the A register.# Hello
: Comment.
Calculations
D + A
D - A
orA - D
D & A
(Bitwise and)D | A
(Bitwise or)A + 1
orD + 1
A - 1
orD - 1
~A
or~D
(Bitwise inversion)1
or0
or-1
In all cases can *A be used instead of A.
Destinations
Any combination of D, A and *A can be used as destination. Multiple destinations are comma separated, e.g. D, A = 1
.
Jump conditions
Jump when result of calculations is:
JQE
equal to 0JNE
not equal to 0JGT
greater than 0JGE
greater than or equal to 0JLT
less than 0JLE
less than or equal to 0JMP
unconditional jump (regardless of the calculation result)
Number instructions
A can be directly assigned a number. Example: A = 47
.
Numbers in hexadecimal is prefixed with 0x
, e.g.: A = 0x7FFF
.
Numbers in binary is prefixed with 0b
, e.g.: A = 0b010111
. Underscores can be used to separate digit group, e.g. A = 0b_0101_1100
.
Comments
Lines starting with #
are ignored by the assembler. They can be used for comments and documentation.
Labels
The keyword LABEL followed by a name makes the name represent the address of the following instruction. This address can be assigned to A before a jump, e.g. A = LOOP
.
Defines
The keyword DEFINE followed by a name and a number will cause the name to be replaced with the number when it occurs in other instructions. For example:
1 | DEFINE foo 0x7FFF |
Is equivalent to: A = 0x7FFF
.
Multiple targes
It is possible to assign a result to multiple register.
E.g. D, A = D - *A
. The result of calculation will be written to both A and D.
Any combination of A, D, and *A can be specified as destination.
Software Level: Low level
Machine code
Write a program with four instructions:
- Set the D register to 0
- Set the A register to 2
- Add 1 to the D register
- Jump unconditionally
就是在 Hardware 最后 Computer 那关的 ROM 里填入指令。
The word at the PC address in the program memory is the I input to the control unit.
需要结合 Control Unit、 ALU Instruction、ALU 和 Condition 看指令每一位的取值。
Note: Inside ALU instruction:
alu(X=D, Y=A/*A)
.
I: Bit | I: Hex | Group | Flag | Note |
---|---|---|---|---|
15 | 0x8000 | - | ci | 0: data (to A); 1: ALU |
12 | 0x1000 | - | * | 1: use *A instead of A |
10 | 0x400 | operation | u | |
9 | 0x200 | operation | op1 | |
8 | 0x100 | operation | op0 | |
7 | 0x80 | operation | zx | 1: D = 0 |
6 | 0x40 | operation | sw | 1: swap D and A |
5 | 0x20 | target | a | 1: a |
4 | 0x10 | target | d | 1: d |
3 | 0x8 | target | *a | 1: *a |
2 | 0x4 | jump | lt | |
1 | 0x2 | jump | eq | |
0 | 0x1 | jump | gt |
u | op1 | op0 | Output |
---|---|---|---|
0 | 0 | 0 | D and A |
0 | 0 | 1 | D or A |
0 | 1 | 0 | D xor A |
0 | 1 | 1 | invert D |
1 | 0 | 0 | D + A |
1 | 1 | 0 | D - A |
1 | 0 | 1 | D + 1 |
1 | 1 | 1 | D - 1 |
lt | eq | gt | output 1 when |
---|---|---|---|
0 | 0 | 0 | Never |
0 | 0 | 1 | X > 0 |
0 | 1 | 0 | X = 0 |
0 | 1 | 1 | X ≥ 0 |
1 | 0 | 0 | X < 0 |
1 | 0 | 1 | X ≠ 0 |
1 | 1 | 0 | X ≤ 0 |
1 | 1 | 1 | Always |
Set the D register to 0
- 往寄存器 D 写数据,得用 alu instruction,
ci = 1
,target.d = 1
。 - 要写入 0,那么 alu 模块应该取
u = 0, op1 = 0, op0 = 0
即D and A
,同时zx = 1
即D = 0
(合起来就是0 and A = 0
。 - condition 模块取全零即输出恒 0。
So I = ci | d | zx = 0x8090
(i.e. D = 0
).
Set the A register to 2
- 往寄存器 A 写数据,设置
ci = 0
,其余位构成要写入的数字 2。
So I = 0x0002
(i.e. A = 2
).
Add 1 to the D register
- 需要使用 alu instruction,设置
ci = 1
,target.d = 1
。 - alu 模块取
u = 1, op1 = 0, op0 = 1
即D + 1
。 - condition 模块全零。
So I = ci | d | u | op0 = 0x8510
(i.e. D = D + 1
).
Jump unconditionally
- 需要使用 alu instruction,设置
ci = 1
。 - condition 模块取全一即输出恒 1。
- 注意当
j
(即 condition 的输出)为 1 时,PC 会被设置为寄存器 A 的值。本题中会跳到ROM[2]
。
- 注意当
So I = ci | lt | eq | gt = 0x8007
(i.e. JMP
).
All together
持续运行的效果相当于对寄存器 D 从 0 开始逐步递增。
Assembly Language
Programming a computer by directly setting instruction bits is quite time-consuming and error-prone.
Therefore we create a so-called assembler language, which is a text-based format using letters and symbols instead of bits to represent machine-code instructions.
An assembler translates the symbolic instructions into binary machine code.
An assembler instruction has three parts: The destination, the calculation and an (optional) jump-condition.
- The destination is the register(s) which the output of the operation is written to.
- The calculation is the ALU operation.
- The jump condition is the condition which will cause a jump.
Example instruction:
Destination
用空白代表 0,让 1 更突出。
Opcode | a | d | *a |
---|---|---|---|
(blank) | |||
A = | 1 | ||
D = | 1 | ||
*A = | 1 | ||
A, D = | 1 | 1 | |
D, *A = | 1 | 1 | |
A, D, *A = | 1 | 1 | 1 |
A, *A = | 1 | 1 |
Calculation
有些 operation 可以对应多种不同的 flags 组合。
Opcode | u | op1 | op0 | zx | sw | Note |
---|---|---|---|---|---|---|
D+A | 1 | |||||
D-A | 1 | 1 | ||||
A-D | 1 | 1 | 1 | |||
D+1 | 1 | 1 | ||||
A+1 | 1 | 1 | 1 | |||
D-1 | 1 | 1 | 1 | |||
A-1 | 1 | 1 | 1 | 1 | ||
-D | 1 | 1 | 1 | 1 | i.e. 0-D | |
-A | 1 | 1 | 1 | i.e. 0-A | ||
-1 | 1 | 1 | 1 | 1 | i.e. 0-1 | |
1 | 1 | 1 | 1 | i.e. 0+1 | ||
D | 1 | 1 | 1 | i.e. 0+D | ||
A | 1 | 1 | i.e. 0+A | |||
D&A | ||||||
D|A | 1 | |||||
~D | 1 | 1 | ||||
~A | 1 | 1 | 1 | |||
0 | 1 | i.e. 0&A |
Jump-condition
Opcode | lt | eq | gt |
---|---|---|---|
(blank) | |||
; JLT | 1 | ||
; JEQ | 1 | ||
; JGT | 1 | ||
; JLE | 1 | 1 | |
; JGE | 1 | 1 | |
; JMP | 1 | 1 | 1 |
; JNE | 1 | 1 |
Assembler program
Write a program in assembler which causes the lamp to blink at least three times.
The lamp is memory-mapped to the address 7FFF, bits 1 and 0.
Bit | Set to 1 to: |
---|---|
0 | Turn lamp on |
1 | Turn lamp off |
The external device is only affected when a bit is changing from 0 to 1.
给内存地址 0x7FFF 写入 0x1,则灯亮;直接对此数值取反,则灯灭。
1 | # 0x7FFF |
👆 其中 ci3
表示最高的 3 位(ci
及其右边的两位)。
Keyboard input
Write a program in assembler which write keyboard input into memory.
The keyboard input is memory-mapped to address 6000
.
Write the first character typed at memory address 1000
(hex), the second typed at 1001
(hex) and so on.
Note: A key will usually be held down for much longer than a clock cycle, but should only be registered as a single input until the key is released.
主要的麻烦是寄存器太少……得用内存来保存变量。
需要记录下一次要写入的内存地址(起始值为 0x1000),用内存地址 0xFFF 来记录此值。
1 | DEFINE KBD 0x6000 |
Escape Labyrinth
The computer is stuck in a labyrinth on Mars. Write a program that will make it escape the labyrinth.
The computer has connected wheels and a forward obstacle detector. Input/output to wheels and detector is memory-mapped on address 7FFF:
Output signals to peripherals:
Bit | Bin | Set to 1 to: |
---|---|---|
2 | 0b0100 | Move forward (1 step) |
3 | 0b1000 | Turn left (90 degrees) |
4 | 0b1_0000 | Turn right (90 degrees) |
The movement/turn is started when a bit is changing from 0 to 1, but will take a moment to complete.
Input from peripherals:
Bit | Bin | When 1 |
---|---|---|
8 | 0b1_00000000 | Obstacle detected in front |
9 | 0b10_00000000 | Device is turning |
10 | 0b100_00000000 | Device is moving forward |
读写都在内存地址 7FFF。
注意移动和旋转都会持续一段时间,需要等待动作完成,如:
1 | DEFINE IO 0x7FFF |
走迷宫用「沿墙走法」(使用于没有环形路径的迷宫)。比如选定右手 🫱,单手模住一面墙出发,手始终不离开墙面。
但这里麻烦的点是它不会判定左边或右边是否有墙壁,就得反反复复第左右转动……判断右手是否是墙的动作就变成「向右转 - 判断前方是否有墙 - 向左转」。
- ① 如果右手是墙,前方没有墙,则前进;
- ② 如果右手是墙,前方也是墙,则左转;
- ➂ 如果右手没有墙,则右转。
按照这个 computer 的行为模式则为:
- 判断前方是否有墙并记住
- 右转
- 判断前方有墙
- 若没有,则前进(对应上边情况 ③)
- 若有,则检查刚才记录的开始时前方是否有墙
- 若有,则左转两次(对应上边情况 ②)
- 若没有,则左转,然后前进(对应上边情况 ①)
1 | # Assembler code |
Display
Display a logo (at least 16 pixels in both width and height) of your own choice on the screen.
The screen is 512 x 256 monochrome pixels, memory-mapped from address 0x4000 to 0x6000. Each address correspond to 16 pixel on the screen. The lines are contiguously in memory, so first line start at 0x4000, second line starts at 0x4020 and so forth.
做一个 32 x 32 的 logo 放在屏幕中央。Logo 第一行内存地址为 0x4E0F、0x4E10,下一行是 0x4E2F 和 0x4E30,然后是 0x4E4F 和 0x4E50,……
另外需要注意,以 data instruction 模式往寄存器 A 写数据的时候,最高位被 ci flag 占用,无法写入大于等于 0x8000 的数字。所以对于大于等于 0x8000 的数值,需要先将其反码写入寄存器 A,然后在执行 bitwise inversion 操作得到想要的数字。
以下代码完全由 AI 生成。
把 PNG 图片给 AI,让它缩放后生成 32 行、每行两个 16-bit hex 的数据。再把第一行数据的处理代码示例以及每一行的地址变化规律告诉它,把大数字的取反规则告诉它,让它生成完整的代码。
1 |
|
Network
Receive data from another computer over the network and display it on the screen.
The payload will be an image 16 pixels in width.
The network wires are memory mapped to the address 6001
(hex), with two significant bits: data (bit 0) which is the current bit of data sent over the wire and sync (bit 1) which change to indicate that a new bit has arrived.
Each time the sync signal changes (from 0 to 1 or from 1 to 0), a new bit can be read from the data wire.
The protocol (in this mission) is that a transmission always starts with a 1 bit, followed by 16 bits of data, then followed by a control bit. If the control bit is 0, it means the transmission has ended. If the control bit is 1, it means another 16 bits data will follow, again followed by a control bit. And so on.
1 | DEFINE C_NET_ADDR 0x6001 |
Software Level: Stack machine
Init stack
The stack is an area of memory where we can store and retrieve intermediate values in a last-in-first-out manner.
We use the first available memory address, address 0, to store the stack pointer (or SP).
We write this in the form of a macro called init.stack
. A macro is a snippet of code which can be easily reused. If the keyword init.stack
is used in assembler, it will be replaced with this code.
Set the Stack Pointer (RAM address 0) to 256 (Hex value 0100).
It may be helpful to define a constant named SP with the value 0.
- SP (Stack Pointer): 值为 0 的常量,即
mem[0]
记录当前的栈顶。 - 初始栈顶地址为 0x0100(这里栈顶是可以写入的位置)。
Constants:
- SP: 0
1 | # init.stack |
Push D
Storing a new value on the stack is called pushing.
Write code which pushes the current value of the D-register on the top of the stack.
The SP should be increased by one.
SP points to the address after the top of the stack.
1 | # push.D |
Pop D
Retrieving the value at the top of the stack is called popping a value.
Write code which pops the value at top of the stack and writes it to the D-register.
The stack pointer (SP) should be decreased by 1 when a value is popped.
SP points to the address after the top of the stack, so the value to retrieve is at SP - 1.
1 | # pop.D |
Pop A
Write code which pops the value at top of the stack and writes it to the A-register.
Important criteria: The D-register must not be affected by this operation.
1 | # pop.A |
Push Value
Now we introduce a macro which use a placeholder.
The macro keyword push.value
must followed by a number, e.g. push.value 42
.
When the macro is used, the placeholder keyword value
in the macro code will be replaced with the specified number, i.e. 42
.
1 | # push.value <value> |
Add
Pop two values from the stack, add them, and push the sum on the stack.
1 | # add |
Sub
Pop two values from stack, subtract the first from the second, and then push the result back on the stack.
“subtract the first from the second” = 「从第二个数中减去第一个数」,即 后出栈的数 - 先出栈的数。
因为出栈是 macro 内部逻辑,从使用者的角度,就是 先入栈的数 - 后入栈的数。
1 | # sub |
Neg
Negate the value on top of the stack.
1 | # neg |
And
Pop two values from stack, perform a bitwise AND and push the result back on the stack.
1 | # and |
Or
Pop two values from stack, perform a bitwise OR and push the result back on the stack.
1 | # or |
Software Level: High-level language
A high level language have a more human-friendly and flexible syntax which is then compiled into machine code instructions. For example the high-level code 2 + 2
could be compiled into the low-level code:
1 | push.value 2 |
Compilation has three stages:
- (1) Tokenization
- (2) Parsing
- (3) Code generation
Tokenize
The tokenizer is preconfigured to recognize numbers and the symbol ‘+’.
Configure the tokenizer to additionally recognize the symbols minus ‘-’ and parentheses ‘(’ and ‘)’.
Token type:
- Exact matches match the exact text specified under
Match
. Multiple exact matches can be specified in the same box, separated by whitespace. - Pattern can use character groups in brackets and quantifiers
*
and+
.
- Example:[0-9]
matches a decimal digit,[0-9]+
matches one or more digits.
Gramma property:
- Ignore patterns are skipped by the tokenizer. Use e.g. for whitespace and comments.
- Name patterns are represented with the specified name in the grammar.
- Literal matches are represented with the literal text in the grammar.
Token definitions:
Type | Match | Grammar | Token name |
---|---|---|---|
Pattern | [ ]+ | Ignore | |
Pattern | [0-9]+ | Name | Number |
Exact | + - ( ) | Literal |
Grammar
Parse the sequence of tokens into a syntax-tree.
The syntax of a high-level language is described through a grammar.
A grammar is a set of rules where each rule names a part of the syntax and defines how it composed.
The terms used in the grammar are called symbols. The rules define how a symbol (left of the arrow) is composed of one or more other symbols (right of the arrow). The symbols the right of the arrow are either tokens which is defined by the token specification (in the previous step), or they are symbols themselves defined by rules in the same grammar.
The symbols representing tokens (like Number
and +
) are called terminals, the symbols like Program and Expression which are defined by other rules in the grammar are called non-terminals.
The names used as non-terminal symbols are arbitrary – you can use names which makes sense for you. Only condition is there must be a “starting symbol” called Program
, which represent the whole program.
This game uses an Earley-parser, which is not the fastest but which is flexible and easy to write a grammar for.
Define a Grammar for expressions involving numbers, parentheses and the operators +
and -
.
The start symbol is Expression
.
An expression should correspond to one of:
- A
Number
token - Expression
+
Expression - Expression
-
Expression -
Expression(
Expression)
Grammar:
本关中的错误,在 check solution 时可能无法指出,到下一关 code generation 时可能会遇到问题,需要再回来修改。
- Expression →
Number
- Expression →
Number
- Expression →
( Expression )
- Expression →
Expression + Expression
- Expression →
Expression - Expression
Code generation
The third step in the compilation is to generate machine code from the syntax tree.
This is done by associating each syntax rule with a block of assembler code.
The compiler then generate the resulting code by traversing the syntax tree and for each node in the tree generate the code associated with the rule.
Define code-generation for the syntax rules of the language, to support addition and subtraction.
Syntax rules:
1 | # Expression → `Number` |
有个问题是它不会遵循从左向右的计算顺序,比如100 - 2 + 2 - (7 + 10)
会得到如下的表达式,结果是 113 而不是期望的 83。
Software Level: Conditonals
Eq
Pop the two top values from the stack and compare them. If they are equal, push the value -1 (FFFF
in hex). Otherwise push 0
.
In conditionals, FFFF
represents true and 0
represents false.
1 | # eq |
Gt
Pop the two top values from the stack and compare them. If the first is greater than the second, push the value -1 (FFFF
in hex). Otherwise push 0.
这里文字描述跟图片有出入。文字说的是如果 先出栈的数 > 后出栈的数,结果为 -1。但图中先出栈 5,大于后出栈的 3,结果却为 0。
“Test code” 的提示跟文字描述的逻辑一致。
但 “Check solution” 的检查逻辑似乎跟图一致,即当 先入栈的数 > 后入栈的数,结果为 -1,否则为 0。这个其实跟 sub 的行为(先入栈的数 - 后入栈的数)也是类似的。
1 | # gt |
Lt
Pop the two top values from the stack and compare them. If the first is less than the second, push the value -1 (FFFF
in hex). Otherwise push 0.
同样,这里应该是指当 先入栈的数 < 后入栈的数,结果为 -1,否则为 0。
1 | # lt |
Not
Invert the value on top of the stack, using bitwise inversion.
1 | # not |
Goto
Jump to the label given in the placeholder.
1 | goto <label> |
If-goto
Pop the value on top of the stack.
Jump to the label if it is non-zero.
1 | # if.goto <label> |
Software Level: Memory
Push Memory
The value on top of the stack is a memory address.
Pop the address from the stack. Fetch the current contents of the memory address, and push this on the stack.
1 | # push.memory |
Pop Memory
Pop two values value from the stack. The second value is a memory address.
Write the first value to memory at the given address.
从使用者的角度,先入栈地址,后入栈数据。
1 | # pop.memory |
Push Static
Take the current contents of the memory address given by the address
placeholder and push it on the stack.
1 | # push.static <address> |
Pop Static
Take the value on top of the stack and store it at the memory address given by the address
placeholder.
1 | # pop.static <address> |
Software Level: Functions
The function is perhaps the single most important abstraction in software.
A function is a unit of code which take some input (called arguments), has some local storage to use for processing, and return a value.
A function can be executed (called) from anywhere in the program. When a function is called, the address of the call is stored on the stack, and when the function is complete, it returns to the address from where it was called
Arguments and local storage is also stored on the stack.
Functions require three segments to work together:
- Call where a function is called from somewhere in the program.
- Function which is the start of the function
- Return which is the end of the function.
These three units need to work together according to a shared convention for how data is passed in and out of the function.
Call
The call macro invokes a function. It should prepare the stack for the call, jump to the given function label, and afterwards restore the state.
Before the call, zero or more values are placed on the stack. The placeholder argumentCount is the number of arguments
The calling convention requires three shared memory slots:
ARGS = 1
- Address of the arguments for the current function
LOCALS = 2
- Address of the local storage for the function
RETVAL = 6
- Temporary slot for the value returned from a function.
(These slot addresses can be defined as shared constants for convenience.)
Steps:
- Push the current value of ARGS on the stack.
- Push the current value of LOCALS.
- Push the address immediately after the jump (the return address).
- Calculate a new ARGS address which is the current SP minus argumentCount minus 3 (because we just pushed three values on the stack).
- Jump to the address given by the functionName placeholder.
After the function call is executed, control will return to the label following the jump.
- Store the current ARGS value in a temporary slot.
- Restore the LOCALS value from the stack
- Restore the ARGS value from the stack
- Set SP to the previous ARGS value ❓ Why not use
SP - argumentCount
- Push RETVAL on the stack
1 | # call <functionName> <argumentCount> |
Function
The Function macro defines the top of the function block. It should adjust the stack to make space for local storage. The size of the local storage is given in the placeholder localsCount
- A label with the name given in the placeholder functionName should start the block.
- Set LOCALS to the current SP.
- Make space for local data on the stack by adding localsCount to the current SP value.
1 | # function <functionName> <localsCount> |
Return
The Return macro defines the end of the function block. It should store the return value in the memory address RETVAL and restore the stack.
- Pop the top value from the stack into the RETVAL memory slot.
- Set SP to the value of LOCALS
- Pop the return address from the stack and jump to it.
1 | # return |
Push argument
Take the current contents of the memory address given by ARGS
+ the index placeholder and push it on the stack.
1 | # push.argument <index> |
Pop argument
Take the value on top of the stack and store it at the memory address given by the value of ARGS
+ the index placeholder.
1 | # pop.argument <index> |
Push local
Take the current contents of the memory address given by LOCALS
+ the index placeholder and push it on the stack.
1 | # push.local <index> |
Pop local
Take the value on top of the stack and store it at the memory address given by the value of LOCALS
+ the index placeholder.
1 | # pop.local <index> |