Spec.isa

The ISA ("Instruction Set Architecture") specification is the language used to define assembly instructions in Turing Complete.  ISA specifications are stored alongside schematics in the player's save folder, with the name <code>spec.isa</code>.
{{note|type=info|Stuffe has released a utility to parse, compile, decompile and perform other operations on ISA specifications here: https://github.com/Stuffe/isa_spec.}}

An ISA specification file consists of several sections, delimited by section headers enclosed in square brackets (<code>[]</code>).

== Settings ==
The <code>[settings]</code> section defines several configuration properties for your ISA:

=== name ===
<pre>
name = "Architecture name"
name = architecture_name
</pre>
Names the architecture this ISA specifies.  It can be either a string or an identifier.

=== variant ===
<pre>
variant = "Architecture variant"
</pre>
Additional text that can be used to describe the ISA.

=== endianness ===
Defines the byte order of the instructions.
<pre>
endianness = big
endianness = little
</pre>
The default is big endian.

For example, if we have an instruction defined as <code>11110000 00001111</code> then:
* <code>big</code> endian will encode the instruction as <code>240 15</code>.
* <code>little</code> endian will encode the same instruction as <code>15 240</code>.
(as shown in, for example, the {{Component|ROM||alpha}} component)
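
The byte ordering described above can be modelled in a few lines of Python.  This is only an illustrative sketch: <code>encode_instruction</code> is a hypothetical helper, not part of the game or of Stuffe's utility, and it assumes the instruction is a whole number of bytes.

```python
def encode_instruction(bits: str, endianness: str = "big") -> list[int]:
    """Split an instruction bit pattern into bytes in the requested order."""
    bits = bits.replace(" ", "")
    if len(bits) % 8 != 0:
        raise ValueError("this sketch only handles whole bytes")
    # Big endian: most significant byte first, exactly as written.
    data = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    if endianness == "little":
        data.reverse()  # least significant byte first
    elif endianness != "big":
        raise ValueError(f"unknown endianness: {endianness}")
    return data

print(encode_instruction("11110000 00001111", "big"))     # [240, 15]
print(encode_instruction("11110000 00001111", "little"))  # [15, 240]
```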

{{note|type=info|The memory components (ROM, RAM, SSD, etc.) have their own endianness flag, which can reverse the byte order again.  This is straightforward if your instruction width happens to match the memory component's bit width, but can be confusing if the widths differ.}}

=== line_comments ===
Specifies the tokens used to mark the start of a single-line comment.  Single-line comments start at the given token and consume the remainder of the line.
<pre>
line_comments = "#"
line_comments = [";", "//"]
</pre>
The default line_comments value is <code>[";", "//"]</code>.

{{note|type=info|Any parenthesis style can be used for the list syntax: <code>[]</code>, <code>()</code> or <code>{}</code>.  The closing parenthesis style must match the opening parenthesis (so <code>(]</code> would fail to parse).}}
{{note|type=warn|While the compiler respects the <code>line_comments</code> setting, the syntax highlighter in the assembler window currently only recognizes the default tokens.}}

=== block_comments ===
Specifies the tokens used to mark the start and end of a block comment.  Block comments start at the first token and consume everything up to the final token.  This includes line breaks as well as additional block comment start tokens (preventing block comments from being nested).
<pre>
block_comments = {"/*":"*/"}
</pre>
The default block_comments value is <code>{"/*":"*/"}</code>.

{{note|type=info|Any parenthesis style can be used for the list syntax: <code>[]</code>, <code>()</code> or <code>{}</code>.  The closing parenthesis style must match the opening parenthesis (so <code>(]</code> would fail to parse).}}
{{note|type=warn|While the compiler respects the <code>block_comments</code> setting, the syntax highlighter in the assembler window currently does not highlight block comments at all - not even with the default tokens.}}

== Fields ==
The <code>[fields]</code> section defines different types of fields that can be included in the instruction definitions, as well as what values they can have.  Fields are separated by blank lines, and each field starts with a field name on the first line followed by one or more allowable field values.  Values are bit patterns and all values within a field must have the same bit length.

=== Example 1 - Overture ===
The Overture specification provides a good starting example:
<pre>
register
r0 000
r1 001
r2 010
r3 011
r4 100
r5 101
in 110
out 110
</pre>
Here we can see a couple of features:
* Names are arbitrary.  <code>r0</code> through <code>r5</code> are considered registers only by convention, while <code>in</code> and <code>out</code> follow an entirely different naming scheme within the same <code>register</code> field.
* Value repetition is allowed.  <code>in</code> and <code>out</code> both have the same value of <code>110</code>.
* Completion is not required.  <code>111</code> is not assigned to any label.
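
To make the layout rules concrete, here is a rough Python sketch of how a <code>[fields]</code> block could be read.  <code>parse_fields</code> is a hypothetical helper written for this page (it is not the game's parser), and it assumes each value line is a label followed by a bit pattern.

```python
import re
import shlex

def parse_fields(text: str) -> dict[str, dict[str, str]]:
    """Parse [fields] content into {field_name: {label: bit_pattern}}."""
    fields = {}
    # Fields are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        name = lines[0].strip()
        values = {}
        for line in lines[1:]:
            # shlex keeps a quoted label together as a single token.
            label, bits = shlex.split(line)
            values[label] = bits
        # All values within one field must share a bit length.
        if len({len(bits) for bits in values.values()}) > 1:
            raise ValueError(f"field {name!r} mixes bit lengths")
        fields[name] = values
    return fields

OVERTURE = """\
register
r0 000
r1 001
r2 010
r3 011
r4 100
r5 101
in 110
out 110
"""

registers = parse_fields(OVERTURE)["register"]
print(registers["in"] == registers["out"])  # True: repeated values are allowed
print("111" in registers.values())          # False: 111 is left unassigned
```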

=== Example 2 - aarch64 ===
Also note that the field name itself is arbitrary.  We can take an excerpt from https://github.com/Stuffe/isa_spec/blob/main/spec_lib/aarch64/aarch64.isa as an example:
<pre>
condition_code
eq 0000
ne 0001
cs 0010
hs 0010
</pre>
(In this case, all 16 values are included in the ISA itself.  The list has been truncated here for the sake of brevity.)

We can see a couple of other features in this example:
* The field name is not <code>register</code>, as noted.
* The field is four bits wide rather than three.  In principle, the fields can be any width desired (greater than zero).

=== Example 3 - aarch64, again ===
A final example from the same <code>aarch64.isa</code>:
<pre>
pound
"#" 0
"" 0
</pre>
The features this time are:
* <code>"literal"</code> syntax to require a literal string be included in your instructions.
* <code>""</code> to allow an empty string, effectively making this field optional.
In this instance, the <code>pound</code> field is used for immediate values, allowing the player to write either <code>#37</code> or just <code>37</code> to describe the immediate value 37.

=== Reserved field names ===
There are two reserved names:
* <code>label</code> references labels within the assembler (eg: <code>mylabel:</code>).
* <code>immediate</code> references immediate values (eg: <code>37</code>).

== Instructions ==

The <code>[instructions]</code> section is where we finally define the instructions themselves.  Instructions are separated by a blank line and each consists of several lines:
* [[#Assembly_format|The assembly format]]
* [[#Virtual_operands|Virtual operands]]
* [[#Assertions|Assertions]]
* [[#Output_bit_pattern|The output bit pattern]]
* [[#Description|An optional description line]]
* [[#Addendum:_Masking|Addendum: Masking]]

To motivate with an example, let's consider the following hypothetical DIV instruction that could be added to the Symphony architecture:
<pre>
div %a(register), %b(register), %c:S16(immediate)
%bits = (0 - popcount(%c))
!assert %bits >> 63 == 1
01011100 0aaa0bbb cccccccc cccccccc
Signed DIV %b by %c and store the result in %a.
</pre>

=== Assembly format ===
<pre>
div %a(register), %b(register), %c:S16(immediate)
</pre>
The assembly format is a string of whitespace-delimited tokens, which come in two flavors:  literals and operands.

==== Whitespace ====
Whitespace can be either tabs or spaces.  If a single whitespace character is used, the space will be optional in the resulting assembly language.  If the whitespace character is instead followed by another space ({{note}} Not a tab), the assembler will require whitespace between the tokens.  For example, an instruction like <code>function&nbsp;()</code> would match <code>function()</code> or <code>function&nbsp;()</code>, while the instruction <code>function&nbsp;&nbsp;()</code> (with two spaces) will only match the latter, enforcing the space between the word "function" and the following parenthesis.
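
The optional/required distinction can be pictured as a translation from the format string to a regular expression.  The Python below is a sketch under assumptions: <code>format_to_regex</code> is a hypothetical name, it only handles literals (no operands), and it assumes that "required" whitespace accepts one or more spaces or tabs.

```python
import re

def format_to_regex(fmt: str) -> re.Pattern:
    """Translate a literal-only assembly format into a regex (illustrative)."""
    out = []
    for part in re.split(r"([ \t]+)", fmt):
        if not part:
            continue
        if part[0] in " \t":
            # A lone whitespace character is optional in the assembly;
            # whitespace followed by a space (not a tab) makes it required.
            required = len(part) >= 2 and part[1] == " "
            out.append(r"[ \t]+" if required else r"[ \t]*")
        else:
            out.append(re.escape(part))
    return re.compile("".join(out))

optional = format_to_regex("function ()")   # one space
required = format_to_regex("function  ()")  # two spaces
print(bool(optional.fullmatch("function()")))   # True
print(bool(required.fullmatch("function()")))   # False
print(bool(required.fullmatch("function ()")))  # True
```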

==== Literals ====
<code>div</code> in the motivating example is a literal.  The user must type this exactly in order for the instruction to be matched.  Of particular note however is that the commas (<code>,</code>) are also literals.
{{note|type=info|The digraph <code>%%</code> must be used if you wish to use a literal <code>%</code> symbol in your assembly syntax, as <code>%</code> is a special character in the ISA language itself.}}
{{note|type=warn|The game does not currently prevent you from using your line or block comment tokens as literals.  However, they will be treated as comment tokens by the assembler and will prevent the instruction from functioning.}}

==== Instruction Operands ====
Instruction Operands start with the <code>%</code> prefix and are written in the form <code>%name:size(fields)</code>.
* <code>name</code> is any identifier.  These are typically kept short for convenience such as <code>%a</code> or <code>%imm</code>, but in principle can be any length.  The name is mandatory.
* <code>size</code> is either <code>S</code> or <code>U</code> for signed and unsigned values, respectively, followed by a size in bits.  For example <code>S32</code> or <code>U3</code>.  The size (and the preceding colon) are optional.  The default is <code>U64</code>, and the size cannot currently exceed 64 bits.
* <code>fields</code> is a list of one or more fields created in the [[#Fields|prior section]] (or the reserved fields).  The list is delimited by the pipe (<code><nowiki>|</nowiki></code>) character.  The list of fields is mandatory.
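
As a rough illustration of the three parts, the following Python sketch splits an operand into name, size and field list.  <code>parse_operand</code> is hypothetical, and the identifier grammar (letters, digits and underscores) is an assumption rather than something the game documents.

```python
import re

OPERAND = re.compile(
    r"%(?P<name>[A-Za-z_]\w*)"            # mandatory name (grammar assumed)
    r"(?::(?P<sign>[SU])(?P<bits>\d+))?"  # optional :S<n> or :U<n>
    r"\((?P<fields>[^)]+)\)"              # mandatory pipe-delimited field list
)

def parse_operand(text: str) -> dict:
    """Split '%name:size(fields)' into its parts, applying the U64 default."""
    m = OPERAND.fullmatch(text)
    if m is None:
        raise ValueError(f"not an operand: {text!r}")
    bits = int(m["bits"]) if m["bits"] else 64
    if not 1 <= bits <= 64:
        raise ValueError("size must be between 1 and 64 bits")
    return {
        "name": m["name"],
        "signed": m["sign"] == "S",
        "bits": bits,
        "fields": m["fields"].split("|"),
    }

print(parse_operand("%c:S16(immediate)"))
print(parse_operand("%a(register)"))
```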

=== Virtual operands ===
<pre>
%bits = 0 - popcount(%c)
</pre>
In addition to the operands included in the instruction syntax, additional "virtual" operands can be created to simplify instruction creation or, as in this example, to provide some minimal compile-time correctness guarantees.

The general syntax is:
<pre>
%name:size = expression
</pre>
<code>name</code> and <code>size</code> follow the same definition rules as [[#Instruction_Operands|instruction operands]].
<code>expression</code> is a relatively typical mathematical expression using C-like operators and a handful of built-in functions.

==== Expression Operators ====
The game provides a limited set of operators for the construction of virtual operands:
{| class="wikitable"
! Operator !! Description !! Example !! Result if %a = -30000, 16-bit
|-
| <code>+</code> || addition || <code>%a + 7</code> || -29993
|-
| <code>-</code> || subtraction || <code>%a - 7</code> || -30007
|-
| <code>*</code> || multiplication || <code>%a * 7</code> || -13392
|-
| <code>/</code> || division || <code>%a / 7</code> || -4285
|-
| <code>%</code> || modulo (remainder after division) || <code>%a % 7</code> || -5
|-
| <code>&</code> || bitwise AND || <code>%a & 7</code> || 0
|-
| <code><nowiki>|</nowiki></code> || bitwise OR || <code>%a <nowiki>|</nowiki> 7</code> || -29993
|-
| <code>^</code> || bitwise XOR || <code>%a ^ 7</code> || -29993
|-
| <code>&lt;&lt;</code> || logical shift left (LSL) || <code>%a &lt;&lt; 7</code> || 26624
|-
| <code>&gt;&gt;</code> || logical shift right (LSR) || <code>%a &gt;&gt; 7</code> || 277
|}
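
The results in the table can be checked with a small Python model.  This sketch assumes 16-bit two's-complement wraparound and division that rounds toward zero (as in C); it mimics the arithmetic only and is not the game's evaluator.

```python
BITS = 16
MASK = (1 << BITS) - 1

def to_signed(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value

def trunc_div(a: int, b: int) -> int:
    """Integer division that rounds toward zero (Python's // rounds down)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

a = -30000
results = {
    "+": to_signed(a + 7),
    "-": to_signed(a - 7),
    "*": to_signed(a * 7),
    "/": to_signed(trunc_div(a, 7)),
    "%": to_signed(a - 7 * trunc_div(a, 7)),
    "&": to_signed(a & 7),
    "|": to_signed(a | 7),
    "^": to_signed(a ^ 7),
    "<<": to_signed(a << 7),
    ">>": (a & MASK) >> 7,  # logical shift acts on the unsigned bit pattern
}
for op, value in results.items():
    print(op, value)
```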

A notable omission is the lack of unary operators.  However, the two most common unary operations can be written using binary operators as follows:
{| class="wikitable"
! Operator !! Description !! Alternative
|-
| <code>~</code> || bitwise NOT || <code>(-1 ^ %a)</code>
|-
| <code>-</code> || negation || <code>(0 - %a)</code>
|}

{{note|type=warn|Be cautious of unexpected bit width extension when using signed values.  The parser is fairly good at doing the right thing, but it can occasionally get confused and sign extend to a full 64 bits.  When in doubt, [[#Addendum:_Masking|mask]] it out!}}

==== Operator precedence ====
The game only defines three precedence levels:
{| class="wikitable"
! Precedence !! Operators
|-
| Parenthesis || <code>()</code>
|-
| Multiplicative || <code>*</code>, <code>/</code>, <code>%</code>
|-
| Everything else || <code>+</code>, <code>-</code>, <code>&</code>, <code><nowiki>|</nowiki></code>, <code>^</code>, <code>&lt;&lt;</code>, <code>&gt;&gt;</code>
|}

==== Functions ====
The game also provides a handful of built-in functions for the construction of virtual operands:
{| class="wikitable"
! Function !! Description !! Example !! Result if %a = -30000 and %b = 27, 16-bit 
|-
| <code>asr</code> || arithmetic shift right (ASR) || <code>asr(%a, 7)</code> || -235
|-
| <code>log2</code> || floor of the base-2 logarithm if >0, -1 otherwise || <code>log2(%a)</code> <br> <code>log2(%b)</code> || -1 <br> 4
|-
| <code>popcount</code> || number of <code>1</code>s in the base-2 representation || <code>popcount(%a)</code> || 6 ({{note}} Currently reports 54 as of 0.1354.  [[#Addendum:_Masking|mask]] out high bits as needed for <64bit values.)
|-
| <code>trailing_zeros</code> || number of <code>0</code>s after the rightmost <code>1</code> in the base-2 representation || <code>trailing_zeros(%a)</code> || 4
|}
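
For readers who want to check the table, here is a Python sketch of the four functions.  It is a model only: the <code>bits</code> parameter is an addition for illustration, and the result of <code>trailing_zeros</code> for a zero input is an assumption, as the page does not state it.

```python
def asr(value: int, shift: int) -> int:
    """Arithmetic shift right; Python's >> already preserves the sign."""
    return value >> shift

def log2(value: int) -> int:
    """Floor of the base-2 logarithm for positive values, -1 otherwise."""
    return value.bit_length() - 1 if value > 0 else -1

def popcount(value: int, bits: int = 16) -> int:
    """Count the 1 bits in the two's-complement pattern of the given width."""
    return bin(value & ((1 << bits) - 1)).count("1")

def trailing_zeros(value: int, bits: int = 16) -> int:
    """Count the 0 bits to the right of the lowest 1 bit."""
    value &= (1 << bits) - 1
    if value == 0:
        return bits  # assumption: not specified by the game
    return (value & -value).bit_length() - 1

a, b = -30000, 27
print(asr(a, 7), log2(a), log2(b))        # -235 -1 4
print(popcount(a), popcount(a, bits=64))  # 6 54
print(trailing_zeros(a))                  # 4
```

Counting over 64 bits instead of 16 reproduces the 54 mentioned in the table, which is consistent with the value having been sign-extended.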

==== Instruction Address ====
The special character <code>$</code> returns the memory address of the start of the current instruction.  This is useful for instructions such as relative jumps.  If we assume <code>0b01000000</code> is the opcode for an absolute unconditional jump, we can use the following example to create a relative jump instruction without any additional hardware:
<pre>
jmp_rel %offset:S16(immediate)
%target = $ + %offset
01000000 tttttttt tttttttt
</pre>
{{note|type=info|This instruction will only work with immediate values.  If you try to use a register, you would get the register index - not its contents - added to the instruction address.}}
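
What the assembler computes for <code>jmp_rel</code> can be sketched in Python as follows.  <code>assemble_jmp_rel</code> is a hypothetical helper; it assumes the default big-endian byte order and simply rejects targets that do not fit in the 16 target bits.

```python
def assemble_jmp_rel(address: int, offset: int) -> list[int]:
    """Encode jmp_rel located at `address` as an absolute jump."""
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("offset does not fit in S16")
    target = address + offset  # %target = $ + %offset
    if not 0 <= target < (1 << 16):
        raise ValueError("target does not fit in 16 bits")
    # Opcode byte, then the 16-bit target, most significant byte first.
    return [0b01000000, target >> 8, target & 0xFF]

print(assemble_jmp_rel(0x0100, -4))  # [64, 0, 252]
print(assemble_jmp_rel(10, 6))       # [64, 0, 16]
```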

==== Bit slicing ====
The expression evaluator also provides syntax for picking specific bits from a value, known as "slicing". This is written as <code>value[start:end]</code> where all three parts (<code>value</code>, <code>start</code>, and <code>end</code>) can be arbitrary expressions. <code>start</code> and <code>end</code> are inclusive indices into the bits of value, with the LSB being 0 and numbering being right-to-left. To make this a bit simpler to visualize, start has to be larger than end.  For example, the slice <code>%imm[5:2]</code> applied to the value 90 (<code>0b01011010</code>) would result in 6 (<code>0b0110</code>), as shown:
<pre>
  76543210
    ||||
    vvvv
0b01011010
</pre>
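
The same slice can be expressed with a shift and a mask.  The Python below is a minimal sketch of that equivalence; <code>bit_slice</code> is a hypothetical name, and allowing <code>start</code> to equal <code>end</code> (a one-bit slice) is an assumption.

```python
def bit_slice(value: int, start: int, end: int) -> int:
    """Return bits start..end (inclusive, LSB = 0) of value."""
    if start < end:
        raise ValueError("start must not be smaller than end")
    width = start - end + 1
    # Drop the bits below `end`, then keep only `width` bits.
    return (value >> end) & ((1 << width) - 1)

print(bit_slice(0b01011010, 5, 2))  # 6 (0b0110)
```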

==== Expression Operands ====
Operations in the expression can refer to any operands from the instruction definition, as well as any virtual operands created on the preceding lines.
{{note|type=warn|As alluded to in the [[#Instruction_Address|instruction address]] <code>jmp_rel</code> example, expressions are computed at assembly time.  You cannot utilize runtime values such as register or memory contents - you'll get the index or address instead of the value.  Runtime calculations can only be done by your hardware components.}}

==== Bit widths ====
Some operations (such as multiplication) will easily allow you to exceed the bit width of the virtual operand you're creating, or of the final output bytes.  The game will cause an error when that occurs.  Additionally, negative values are prone to being interpreted as 64-bit unsigned values (as of 0.1354 Beta) which can cause unexpected errors and odd-looking error messages.
When in doubt, [[#Addendum:_Masking|mask]] out the result of your expressions to ensure they fit within the intended bit width, especially when working with signed values.

=== Assertions ===
<pre>
!assert %bits >> 63 == 1
</pre>

Assertions can be intermixed with [[#Virtual_operands|virtual operands]] and trigger an error if they are not true. Currently the only supported comparison operator is equals (<code>==</code>), although more comparisons are planned in the future. After the comparison a string can be added to produce a more context-specific error message.

The full syntax is:
<pre>
!assert <lhs> == <rhs>
</pre>
where <code>&lt;lhs&gt;</code> and <code>&lt;rhs&gt;</code> are both arbitrary expressions, using the same syntax as [[#Virtual_operands|virtual operand]] assignment expressions.
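
The assertion in the motivating DIV example effectively rejects a zero divisor.  The following Python sketch shows why, assuming expressions are evaluated with 64-bit wraparound; <code>divisor_is_nonzero</code> is a hypothetical name used only here.

```python
MASK64 = (1 << 64) - 1

def popcount(value: int) -> int:
    """Count the 1 bits in the 64-bit pattern of value."""
    return bin(value & MASK64).count("1")

def divisor_is_nonzero(c: int) -> bool:
    """Model of: %bits = 0 - popcount(%c) ; !assert %bits >> 63 == 1"""
    bits = (0 - popcount(c)) & MASK64  # zero stays zero, anything else wraps negative
    return (bits >> 63) == 1           # the sign bit is set only for nonzero c

print(divisor_is_nonzero(0))   # False: the assertion would fail
print(divisor_is_nonzero(37))  # True
print(divisor_is_nonzero(-1))  # True
```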

=== Output bit pattern ===
<pre>
01011100 0aaa0bbb cccccccc cccccccc
</pre>

The bit pattern comes after any virtual operands or assertions.  It defines a sequence of bits that get interpreted as a single big-endian number (which then gets rearranged according to the [[#endianness|endianness]] setting).  In this line, space characters (and only space characters) are completely ignored.

Bits can be defined by either individual letters/numbers or explicit operand references.

==== Fixed patterns ====

<code>0</code> and <code>1</code> produce those bits in the output and require those exact bits to be there while disassembling. <code>?</code> also produces <code>0</code> while assembling, but this bit will be ignored when disassembling (once that gets reimplemented).

==== Individual letters ====

All lowercase letters can be used as shortcuts to refer to operands. A letter refers to the first operand that starts with that letter, with the order being left-to-right in the syntax line and then top-to-bottom for virtual operands.

If a letter appears multiple times, all of those appearances are assigned bits as if they were a single larger field. The bits are assigned LSB-to-MSB to the repeated letters in a right-to-left ordering. For example,

<pre>
test_split_operand
%c = 0b110001
c 0 ccccc 0
</pre>

would result in <code>10100010</code>.

If you use an operand that doesn't support the full length your bit pattern defines, the value will be sign-extended for signed operands and zero-extended for unsigned operands:
<pre>
test_extended_operands
%s:S3 = 0b100
%u:U3 = 0b100
ssssssss uuuuuuuu
</pre>
will result in <code>252 4</code>.

However, the opposite is not true - fields longer than their output format will not be truncated, but will result in an error.  For example:
<pre>
test_overrun
%u:U8 = 32
uuuu0000
</pre>
will result in the error <code>Value 32 outside of range for this 4-bit zero-extended field</code>.
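
The three behaviours above (split letters, extension, and overrun errors) can be reproduced with one small Python model.  <code>fill_pattern</code> is a hypothetical helper, not the game's code; the wording of the error for signed operands is an assumption modelled on the unsigned message.

```python
def fill_pattern(pattern: str, operands: dict[str, tuple[int, int, bool]]) -> str:
    """Fill letters in a bit pattern; operands maps letter -> (value, bits, signed)."""
    pattern = pattern.replace(" ", "")
    out = list(pattern)
    for letter, (value, bits, signed) in operands.items():
        positions = [i for i, ch in enumerate(pattern) if ch == letter]
        width = len(positions)
        if width == 0:
            continue
        # Interpret the declared bits as an integer (two's complement if signed).
        value &= (1 << bits) - 1
        if signed and value >> (bits - 1):
            value -= 1 << bits
        lo, hi = (-(1 << (width - 1)), (1 << (width - 1)) - 1) if signed else (0, (1 << width) - 1)
        if not lo <= value <= hi:
            kind = "sign" if signed else "zero"
            raise ValueError(f"Value {value} outside of range for this {width}-bit {kind}-extended field")
        # The rightmost occurrence of the letter receives the LSB.
        for n, pos in enumerate(reversed(positions)):
            out[pos] = str((value >> n) & 1)
    return "".join(out)

print(fill_pattern("c 0 ccccc 0", {"c": (0b110001, 64, False)}))  # 10100010
print(fill_pattern("ssssssss uuuuuuuu",
                   {"s": (0b100, 3, True), "u": (0b100, 3, False)}))  # 1111110000000100
```

Calling <code>fill_pattern("uuuu0000", {"u": (32, 8, False)})</code> raises a <code>ValueError</code>, mirroring the overrun error above.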

==== Explicit Operand References ====

<pre>
00 %imm[5:0]
</pre>

In addition to individual letters, operands can also be referenced by their full name using the <code>%</code> prefix.  In this case you also need to explicitly tell the assembler which bits to use, using a syntax similar to [[#Bit_slicing|bit slicing]]:
<pre>
%operand[start:end]
</pre>
Unlike the arbitrary bit slicing available when creating virtual operands, the syntax here is much more restrictive:  <code>operand</code> must be a single operand name (instruction operand or virtual), and <code>start</code> and <code>end</code> must be unsigned integer literals.

As with the individual letter syntax, values shorter than the requested bits will be sign- or zero-extended appropriately:
<pre>
test_extended_slicing
%s:S3 = 0b100
%u:U3 = 0b100
%s[7:0] %u[7:0]
</pre>

will result in <code>252 4</code>.

Unlike the single letter syntax, we do not have to worry about overruns with slicing syntax as we're specifying exactly which bits we want, regardless of whether there are more bits available or not.

=== Description ===
The instruction description is a single line containing any text.  There are no syntax restrictions.  Multi-line descriptions are not possible at this time.

=== Addendum: Masking ===
On several occasions through this document, "mask it out" has been noted as a solution to various bit width problems.  While masking is a fairly common and well-known operation in computer programming, this section will describe it for the sake of completeness.

There are several kinds of masking you hear of on occasion, with the most prominent by far being the ability to extract one or more individual bits from a larger number.  This is performed by ANDing the value you want to extract the bits from with a ''mask'', which is a second number containing <code>1</code> in bit positions you wish to extract, and <code>0</code> elsewhere.  For example, if we have the number <code>90</code> and wish to extract the bottom four bits, we can use the following code (in [[#Virtual_operands|virtual operand]] syntax):
<pre>
%value = 90
%masked = %value & 0b1111
</pre>

This will return the value 10 (<code>1010</code>).

One thing to be careful of is that the bits you select with your mask will remain in the position they were originally.  If we instead want the top four bits and try to use the code:
<pre>
%value = 90
%masked = %value & 0b11110000
</pre>
we will end up with the value 80 (<code>01010000</code>).  We would need to perform an additional shift operation if we wanted to extract those four bits into an actual four-bit number:
<pre>
%value = 90
%masked = (%value & 0b11110000) >> 4
</pre>
will return the value 5 (<code>0101</code>) that we were looking for.

==== An alternative ====
The game's bit slicing syntax provides an alternative method for extracting higher bits:
<pre>
%value = 90
%masked = %value[7:4]
</pre>
will also return the value 5.  There is no need to shift in this case, as the bit slicing operator treats the lower bound as the LSB of the extracted value.

==== Constructing a mask ====
Writing a mask out in hexits or even bits is easy enough if you're working with fairly small constant values, but it gets unwieldy quickly: a full 64-bit mask is 16 hexits, never mind trying to write it in binary.  If you're working with non-constant values however, or if you just don't want to manually work out such large numbers, you can create your own mask with shifts and ORs.  For example, if you want the 38th bit, you can simply use:
<pre>
%mask = 1 << 38
</pre>
{{note|type=info|This shift operation counts bits starting at 0, so your LSB will be <code>1&nbsp;&lt;&lt;&nbsp;0</code>, not <code>1&nbsp;&lt;&lt;&nbsp;1</code>, despite it often being referred to as the "first" bit.}}

If you want the three bits from the 36th through the 38th, you can OR them together like so:
<pre>
%bit36 = 1 << 36
%bit37 = 1 << 37
%bit38 = 1 << 38
%mask = %bit36 | %bit37 | %bit38
</pre>

That is obviously going to get very cumbersome if you want to extract 8 or 16 or more bits.  We can use a nice property of binary numbers to simplify this however:  <code>1&nbsp;&lt;&lt;&nbsp;n</code> has a single bit set at position <code>n</code>, so <code>(1&nbsp;&lt;&lt;&nbsp;n)-1</code> has that single bit cleared and all lowest <code>n</code> bits set:
<pre>
%mask = (1<<39)-1
</pre>
will return a mask with the lowest 39 bits enabled.
{{note|type=info|We want to include %bit38 from the prior example, so we need to go one higher before subtracting!}}

But we can do better!  If we want to recreate the 3-bit mask from above, we can construct it by using a second mask to "turn off" some of the lower bits from the first mask:
<pre>
%hi_mask = (1<<39)-1
%lo_mask = (1<<36)-1
%mask = %hi_mask ^ %lo_mask
</pre>
This initially creates <code>%hi_mask</code> with the lowest 39 bits enabled, as before.  It then creates <code>%lo_mask</code> with only the lowest 36 bits enabled.  The XOR operation performs the task of "turning off" the lower 36 bits:
* bits 0..35 are turned off because <code>1^1&nbsp;=&nbsp;0</code>
* bits 36..38 remain on because <code>1^0&nbsp;=&nbsp;1</code>
* bits 39..63 remain off because <code>0^0&nbsp;=&nbsp;0</code>
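
The XOR construction is easy to verify outside the game.  The Python below is a quick check, with <code>low_mask</code> and <code>range_mask</code> as hypothetical helper names.

```python
def low_mask(n: int) -> int:
    """Mask with the lowest n bits set: (1 << n) - 1."""
    return (1 << n) - 1

def range_mask(top: int, bottom: int) -> int:
    """Mask with bits bottom..top (inclusive) set, built by XORing two low masks."""
    return low_mask(top + 1) ^ low_mask(bottom)

mask = range_mask(38, 36)
print(mask == (1 << 36) | (1 << 37) | (1 << 38))  # True
print(bin(mask).count("1"))                       # 3
```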

Finally, we can perform a shift to fully encapsulate the bit slicing behavior.  In fact, we don't even need to "turn off" the lower bits this time, as we'll be shifting them out:
<pre>
%mask = (1<<39)-1
%value = 0x123456789ABCDEF
%extracted = (%value & %mask) >> 36
</pre>

==== Putting it all together ====

While not terribly useful as anything more than a teaching example, we can combine several of the techniques from above to create a generic mask that functions the same as the bit slicing operation <code>%value[%top:%bottom]</code>:
<pre>
%zerofix = 0 - ((0 - popcount(%top)) >> 63)
%mask = (1 << %top) | (((1 << ((%top - 1) & %zerofix)) - 1) << 1) | 1
%value = 0x123456789ABCDEF
%extracted = (%value & %mask) >> %bottom
</pre>
{{note|type=error|1=The highest bit (63) currently causes a lot of problems, many of which crash the game.  The expressions above work around this as follows:
* <code>0 - popcount(%top)</code> will return zero if <code>%top</code> is zero, or a negative value if <code>%top</code> is greater than zero (it's an unsigned operand so we don't have to worry about it being less than zero!).
* <code>(0 - popcount(%top)) >> 63</code> extracts just the sign bit: <code>1</code> if the subtraction is negative (and therefore <code>%top&gt;0</code>) or zero if <code>%top=0</code>.
* <code>0 - ((0 - popcount(%top)) >> 63)</code> will again return <code>0</code> if the previous step was zero, but it will return <code>-1</code> if the previous step is <code>1</code>, giving us a mask with either all 64 bits clear or all 64 bits set.
* <code>(%top - 1) & %zerofix</code> will give us zero if <code>%top</code> is zero, by using the all-zeroes mask from the prior step to erase the <code>-1</code> we'd otherwise expect from the subtraction.  If <code>%top</code> is not zero, we accept whatever <code>%top - 1</code> is, using the all-ones mask from the prior step.  This gives us a shift value between 0 and 62 (importantly, not 63.  The whole point of this mess is to remove 63 from our potential range of values).
* <code>((1 << ((%top - 1) & %zerofix)) - 1) &lt;&lt; 1</code> creates the (up to) 62-bit mask as described in the prior sections, and then shifts it over to compensate for the <code>-&nbsp;1</code> we took out initially.
* <code>{{!}} 1</code> fills in the gap created by the shift, and gives us the 63rd bit of our mask.  We can always assume this is valid as our smallest possible mask is <code>1&nbsp;&lt;&lt;&nbsp;0</code> (only the LSB), and every larger mask necessarily also includes the LSB.
* <code>(1 << %top)</code> is the 64th bit of our mask (maximally).  While bit 63 can have a lot of problems with arithmetic operations such as subtraction, it's generally safe with bitwise operations like <code>&lt;&lt;</code> and <code><nowiki>|</nowiki></code>.
These issues will presumably be fixed (or at least no longer crash) as the alpha branch matures, but in the meantime these types of workarounds are necessary.  Even when it's fully mature you still won't be able to do <code>1&lt;&lt;64</code> (or whatever the maximum bit width is by then), so some smaller workarounds will still be needed even if they aren't crashing the game.}}
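
As a sanity check, the generic mask can be modelled in Python with explicit 64-bit wraparound.  This is a sketch under that assumption, not the game's evaluator; <code>u64</code>, <code>generic_mask</code> and <code>extract</code> are hypothetical names.

```python
MASK64 = (1 << 64) - 1

def u64(value: int) -> int:
    """Wrap a Python integer to an unsigned 64-bit pattern."""
    return value & MASK64

def popcount(value: int) -> int:
    return bin(u64(value)).count("1")

def generic_mask(top: int) -> int:
    """Mask with bits 0..top set, without ever shifting by 63 before a subtraction."""
    zerofix = u64(0 - (u64(0 - popcount(top)) >> 63))  # all ones unless top == 0
    shift = u64(top - 1) & zerofix                     # 0..62
    return u64((1 << top) | (((1 << shift) - 1) << 1) | 1)

def extract(value: int, top: int, bottom: int) -> int:
    return (value & generic_mask(top)) >> bottom

# The formula agrees with the simple (1 << (top + 1)) - 1 mask for every bit position.
print(all(generic_mask(t) == (1 << (t + 1)) - 1 for t in range(64)))  # True
print(extract(0x123456789ABCDEF, 38, 36))                             # 6
```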

The various methods can also be combined (using AND, OR, XOR, etc.) to create all sorts of interesting masks.  For the most part though, mask-and-shift of contiguous bits is far more commonly used than "interesting" masks.