Revision as of 20:51, 6 February 2025
The ISA ("Instruction Set Architecture") specification is the language used to define assembly instructions in Turing Complete. ISA specifications are stored alongside schematics in the player's save folder, with the name spec.isa.
An ISA specification file consists of several sections, delimited by section headers enclosed in square brackets ([]).
Settings
The [settings] section defines several configuration properties for your ISA:
name
name = "Architecture name"
name = architecture_name
Names the architecture this ISA specifies. It can be either a string or an identifier.
variant
variant = "Architecture variant"
Additional text that can be used to describe the ISA.
endianness
Defines the byte order of the instructions.
endianness = big
endianness = little
The default is big endian.
For example, if we have an instruction defined as 11110000 00001111, then:
- big endian will encode the instruction as 240 15.
- little endian will encode the same instruction as 15 240.
(The bytes are shown as they would appear in, for example, the ROM component.)
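A minimal Python sketch of the byte ordering described above, assuming the instruction's bit pattern is first read as one big-endian number and then split into bytes in the chosen order; encode_word is a hypothetical helper, not part of the game.

```python
def encode_word(bits: str, endianness: str = "big") -> list[int]:
    """Turn a bit string such as "11110000 00001111" into a list of byte values."""
    digits = bits.replace(" ", "")             # spaces in the pattern are ignored
    value = int(digits, 2)                     # read as a single big-endian number
    n_bytes = (len(digits) + 7) // 8
    return list(value.to_bytes(n_bytes, endianness))  # "big" or "little"

print(encode_word("11110000 00001111", "big"))     # [240, 15]
print(encode_word("11110000 00001111", "little"))  # [15, 240]
```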
line_comments
Specifies the tokens used to mark the start of a single-line comment. Single-line comments start at the given token and consume the remainder of the line.
line_comments = "#"
line_comments = [";", "//"]
The default line_comments value is [";", "//"].
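A minimal sketch of single-line comment stripping, assuming the comment begins at the earliest occurrence of any configured token; strip_line_comment is a hypothetical helper, and this naive version does not special-case tokens that appear inside string literals.

```python
def strip_line_comment(line: str, tokens: tuple[str, ...] = (";", "//")) -> str:
    """Drop everything from the earliest comment token to the end of the line."""
    hits = [line.find(t) for t in tokens if t in line]
    return line[: min(hits)].rstrip() if hits else line

print(strip_line_comment("add r1, r2  ; increment"))    # add r1, r2
print(strip_line_comment("mov r0, 5 // load", ("#",)))  # unchanged: "#" is the only token
```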
Note: the list can be enclosed in [], () or {}. The closing bracket style must match the opening one (so (] would fail to parse).
Note: regardless of the line_comments setting, the syntax highlighter in the assembler window currently only recognizes the default tokens.
block_comments
Specifies the tokens used to mark the start and end of a block comment. Block comments start at the start token and consume everything up to and including the next end token. This includes line breaks as well as any additional start tokens, which is why block comments cannot be nested.
block_comments = {"/*":"*/"}
The default block_comments value is {"/*":"*/"}.
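The non-nesting rule can be illustrated with a small sketch. It assumes the scanner stops at the first end token after a start token and, as an unverified assumption, treats an unterminated comment as running to the end of the input; strip_block_comments is a hypothetical helper.

```python
def strip_block_comments(text: str, start: str = "/*", end: str = "*/") -> str:
    """Remove block comments; the first end token closes the comment (no nesting)."""
    kept = []
    i = 0
    while True:
        s = text.find(start, i)
        if s == -1:
            kept.append(text[i:])          # no more comments: keep the rest
            return "".join(kept)
        kept.append(text[i:s])             # code before the comment
        e = text.find(end, s + len(start))
        if e == -1:
            return "".join(kept)           # ASSUMPTION: unterminated comment runs to the end
        i = e + len(end)                   # resume after the first end token

print(strip_block_comments("a /* outer /* inner */ b */ c"))  # a  b */ c
```

The inner */ closes the whole comment, so b */ c is left over as ordinary text, which is what the lack of nesting means in practice.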
Note: the value can be enclosed in [], () or {}. The closing bracket style must match the opening one (so (] would fail to parse).
Note: regardless of the block_comments setting, the syntax highlighter in the assembler window currently does not highlight block comments at all, not even those using the default tokens.
Fields
The [fields] section defines different types of fields that can be included in the instruction definitions, as well as what values they can have. Fields are separated by blank lines, and each field starts with a field name on the first line followed by one or more allowable field values. Values are bit patterns, and all values within a field must have the same bit length.
Example 1 - Overture
The Overture specification provides a good starting example:
register
r0 000
r1 001
r2 010
r3 011
r4 100
r5 101
in 110
out 110
Here we can see a few features:
- Names are arbitrary. r0 through r5 are considered registers only by convention, while in and out follow an entirely different naming scheme within the same register field.
- Value repetition is allowed. in and out both have the same value of 110.
- Completion is not required. 111 is not assigned to any label.
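These properties can be checked with a short sketch that models a field as a Python dict from names to bit strings; field_width is a hypothetical helper enforcing the same-length rule from the start of this section.

```python
register = {
    "r0": "000", "r1": "001", "r2": "010", "r3": "011",
    "r4": "100", "r5": "101", "in": "110", "out": "110",  # repeated value is fine
}

def field_width(field: dict[str, str]) -> int:
    """Return the common bit length of a field's values, or raise."""
    widths = {len(bits) for bits in field.values()}
    if len(widths) != 1:
        raise ValueError(f"values have different bit lengths: {sorted(widths)}")
    return widths.pop()

width = field_width(register)
# Completion is not required: list the patterns no name uses.
unused = {format(v, f"0{width}b") for v in range(2 ** width)} - set(register.values())
print(width, unused)  # 3 {'111'}
```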
Example 2 - aarch64
Also note that the field name itself is arbitrary. We can take an excerpt from https://github.com/Stuffe/isa_spec/blob/main/spec_lib/aarch64/aarch64.isa as an example:
condition_code
eq 0000
ne 0001
cs 0010
hs 0010
(In this case, all 16 values are included in the ISA itself. The list has been truncated here for the sake of brevity.)
We can see a couple of other features in this example:
- The field name is not register, as noted above.
- The field is four bits wide rather than three. In principle, fields can be any width desired (greater than zero).
Example 3 - aarch64, again
A final example from the same aarch64.isa:
pound
"#" 0
"" 0
The features this time are:
- The "literal" syntax, which requires a literal string to be included in your instructions.
- The empty string "", which allows nothing to be written, effectively making this field optional.
In this instance, the pound field is used for immediate values, allowing the player to write either #37 or just 37 to describe the immediate value 37.
Reserved field names
There are two reserved names:
- label references labels within the assembler (e.g. mylabel:).
- immediate references immediate values (e.g. 37).
Instructions
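The [instructions] section is where we finally define the instructions themselves. Instructions are separated by a blank line, and each consists of several lines: the assembly format, any virtual operands and assertions, the output bit pattern, and a description.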
To motivate with an example, let's consider the following hypothetical DIV instruction that could be added to the Symphony architecture:
div %a(register), %b(register), %c:S16(immediate)
%bits = (0 - popcount(%c))
!assert %bits >> 63 == 1
01011100 0aaa0bbb cccccccc cccccccc
Signed DIV %b by %c and store the result in %a.
Assembly format
div %a(register), %b(register), %c:S16(immediate)
The assembly format is a string of whitespace-delimited tokens, which come in two flavors: literals and operands.
Whitespace
Whitespace can be either tabs or spaces. If a single whitespace character is used, the space will be optional in the resulting assembly language. If the whitespace character is instead followed by another space (note: not a tab), the assembler will require whitespace between the tokens. For example, an instruction defined as function () (with one space) would match both function() and function (), while the instruction function  () (with two spaces) will only match the latter, enforcing the space between the word "function" and the following parenthesis.
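A rough model of this rule translates a literal-only assembly format into a regular expression: a lone space or tab becomes optional whitespace, and a whitespace character followed by a space becomes required whitespace. This is an assumption about the behaviour, not the assembler's actual implementation. Operands are not handled, and format_to_regex is a hypothetical name.

```python
import re

def format_to_regex(fmt: str) -> "re.Pattern[str]":
    """Hypothetical model of how spacing in an assembly format is matched."""
    pattern = ""
    for part in re.split(r"([ \t]+)", fmt):  # keep the whitespace runs as parts
        if not part:
            continue
        if part[0] in " \t":
            # whitespace followed by a space: required; otherwise: optional
            required = len(part) > 1 and part[1] == " "
            pattern += r"\s+" if required else r"\s*"
        else:
            pattern += re.escape(part)       # literal tokens must match exactly
    return re.compile(pattern)

one_space = format_to_regex("function ()")
two_spaces = format_to_regex("function  ()")
print(bool(one_space.fullmatch("function()")), bool(one_space.fullmatch("function ()")))    # True True
print(bool(two_spaces.fullmatch("function()")), bool(two_spaces.fullmatch("function ()")))  # False True
```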
Literals
div in the motivating example is a literal. The user must type this exactly in order for the instruction to be matched. Of particular note, however, is that the commas (,) are also literals.
%% must be used if you wish to use a literal % symbol in your assembly syntax, as % is a special character in the ISA language itself.
Instruction Operands
Instruction operands start with the % prefix and are written in the form %name:size(fields).
- name is any identifier. Names are typically kept short for convenience, such as %a or %imm, but in principle can be any length. The name is mandatory.
- size is either S or U, for signed and unsigned values respectively, followed by a size in bits, for example S32 or U3. The size (and the preceding colon) is optional. The default is U64, and the size cannot currently exceed 64 bits.
- fields is a list of one or more fields created in the prior section (or the reserved fields). The list is delimited by the pipe (|) character. The list of fields is mandatory.
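For reference, a sketch of the value range each size implies, assuming standard two's-complement ranges for S sizes and plain binary ranges for U sizes. Whether the assembler enforces exactly these bounds is an assumption, and operand_range is a hypothetical helper.

```python
def operand_range(size: str = "U64") -> tuple[int, int]:
    """Return (min, max) for a size such as "S16" or "U3" (default U64)."""
    kind, bits = size[0], int(size[1:])
    if kind not in ("S", "U") or not 0 < bits <= 64:
        raise ValueError(f"bad size {size!r}")
    if kind == "S":
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1

print(operand_range("S16"))  # (-32768, 32767), e.g. %c:S16 in the DIV example
print(operand_range("U3"))   # (0, 7)
```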
Virtual operands
%bits = 0 - popcount(%c)
In addition to the operands included in the instruction syntax, additional "virtual" operands can be created to simplify instruction creation or, as in this example, to provide some minimal compile-time correctness guarantees.
The general syntax is:
%name:size = expression
name and size follow the same definition rules as instruction operands.
expression is a relatively typical mathematical expression using C-like operators and a handful of built-in functions.
Expression Operators
The game provides a limited set of operators for the construction of virtual operands:
Operator | Description | Example | Result if %a = -30000, 16-bit |
---|---|---|---|
+ | addition | %a + 7 | -29993 |
- | subtraction | %a - 7 | -30007 |
* | multiplication | %a * 7 | -13392 |
/ | division | %a / 7 | -4285 |
% | modulo (remainder after division) | %a % 7 | -5 |
& | bitwise AND | %a & 7 | 0 |
\| | bitwise OR | %a \| 7 | -29993 |
^ | bitwise XOR | %a ^ 7 | -29993 |
<< | logical shift left (LSL) | %a << 7 | 26624 |
>> | logical shift right (LSR) | %a >> 7 | 277 |
A notable omission is the lack of unary operators. However, the two most common unary operations can be written using binary operators as follows:
Operation | Alternative |
---|---|
NOT %a | (-1 ^ %a) |
-%a | (0 - %a) |
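The Result column of the operator table can be reproduced in Python by wrapping each result to 16 bits, using C-like truncating division and remainder, and treating >> as a logical shift of the 16-bit pattern. This is only a model of the table's arithmetic (the game may report an error instead of wrapping, as discussed under Bit widths); to_signed, div_trunc and mod_trunc are hypothetical helpers. The two unary workarounds are included at the end.

```python
BITS = 16
MASK = (1 << BITS) - 1

def to_signed(x: int) -> int:
    """Wrap x to BITS bits and read the pattern back as two's complement."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x

def div_trunc(a: int, b: int) -> int:
    """C-like division, rounding toward zero (Python's // rounds down)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def mod_trunc(a: int, b: int) -> int:
    """C-like remainder, whose sign follows the dividend."""
    return a - b * div_trunc(a, b)

a = -30000
results = {
    "+": to_signed(a + 7),
    "-": to_signed(a - 7),
    "*": to_signed(a * 7),
    "/": to_signed(div_trunc(a, 7)),
    "%": to_signed(mod_trunc(a, 7)),
    "&": to_signed(a & 7),
    "|": to_signed(a | 7),
    "^": to_signed(a ^ 7),
    "<<": to_signed(a << 7),
    ">>": to_signed((a & MASK) >> 7),  # logical: shift the unsigned 16-bit pattern
}
unary = {
    "NOT %a": to_signed(-1 ^ a),  # same bits as ~a
    "-%a": to_signed(0 - a),
}
for op, value in results.items():
    print(f"%a {op} 7 = {value}")
print(unary)
```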
Operator precedence
The game only defines three precedence levels, listed from highest to lowest:
Precedence | Operators |
---|---|
Parentheses | () |
Multiplicative | *, /, % |
Everything else | +, -, &, \|, ^, <<, >> |
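Because the last level lumps together operators that C places at different precedences, grouping can differ from C. The sketch below evaluates integer expressions using only these three levels. It assumes operators within one level associate left to right (this page does not say so), handles non-negative literals only, and all names in it are hypothetical.

```python
import re

# ASSUMPTION: operators within one level associate left to right.
MULTIPLICATIVE = {"*", "/", "%"}
EVERYTHING_ELSE = {"+", "-", "&", "|", "^", "<<", ">>"}
APPLY = {
    "*": lambda x, y: x * y, "/": lambda x, y: x // y, "%": lambda x, y: x % y,
    "+": lambda x, y: x + y, "-": lambda x, y: x - y,
    "&": lambda x, y: x & y, "|": lambda x, y: x | y, "^": lambda x, y: x ^ y,
    "<<": lambda x, y: x << y, ">>": lambda x, y: x >> y,
}
TOKEN = re.compile(r"\s*(<<|>>|\d+|[-+*/%&|^()])")

def tokenize(src: str) -> list[str]:
    src, pos, tokens = src.strip(), 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def evaluate(src: str) -> int:
    """Evaluate non-negative integer literals with the three-level precedence."""
    tokens, pos = tokenize(src), 0

    def atom() -> int:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return int(tok)
        value = level(EVERYTHING_ELSE)
        if tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        pos += 1
        return value

    def level(ops: set[str]) -> int:
        nonlocal pos
        # Multiplicative operands are atoms; lower-level operands are products.
        operand = atom if ops is MULTIPLICATIVE else (lambda: level(MULTIPLICATIVE))
        value = operand()
        while pos < len(tokens) and tokens[pos] in ops:
            op = tokens[pos]
            pos += 1
            value = APPLY[op](value, operand())
        return value

    return level(EVERYTHING_ELSE)

print(evaluate("1 + 2 * 3"))    # 7
print(evaluate("1 << 2 + 3"))   # 7, i.e. (1 << 2) + 3
```

Under that assumption, 1 << 2 + 3 evaluates to 7, whereas C would parse it as 1 << (2 + 3) = 32, so adding parentheses is the safe choice.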
Functions
The game also provides a handful of built-in functions for the construction of virtual operands:
Function | Description | Example | Result if %a = -30000 and %b = 27, 16-bit |
---|---|---|---|
asr | arithmetic shift right (ASR) | asr(%a, 7) | -235 |
log2 | floor of the base-2 logarithm if > 0, -1 otherwise | log2(%a), log2(%b) | -1, 4 |
popcount | number of 1s in the base-2 representation | popcount(%a) | 6 |
trailing_zeros | number of 0s to the right of the rightmost 1 in the base-2 representation | trailing_zeros(%a) | 4 |
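Python equivalents of the four functions, written so that they reproduce the function table. The 16-bit width follows the table, the helpers simply reuse the built-in names for illustration, and the result of trailing_zeros(0) is an assumption because this page does not specify it.

```python
BITS = 16
MASK = (1 << BITS) - 1

def asr(x: int, n: int) -> int:
    return x >> n  # Python's >> on a negative int already copies the sign bit

def log2(x: int) -> int:
    return x.bit_length() - 1 if x > 0 else -1  # floor(log2(x)) for x > 0

def popcount(x: int) -> int:
    return bin(x & MASK).count("1")  # count 1s in the 16-bit pattern

def trailing_zeros(x: int) -> int:
    x &= MASK
    if x == 0:
        return BITS  # ASSUMPTION: behaviour for zero is not documented
    return (x & -x).bit_length() - 1  # x & -x isolates the lowest set bit

a, b = -30000, 27
print(asr(a, 7), log2(a), log2(b), popcount(a), trailing_zeros(a))  # -235 -1 4 6 4
```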
Bit slicing
The expression evaluator also provides syntax for picking specific bits from a value, known as "slicing". This is written as value[start:end], where all three parts (value, start, and end) can be arbitrary expressions. start and end are inclusive indices into the bits of value, with the LSB being bit 0 and numbering running right-to-left. To match the usual way of writing binary numbers (most significant bit on the left), start has to be larger than end. For example, the slice %imm[5:2] applied to the value 90 (0b01011010) would result in 6 (0b0110), as shown:
  76543210
    ||||
    vvvv
0b01011010
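The slicing rule amounts to a right shift followed by a mask. A minimal sketch follows. bit_slice is a hypothetical name, and because this page does not say whether start may equal end, the sketch only rejects start < end.

```python
def bit_slice(value: int, start: int, end: int) -> int:
    """Bits start..end of value, inclusive, with the LSB as bit 0."""
    if start < end:
        raise ValueError("start must not be smaller than end")
    width = start - end + 1
    return (value >> end) & ((1 << width) - 1)  # drop low bits, keep `width` bits

print(bit_slice(0b01011010, 5, 2))  # 6, i.e. 0b0110
```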
Expression Operands
Operations in the expression can refer to any operands from the instruction definition, as well as any virtual operands created on the preceding lines.
Bit widths
Some operations (such as multiplication) will easily allow you to exceed the bit width of the virtual operand you're creating, or of the final output bytes. The game will cause an error when that occurs. Additionally, negative values are prone to being interpreted as 64-bit unsigned values (as of 0.1354 Beta) which can cause unexpected errors and odd-looking error messages. When in doubt, mask out the result of your expressions to ensure they fit within the intended bit width, especially when working with signed values.
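A short Python illustration of the pitfall, assuming intermediate values are handled as 64-bit patterns as the paragraph suggests. On the ISA side, the masking step would correspond to a virtual operand along the lines of %imm8 = %imm & 255 (the name %imm8 is hypothetical).

```python
U64_MASK = (1 << 64) - 1

def as_u64(x: int) -> int:
    """Reinterpret an integer as a 64-bit unsigned value."""
    return x & U64_MASK

imm = -5
raw = as_u64(imm)   # 18446744073709551611: far too wide for an 8-bit field
masked = imm & 255  # 251: the 8-bit two's-complement pattern of -5
print(raw, masked, format(masked, "08b"))
```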
Assertions
!assert %bits >> 63 == 1
Assertions can be intermixed with virtual operands and trigger an error if they are not true. Currently the only supported comparison operator is equals (==), although more comparisons are planned in the future. After the comparison, a string can be added to produce a more context-specific error message.
The full syntax is:
!assert <lhs> == <rhs>
where <lhs> and <rhs> are both arbitrary expressions, using the same syntax as virtual operand assignment expressions.
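The assertion in the DIV example can be read as a divide-by-zero guard: 0 - popcount(%c) has its top bit set (after wrapping to 64 bits) exactly when popcount(%c) is non-zero, that is, when %c is not 0. The small model below assumes 64-bit wrapping subtraction. div_guard_holds is a hypothetical helper.

```python
U64 = (1 << 64) - 1

def div_guard_holds(c: int) -> bool:
    """Model of: %bits = (0 - popcount(%c)) followed by !assert %bits >> 63 == 1"""
    pc = bin(c & U64).count("1")  # popcount of c's 64-bit pattern (assumed width)
    bits = (0 - pc) & U64         # ASSUMPTION: subtraction wraps to 64 unsigned bits
    return bits >> 63 == 1

for c in (5, -1, 0):
    print(c, div_guard_holds(c))  # only c == 0 fails the assertion
```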
Output bit pattern
11 000 ccc
The bit pattern comes after any virtual operands or assertions. It defines a sequence of bits that get interpreted as a single big-endian number (which then gets rearranged according to the endianness setting). In this line, space characters (and only space characters) are completely ignored.
Bits can be defined by either individual letters/numbers or explicit operand references.
Fixed patterns
0 and 1 produce those bits in the output and require those exact bits to be there while disassembling. ? also produces 0 while assembling, but this bit will be ignored when disassembling (once that gets reimplemented).
Individual letters
All lowercase letters can be used as shortcuts to refer to operands. A letter refers to the first operand that starts with that letter, with the order being left-to-right in the syntax line and then top-to-bottom for virtual operands.
If a letter appears multiple times, all of those appearances are assigned bits as if they were a single larger field. The bits are assigned LSB-to-MSB to the repeated letters in a right-to-left ordering. For example,
test_split_operand
%c = 0b110001
c 0 ccccc 0
would result in 10100010.
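The right-to-left assignment can be modelled as a walk over the pattern starting from its rightmost character, giving each letter its next bit in turn. Below is a sketch using a hypothetical fill_letters helper. It also ignores spaces and treats ? as 0, following the rules for fixed patterns.

```python
def fill_letters(pattern: str, values: dict[str, int]) -> str:
    """Fill a bit pattern, giving each letter its value's bits LSB-first, right to left."""
    next_bit = {letter: 0 for letter in values}
    out = []
    for ch in reversed(pattern.replace(" ", "")):  # spaces are ignored
        if ch in "01":
            out.append(ch)                         # fixed bit
        elif ch == "?":
            out.append("0")                        # assembles as 0
        else:
            out.append(str((values[ch] >> next_bit[ch]) & 1))
            next_bit[ch] += 1
    return "".join(reversed(out))

print(fill_letters("c 0 ccccc 0", {"c": 0b110001}))  # 10100010
```

Python's >> on a negative int copies the sign bit, so passing a negative value for a signed operand also yields sign extension when the pattern has more letters than the operand has bits.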
If you use an operand that is narrower than the length your bit pattern defines, the value will be sign-extended for signed operands and zero-extended for unsigned operands:
test_extended_operands
%s:S3 = 0b100
%u:U3 = 0b100
ssssssss uuuuuuuu
will result in 252 4.
However, the opposite is not true: values wider than their output field will not be truncated, but will result in an error. For example:
test_overrun
%u:U8 = 32
uuuu0000
will result in the error Value 32 outside of range for this 4-bit zero-extended field.
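Both behaviours, extension of narrow values and the overrun error, can be modelled by checking the value against the range of the output field's width before masking. The sketch uses a hypothetical fit_field helper. The unsigned error text copies the message quoted above, while the range used for signed fields is an assumption.

```python
def fit_field(value: int, signed: bool, width: int) -> int:
    """Return the width-bit pattern of value, or raise if it does not fit."""
    if signed:
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1  # ASSUMPTION for signed
    else:
        lo, hi = 0, (1 << width) - 1
    if not lo <= value <= hi:
        kind = "sign" if signed else "zero"
        raise ValueError(
            f"Value {value} outside of range for this {width}-bit {kind}-extended field"
        )
    return value & ((1 << width) - 1)  # for negative values this is the sign extension

# %s:S3 = 0b100 is the 3-bit signed value -4; %u:U3 = 0b100 is 4.
print(fit_field(-4, True, 8), fit_field(4, False, 8))  # 252 4

try:
    fit_field(32, False, 4)  # test_overrun: uuuu0000 leaves %u only 4 bits
except ValueError as err:
    print(err)
```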
Explicit Operand References
00 %imm[5:0]
In addition to individual letters, operands can also be referenced by their full name using the % prefix. In this case you also need to tell the assembler explicitly which bits to use, with a syntax similar to bit slicing:
%operand[start:end]
Unlike the arbitrary bit slicing available when creating virtual operands, the syntax here is much more restrictive: operand must be a single operand name (instruction operand or virtual), and start and end must be unsigned integer literals.
As with the individual letter syntax, values shorter than the requested bits will be sign- or zero-extended appropriately:
test_extended_slicing
%s:S3 = 0b100
%u:U3 = 0b100
%s[7:0] %u[7:0]
will result in 252 4.
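Here is a sketch of the explicit reference. It shifts the value right, keeps the requested bits, and relies on Python's arithmetic >> for sign extension. slice_operand is a hypothetical helper. The last call shows why overruns are not an issue here: asking for bits [3:0] of 32 simply yields those four bits.

```python
def slice_operand(value: int, start: int, end: int) -> int:
    """%operand[start:end] with inclusive bounds, start >= end, LSB = bit 0."""
    width = start - end + 1
    return (value >> end) & ((1 << width) - 1)  # >> sign-extends negative ints

s = -4  # %s:S3 = 0b100, read as a signed 3-bit value
u = 4   # %u:U3 = 0b100
print(slice_operand(s, 7, 0), slice_operand(u, 7, 0))  # 252 4
print(slice_operand(32, 3, 0))                         # 0: no overrun error
```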
Unlike the single letter syntax, we do not have to worry about overruns with slicing syntax as we're specifying exactly which bits we want, regardless of whether there are more bits available or not.
Description
The instruction description is a single line containing any text. There are no syntax restrictions. Multi-line descriptions are not possible at this time.