
== What Is Assembly? ==

Assembly is a human-palatable interface to a specific computer processor. ''Every processor has its own distinct assembly language''. Knowing “how to program in assembly” means you can probably pick up a different assembly language pretty easily, but it doesn’t mean that you are intimately familiar with the details and specifics of any one in particular. For example, I personally am very familiar with the MIPS IV 32-bit processor, but my knowledge about writing efficient programs on that processor would not get me the whole distance to writing efficient programs on the 64-bit version. Each processor is different.

The reason is that every processor understands a different ''instruction set''. We call this the '''ISA''', or ''instruction set architecture''. A processor looks at instructions, “decodes” them into control signals, and then executes them. The processor sees the instructions in binary form, just ones and zeroes on wires. For humans, programming this way is very difficult: it’s error-prone and hard to debug (see discussion on labels). So we make '''mnemonics''', short names that stand in for the binary representation of an instruction.

For example, in the OVERTURE architecture, the instruction <code>0b00xx xxxx</code> might be given a mnemonic such as <code>load_immediate</code>, <code>load_imm</code>, <code>loadi</code>, <code>ldi</code>, or <code>li</code> (note that assemblers are typically case-insensitive, but currently Turing Complete’s assembler is case-sensitive). This instruction loads the value of the bottom 6 bits, called the '''immediate value''', directly into register 0. In a typical assembler, we would write <code>loadi 55</code> to represent the bit pattern <code>0b0011 0111</code>. We call <code>loadi</code> the '''opcode''' field, and <code>55</code> the '''operand''' field. Different instructions may have different numbers of operands; for example, the MIPS <code>add</code> instruction takes 3: the register to store the result, and 2 registers whose values should be added.
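To make the split between opcode and immediate concrete, here is a minimal Python sketch of how the <code>0b00xx xxxx</code> pattern could be picked apart. The function name <code>decode_loadi</code> and the mask constants are made up for illustration and are not part of Turing Complete; the only things taken from the text are that the top two bits being <code>00</code> mark a load-immediate and that the bottom 6 bits carry the value.

```python
# Hypothetical helper names; the bit layout follows the OVERTURE example above.
OPCODE_MASK = 0b1100_0000     # top 2 bits select the instruction
IMMEDIATE_MASK = 0b0011_1111  # bottom 6 bits hold the immediate value
LOADI_OPCODE = 0b0000_0000    # the "00" in 0b00xx xxxx


def decode_loadi(byte):
    """Return the immediate if `byte` is a load-immediate, else None."""
    if (byte & OPCODE_MASK) != LOADI_OPCODE:
        return None
    return byte & IMMEDIATE_MASK


print(decode_loadi(0b0011_0111))  # 55
print(decode_loadi(0b0100_0000))  # None (top bits are not 00)
```

This is roughly the job the processor's decoder does in hardware, just written out as masks and comparisons.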

Turing Complete’s assembler currently assumes any spaces separate instruction bytes, so we have to combine the different fields using constant operators like <code>|</code> (bitwise-or). For example, if we have set the <code>loadi</code> mnemonic in the left pane of the assembly editor, we could write <code>loadi|55</code>.
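The expression <code>loadi|55</code> is nothing more than a bitwise OR of two constants. A rough Python equivalent, assuming <code>loadi</code> has been defined as <code>0b0000_0000</code> (the opcode bits of the pattern above with the immediate bits cleared), might look like this:

```python
# Assumed mnemonic value: the opcode bits of 0b00xx xxxx, immediate bits cleared.
loadi = 0b0000_0000

# The same expression you would type into the assembly editor.
instruction = loadi | 55

print(format(instruction, "08b"))  # 00110111
```

Because this particular opcode is all zeroes, the OR leaves the immediate unchanged. For a mnemonic with non-zero opcode bits, the same expression would set those upper bits alongside the operand.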

<span id="types-of-isas"></span>
===== Types of ISAs =====

There are two major classifications of ISAs. A '''RISC''' architecture is a '''Reduced Instruction Set Computer'''. It tries to offer a consistent binary instruction format (which is easier to build hardware for), at the cost of expressiveness. A RISC architecture, for example, usually has the same number of bytes in every instruction. OVERTURE is a RISC architecture with 1-byte instructions, and LEG is a RISC architecture with 4-byte instructions. OVERTURE pays the price for this by making it impossible to encode immediate values over 63. LEG pays the price by having large numbers of instructions which require 4 bytes but do not use all 4 fields, making it hard to fit programs into the 256-byte space.
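The appeal of a fixed instruction size is that finding instruction boundaries is trivial. The Python sketch below illustrates that, along with the 6-bit immediate limit mentioned above. The function names and the sample byte values are invented for the example and are not real LEG opcodes.

```python
INSTRUCTION_SIZE = 4  # bytes per instruction, as in LEG


def split_fixed(program):
    """Split a byte sequence into fixed-size instructions."""
    if len(program) % INSTRUCTION_SIZE != 0:
        raise ValueError("program length is not a multiple of the instruction size")
    return [tuple(program[i:i + INSTRUCTION_SIZE])
            for i in range(0, len(program), INSTRUCTION_SIZE)]


def fits_in_6_bits(value):
    """True if `value` can be encoded as an OVERTURE immediate (0 to 63)."""
    return 0 <= value <= 0b11_1111


program = bytes([1, 2, 3, 4, 5, 6, 7, 8])  # made-up byte values
print(split_fixed(program))                    # [(1, 2, 3, 4), (5, 6, 7, 8)]
print(fits_in_6_bits(63), fits_in_6_bits(64))  # True False
```

Every instruction starts at a multiple of 4, so the decoder never has to inspect a byte to know where the next instruction begins.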

The other type of architecture is a '''CISC''' architecture, or a '''Complex Instruction Set Computer'''. CISCs try to offer expressiveness and convenience at the cost of hardware complexity. Many widely used processors are CISC architectures, including the x86 processors found in most desktop and laptop computers. A CISC architecture usually has '''VLI'''s, or ''variable length instructions''. This can enable simple operations, like addition, to fit in a single byte, while something like <code>loadi</code> can have a second byte for the immediate, allowing values up to 255. Turing Complete does not currently have levels which build a CISC architecture, but several people have done it in the sandbox. Feel free to ask around in the server.
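With variable length instructions, the decoder has to read each opcode to learn how many bytes belong to it. The following Python sketch shows the idea using a purely hypothetical encoding (the opcode values <code>ADD = 0x01</code> and <code>LOADI = 0x02</code> are assumptions for illustration, not taken from Turing Complete or any real ISA):

```python
# Hypothetical opcodes for illustration only.
ADD = 0x01    # 1 byte, no operand
LOADI = 0x02  # 2 bytes: the opcode, then an 8-bit immediate

LENGTHS = {ADD: 1, LOADI: 2}


def split_variable(program):
    """Walk the byte stream, letting each opcode decide how many bytes it owns."""
    instructions = []
    i = 0
    while i < len(program):
        length = LENGTHS[program[i]]
        if i + length > len(program):
            raise ValueError("truncated instruction")
        instructions.append(tuple(program[i:i + length]))
        i += length
    return instructions


print(split_variable(bytes([LOADI, 255, ADD, LOADI, 7])))
# [(2, 255), (1,), (2, 7)]
```

The immediate now gets a full byte, so 255 fits, but the position of each instruction depends on everything before it. That dependency is part of the hardware complexity a CISC design accepts.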

Different ISAs can have wildly varying sets of instructions. Some RISC architectures have huge numbers of simple instructions, and some CISC architectures have small numbers of very complex instructions. There may be little or no overlap. But that doesn’t mean one architecture can do something that another cannot. Any Turing-complete architecture has exactly the same computational power as any other. If a processor does not have an instruction to do byte-NOT on the value in a register, then the processor cannot perform that operation ''in a single step''. The key to assembly programming is figuring out how to decompose such operations into smaller steps which the processor ''can'' perform.
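As an illustration of that kind of decomposition, here are three ways to get a byte-NOT out of other operations, sketched in Python. Which of these a given processor actually offers is an assumption; the point is only that NOT can be rebuilt from NAND, subtraction, or XOR when it is missing.

```python
def nand(a, b):
    """Bytewise NAND, standing in for a single hardware instruction."""
    return ~(a & b) & 0xFF


def not_via_nand(a):
    # NAND of a value with itself inverts every bit.
    return nand(a, a)


def not_via_sub(a):
    # For an unsigned byte, 255 - a flips every bit.
    return 255 - a


def not_via_xor(a):
    # XOR with all ones flips every bit.
    return a ^ 0xFF


value = 0b0011_0111  # 55
print(not_via_nand(value), not_via_sub(value), not_via_xor(value))  # 200 200 200
```

Each version costs an extra constant or a repeated operand compared to a dedicated NOT instruction, but the result is the same byte.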

<span id="getting-started"></span>