as(9)
Command: as - assembler
AS - ASSEMBLER [IBM]
This document describes the language accepted by the 80386
assembler that is part of the Amsterdam Compiler Kit. Note that only
the syntax is described; only a few 386 instructions are shown as
examples.
Tokens, Numbers, Character Constants, and Strings
The syntax of numbers is the same as in C. The constants 32, 040,
and 0x20 all represent the same number, but are written in decimal,
octal, and hex, respectively. The rules for character constants and
strings are also the same as in C. For example, 'a' is a character
constant. A typical string is "string". Expressions may be formed with
C operators, but must use [ and ] for parentheses. (Normal parentheses
are claimed by the operand syntax.)
Symbols
Symbols contain letters and digits, as well as three special
characters: dot, tilde, and underscore. The first character may not be
a digit or tilde.
The names of the 80386 registers are reserved. These are:
al, bl, cl, dl
ah, bh, ch, dh
ax, bx, cx, dx, eax, ebx, ecx, edx
si, di, bp, sp, esi, edi, ebp, esp
cs, ds, ss, es, fs, gs
The xx and exx variants of the eight general registers are treated as
synonyms by the assembler. Normally "ax" is the 16-bit low half of the
32-bit "eax" register, but the assembler decides whether a 16- or 32-bit
operation is meant solely from the instruction and its prefixes (such
as o16), not from the register name. It is nevertheless best to use the
proper register names, so as not to confuse those who read the code.
The last group, the 6 segment registers, is used for selector + offset
addressing, in which the effective address is at a given offset in one
of the 6 segments.
Names of instructions and pseudo-ops are not reserved. Alphabetic
characters in opcodes and pseudo-ops must be in lower case.
Separators
Commas, blanks, and tabs are separators and can be interspersed
freely between tokens, but not within tokens. Commas are only legal
between operands.
Comments
The comment character is '!'. The rest of the line is ignored.
Opcodes
The opcodes are listed below. Notes: (1) Different names for the
same instruction are separated by '/'. (2) Square brackets ([])
indicate that 0 or 1 of the enclosed characters can be included. (3)
Curly brackets ({}) work similarly, except that one of the enclosed
characters must be included. Thus square brackets indicate an option,
whereas curly brackets indicate that a choice must be made.
Data Transfer
mov[b] dest, source ! Move word/byte from source to dest
pop dest ! Pop stack
push source ! Push stack
xchg[b] op1, op2 ! Exchange word/byte
xlat ! Translate
o16 ! Operate on a 16 bit object instead of 32 bit
Input/Output
in[b] source ! Input from source I/O port
in[b] ! Input from DX I/O port
out[b] dest ! Output to dest I/O port
out[b] ! Output to DX I/O port
Address Object
lds reg,source ! Load reg and DS from source
les reg,source ! Load reg and ES from source
lea reg,source ! Load effective address of source into reg
{cdsefg}seg ! Specify seg register for next instruction
a16 ! Use 16 bit addressing mode instead of 32 bit
Flag Transfer
lahf ! Load AH from flag register
popf ! Pop flags
pushf ! Push flags
sahf ! Store AH in flag register
Addition
aaa ! Adjust result of BCD addition
add[b] dest,source ! Add
adc[b] dest,source ! Add with carry
daa ! Decimal adjust after addition
inc[b] dest ! Increment by 1
Subtraction
aas ! Adjust result of BCD subtraction
sub[b] dest,source ! Subtract
sbb[b] dest,source ! Subtract with borrow from dest
das ! Decimal adjust after subtraction
dec[b] dest ! Decrement by one
neg[b] dest ! Negate
cmp[b] dest,source ! Compare
Multiplication
aam ! Adjust result of BCD multiply
imul[b] source ! Signed multiply
mul[b] source ! Unsigned multiply
Division
aad ! Adjust AX for BCD division
o16 cbw ! Sign extend AL into AH
o16 cwd ! Sign extend AX into DX
cwde ! Sign extend AX into EAX
cdq ! Sign extend EAX into EDX
idiv[b] source ! Signed divide
div[b] source ! Unsigned divide
Logical
and[b] dest,source ! Logical and
not[b] dest ! Logical not
or[b] dest,source ! Logical inclusive or
test[b] dest,source ! Logical test
xor[b] dest,source ! Logical exclusive or
Shift
sal[b]/shl[b] dest,CL ! Shift logical left
sar[b] dest,CL ! Shift arithmetic right
shr[b] dest,CL ! Shift logical right
Rotate
rcl[b] dest,CL ! Rotate left, with carry
rcr[b] dest,CL ! Rotate right, with carry
rol[b] dest,CL ! Rotate left
ror[b] dest,CL ! Rotate right
String Manipulation
cmps[b] ! Compare string element ds:esi with es:edi
lods[b] ! Load from ds:esi into AL, AX, or EAX
movs[b] ! Move from ds:esi to es:edi
rep ! Repeat next instruction until ECX=0
repe/repz ! Repeat next instruction while ECX!=0 and ZF=1
repne/repnz ! Repeat next instruction while ECX!=0 and ZF=0
scas[b] ! Compare es:edi with AL/AX/EAX
stos[b] ! Store AL/AX/EAX in es:edi
Control Transfer
As accepts a number of special jump opcodes that can assemble to
instructions with either a byte displacement, which can only reach
targets within -126 to +129 bytes of the branch, or a 32-bit
displacement. The assembler automatically chooses the byte or 32-bit
displacement form.
The English translation of the opcodes should be obvious, with
'l(ess)' and 'g(reater)' for signed comparisons, and 'b(elow)' and
'a(bove)' for unsigned comparisons. There are lots of synonyms to
allow you to write "jump if not that" instead of "jump if this".
The 'call', 'jmp', and 'ret' instructions can be either
intrasegment or intersegment. The intersegment versions are indicated
with the suffix 'f'.
Unconditional
jmp[f] dest ! jump to dest (8 or 32-bit displacement)
call[f] dest ! call procedure
ret[f] ! return from procedure
Conditional
ja/jnbe ! if above/not below or equal (unsigned)
jae/jnb/jnc ! if above or equal/not below/not carry (uns.)
jb/jnae/jc ! if below/not above nor equal/carry (unsigned)
jbe/jna ! if below or equal/not above (unsigned)
jg/jnle ! if greater/not less nor equal (signed)
jge/jnl ! if greater or equal/not less (signed)
jl/jnge ! if less/not greater nor equal (signed)
jle/jng ! if less or equal/not greater (signed)
je/jz ! if equal/zero
jne/jnz ! if not equal/not zero
jno ! if overflow not set
jo ! if overflow set
jnp/jpo ! if parity not set/parity odd
jp/jpe ! if parity set/parity even
jns ! if sign not set
js ! if sign set
Iteration Control
jcxz dest ! jump if ECX = 0
loop dest ! Decrement ECX and jump if ECX != 0
loope/loopz dest ! Decrement ECX and jump if ECX != 0 and ZF = 1
loopne/loopnz dest ! Decrement ECX and jump if ECX != 0 and ZF = 0
Interrupt
int n ! Software interrupt n
into ! Interrupt if overflow set
iretd ! Return from interrupt
Flag Operations
clc ! Clear carry flag
cld ! Clear direction flag
cli ! Clear interrupt enable flag
cmc ! Complement carry flag
stc ! Set carry flag
std ! Set direction flag
sti ! Set interrupt enable flag
Location Counter
The special symbol '.' is the location counter. Its value is the
address of the first byte of the instruction in which the symbol
appears, and it can be used in expressions.
Segments
There are four different assembly segments: text, rom, data and
bss. Segments are declared and selected by the .sect pseudo-op. It is
customary to declare all segments at the top of an assembly file like
this:
.sect .text; .sect .rom; .sect .data; .sect .bss
The assembler accepts up to 16 different segments, but MINIX expects
only four to be used. Anything can in principle be assembled into any
segment, but the MINIX bss segment may only contain uninitialized data.
Note that the '.' symbol refers to the location in the current segment.
Labels
There are two types of labels: name and numeric. Name labels consist
of a name followed by a colon (:).
Numeric labels consist of a single digit followed by a colon, such as
0:. The nearest 0: label may be referenced as 0f in the forward
direction, or 0b backwards.
Statement Syntax
Each line consists of a single statement. Blank or comment lines
are allowed.
Instruction Statements
The most general form of an instruction is
label: opcode operand1, operand2 ! comment
Expression Semantics
The following operators can be used: + - * / & | ^ ~ << (shift
left) >> (shift right) - (unary minus). 32-bit integer arithmetic is
used. Division produces a truncated quotient.
Addressing Modes
Below is a list of the addressing modes supported. Each one is
followed by an example.
constant mov eax, 123456
direct access mov eax, (counter)
register mov eax, esi
indirect mov eax, (esi)
base + disp. mov eax, 6(ebp)
scaled index mov eax, (4*esi)
base + index mov eax, (ebp)(2*esi)
base + index + disp. mov eax, 10(edi)(1*esi)
Any of the constants or symbols may be replaced by expressions.
Direct access, constants and displacements may be any type of
expression. A scaled index with scale 1 may be written without the
'1*'.
Call and Jmp
The 'call' and 'jmp' instructions can be interpreted as a load into
the instruction pointer.
call _routine ! Direct, intrasegment
call (subloc) ! Indirect, intrasegment
call 6(ebp) ! Indirect, intrasegment
call ebx ! Indirect (through register), intrasegment
call (ebx) ! Indirect, intrasegment
callf (subloc) ! Indirect, intersegment
callf seg:offs ! Direct, intersegment
Symbol Assignment
Symbols can acquire values in one of two ways. Using a symbol as a
label sets it to '.' for the current segment, with type relocatable.
Alternatively, a symbol may be given a value via an assignment of the
form
symbol = expression
in which the symbol is assigned the value and type of the expression.
Storage Allocation
Space can be reserved for bytes, words, and longs using pseudo-ops.
They take one or more operands, and for each generate a value whose size
is a byte, word (2 bytes) or long (4 bytes). For example:
.data1 2, 6 ! allocate 2 bytes initialized to 2 and 6
.data2 3, 0x10 ! allocate 2 words initialized to 3 and 16
.data4 010 ! allocate a longword initialized to 8
.space 40 ! allocates 40 bytes of zeros
The example above allocates 50 (decimal) bytes of storage: the first
two bytes are initialized to 2 and 6, the next two words to 3 and 16,
then one longword to 8 (010 octal), and the last 40 bytes to zero.
String Allocation
The pseudo-ops .ascii and .asciz take one string argument and
generate the ASCII character codes for the letters in the string. The
latter automatically terminates the string with a null (0) byte. For
example,
.ascii "hello"
.asciz "world\n"
Alignment
Sometimes it is necessary to force the next item to begin at a
word, longword or even a 16 byte address boundary. The .align pseudo-op
emits zero or more null bytes, as many as are needed to make the current
location a multiple of the argument of .align.
Segment Control
Every item assembled goes in one of the four segments: text, rom,
data, or bss. By using the .sect pseudo-op with argument .text, .rom,
.data or .bss, the programmer can force the next items to go in a
particular segment.
External Names
A symbol can be given global scope by including it in a .define
pseudo-op. Multiple names may be listed, separated by commas. It must
be used to export symbols defined in the current program. Names not
defined in the current program are treated as "undefined external"
automatically, although it is customary to make this explicit with the
.extern pseudo-op.
Common
The .comm pseudo-op declares storage that can be common to more
than one module. There are two arguments: a name and an absolute
expression giving the size in bytes of the area named by the symbol. The
type of the symbol becomes external. The statement can appear in any
segment. If you think this has something to do with FORTRAN, you are
right.
Examples
In the kernel directory, there are several assembly code files that
are worth inspecting as examples. However, note that these files are
designed to first be run through the C preprocessor. (The very first
character is a # to signal this.) Thus they contain numerous constructs
that are not pure assembler. For true assembler examples, compile any C
program provided with MINIX using the -S flag. This will result in an
assembly language file with the same name as the C source file, but
ending with the .s suffix.