These notes are kept in English, because translations of the material tend to be poor.

1. Defining Data

1) Intrinsic Data Types

The assembler recognizes a basic set of intrinsic data types, which describe types in terms of their size (byte, word, doubleword, and so on), whether they are signed, and whether they are integers or reals. There’s a fair amount of overlap in these types; for example, the DWORD type (32-bit, unsigned integer) is interchangeable with the SDWORD type (32-bit, signed integer). Programmers use SDWORD to tell readers that a value is meant to carry a sign, but the assembler does not enforce this. The assembler only evaluates the sizes of operands.
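
As a small sketch of that point, the two variables below occupy four bytes each and even hold the same bit pattern; only the declared intent differs. The names uVal and sVal are made up for illustration, and the program skeleton assumes 32-bit MASM on Windows with ExitProcess available from kernel32:

```masm
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
uVal DWORD  4000000000      ; hypothetical name; stored as EE6B2800h
sVal SDWORD -294967296      ; hypothetical name; also stored as EE6B2800h

.code
main PROC
    mov eax,uVal            ; accepted: both operands are 32 bits
    mov ebx,sVal            ; accepted for the same reason
    ; mov ax,uVal           ; would be rejected: 16-bit register, 32-bit variable
    INVOKE ExitProcess,0
main ENDP
END main
```

The commented-out line is the kind of mismatch the assembler does catch: the operand sizes differ, regardless of signedness.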

2) Data Definition Statement

A data definition statement sets aside storage in memory for a variable, with an optional name. Data definition statements create variables based on intrinsic data types. A data definition has the following syntax:

 [name] directive initializer [,initializer]...

Intrinsic Data Types:

  • BYTE 8-bit unsigned integer. B stands for byte
  • SBYTE 8-bit signed integer. S stands for signed
  • WORD 16-bit unsigned integer
  • SWORD 16-bit signed integer
  • DWORD 32-bit unsigned integer. D stands for double
  • SDWORD 32-bit signed integer. SD stands for signed double
  • FWORD 48-bit integer (Far pointer in protected mode)
  • QWORD 64-bit integer. Q stands for quad
  • TBYTE 80-bit (10-byte) integer. T stands for Ten-byte
  • REAL4 32-bit (4-byte) IEEE short real
  • REAL8 64-bit (8-byte) IEEE long real
  • REAL10 80-bit (10-byte) IEEE extended real

This is an example of a data definition statement:

count DWORD 12345

Name: The optional name assigned to a variable must conform to the rules for identifiers.

Directive: The directive in a data definition statement can be BYTE, WORD, DWORD, SBYTE, SWORD, or any of the types listed above. In addition, it can be any of the legacy data definition directives shown below:

  • DB 8-bit integer
  • DW 16-bit integer
  • DD 32-bit integer or real
  • DQ 64-bit integer or real
  • DT 80-bit (10-byte) integer

Initializer: At least one initializer is required in a data definition, even if it is zero. Additional initializers, if any, are separated by commas. For integer data types, the initializer is an integer literal or integer expression that fits the size of the variable’s type, such as BYTE or WORD. If you prefer to leave the variable uninitialized, use the ? symbol as the initializer; the program should then make no assumption about the variable’s starting value. All initializers, regardless of their format, are converted to binary data by the assembler. Initializers such as 00110010b, 32h, and 50d all have the same binary value.
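
The pieces of a data definition can be seen together in a short sketch. All variable names here are hypothetical, and the skeleton again assumes 32-bit MASM on Windows with ExitProcess from kernel32:

```masm
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
val1   BYTE  00110010b      ; binary literal
val2   BYTE  32h            ; hexadecimal literal, same bits as val1
val3   BYTE  50d            ; decimal literal, same bits as val1
nums   WORD  1,2,3          ; three initializers: three consecutive words
total  DWORD ?              ; no initial value requested
oldDef DB    'A'            ; legacy directive, 8-bit
       BYTE  0              ; no name: the name field is optional

.code
main PROC
    mov al,val1             ; AL = 32h
    mov bl,val2             ; BL = 32h
    mov cl,val3             ; CL = 32h
    INVOKE ExitProcess,0
main ENDP
END main
```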

3) Little-Endian Order

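x86 processors store and retrieve data from memory using little-endian order (low to high): the least significant byte of a value is stored at the lowest address, and the remaining bytes follow in order of increasing significance. Some other computer systems use big-endian order (high to low); network byte order, used by the TCP/IP protocols, is also big-endian.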
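
A minimal sketch of what little-endian storage looks like from a program, assuming 32-bit MASM on Windows (the variable name val is hypothetical). Reading the doubleword one byte at a time shows which byte sits at which address:

```masm
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
val DWORD 12345678h             ; bytes in memory, low to high: 78h 56h 34h 12h

.code
main PROC
    mov al,BYTE PTR val         ; AL = 78h (lowest address, least significant byte)
    mov bl,BYTE PTR [val+3]     ; BL = 12h (highest address, most significant byte)
    mov cx,WORD PTR val         ; CX = 5678h (low word)
    INVOKE ExitProcess,0
main ENDP
END main
```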

2. Symbolic Constants

A symbolic constant (or symbol definition) is created by associating an identifier (a symbol) with an integer expression or some text. Symbols do not reserve storage. They are used only by the assembler when scanning a program, and they cannot change at runtime.

1) Equal-Sign Directive

The equal-sign directive associates a symbol name with an integer expression (I have tested this). The syntax is:

name = expression

Ordinarily, expression is a 32-bit integer value. When a program is assembled, all occurrences of name are replaced by expression during the assembler’s preprocessor step. Suppose the following statement occurs near the beginning of a source code file:

COUNT = 500

Further, suppose the following statement should be found in the file 10 lines later:

mov eax, COUNT

When the file is assembled, MASM will scan the source file and produce the corresponding code line:

mov eax, 500

Current Location Counter: One of the most important symbols of all, shown as $, is called the current location counter. For example, the following declaration declares a variable named selfPtr and initializes it with the variable’s offset value:

selfPtr DWORD $

Redefinitions: A symbol defined with = can be redefined within the same program. The following example shows how the assembler evaluates COUNT as it changes value:

COUNT = 5
mov al,COUNT ; AL = 5
COUNT = 10
mov al,COUNT ; AL = 10
COUNT = 100
mov al,COUNT ; AL = 100

The changing value of a symbol such as COUNT has nothing to do with the runtime execution order of statements. Instead, the symbol changes value according to the assembler’s sequential processing of the source code during the assembler’s preprocessing stage.

Calculating the Sizes of Arrays and Strings

When using an array, we usually like to know its size. The following example uses a constant named ListSize to declare the size of list:

list BYTE 10,20,30,40
ListSize = 4

Explicitly stating an array’s size can lead to a programming error, particularly if you should later insert or remove array elements. A better way to declare an array size is to let the assembler calculate its value for you. The $ operator (current location counter) returns the offset associated with the current program statement. In the following example, ListSize is calculated by subtracting the offset of list from the current location counter ($):

list BYTE 10,20,30,40
ListSize = ($ - list)
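
The ($ - list) calculation gives a size in bytes, so it only equals the element count for byte arrays, and the symbol must be defined immediately after the array, before any other data is declared. The sketch below extends the idea to a WORD array and to a null-terminated string. The names wordList, WordCount, myString, and myStringLen are hypothetical, and the skeleton assumes 32-bit MASM on Windows:

```masm
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
wordList  WORD 1000h,2000h,3000h,4000h
WordCount = ($ - wordList) / 2      ; 8 bytes / 2 bytes per WORD = 4 elements

myString  BYTE "This is a long string",0
myStringLen = ($ - myString)        ; 22 bytes, including the null terminator

.code
main PROC
    mov ecx,WordCount               ; ECX = 4
    mov edx,myStringLen             ; EDX = 22
    INVOKE ExitProcess,0
main ENDP
END main
```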

2) EQU Directive

The EQU directive associates a symbolic name with an integer expression or some arbitrary text. There are three formats:

name EQU expression
name EQU symbol
name EQU <text>

In the first format, expression must be a valid integer expression. In the second format, symbol is an existing symbol name, already defined with = or EQU. In the third format, any text may appear within the brackets <. . .>. When the assembler encounters name later in the program, it substitutes the integer value or text for the symbol.

EQU can be useful when defining a value that does not evaluate to an integer. A real number constant, for example, can be defined using EQU:

PI EQU <3.1416>

Example: The following example associates a symbol with a character string. Then a variable can be created using the symbol:

pressKey EQU <"Press any key to continue...",0>
.
.
.data
prompt BYTE pressKey

Example: Suppose we would like to define a symbol that counts the number of cells in a 10-by-10 integer matrix. We will define symbols two different ways, first as an integer expression and second as a text expression. The two symbols are then used in data definitions:

matrix1 EQU 10 * 10
matrix2 EQU <10 * 10>
.data
M1 WORD matrix1
M2 WORD matrix2

The assembler produces different data definitions for M1 and M2. The integer expression in matrix1 is evaluated and assigned to M1. On the other hand, the text in matrix2 is copied directly into the data definition for M2:

M1 WORD 100
M2 WORD 10 * 10

No Redefinition: Unlike the = directive, a symbol defined with EQU cannot be redefined in the same source code file. This restriction prevents an existing symbol from being inadvertently assigned a new value.

3) TEXTEQU Directive

The TEXTEQU directive, similar to EQU, creates what is known as a text macro. There are three different formats: the first assigns text, the second assigns the contents of an existing text macro, and the third assigns a constant integer expression:

name TEXTEQU <text>
name TEXTEQU textmacro
name TEXTEQU %constExpr

Text macros can build on each other. In the next example, count is set to the value of an integer expression involving rowSize. Then the symbol move is defined as mov. Finally, setupAL is built from move and count:

rowSize = 5
count TEXTEQU %(rowSize * 2)
move TEXTEQU <mov>
setupAL TEXTEQU <move al,count>

Therefore, the statement:

setupAL

would be assembled as:

mov al,10

A symbol defined by TEXTEQU can be redefined at any time.
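
A brief sketch of such a redefinition, assuming 32-bit MASM on Windows. The names greeting, msg1, and msg2 are hypothetical. Each data definition picks up whatever text the macro holds at that point in the source:

```masm
.386
.model flat,stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

greeting TEXTEQU <"Hello",0>

.data
msg1 BYTE greeting              ; assembled as: msg1 BYTE "Hello",0

greeting TEXTEQU <"Goodbye",0>  ; new text assigned to the same symbol
msg2 BYTE greeting              ; assembled as: msg2 BYTE "Goodbye",0

.code
main PROC
    mov al,msg1                 ; AL = 'H'
    mov bl,msg2                 ; BL = 'G'
    INVOKE ExitProcess,0
main ENDP
END main
```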

 

Reference: Kip R. Irvine, Assembly Language for x86 Processors, Seventh Edition.

 
