Monday, December 3, 2007
====== Ch2 ======
2.2 Classifying instruction set architectures :
For C = A + B
1. Stack : Push A / Push B / Add / Pop C
2. Accumulator : Load A / Add B / Store C
3. Reg-Mem : Load R1,A / Add R3,R1,B / Store R3,C
4. Reg-Reg(Load-Store) : Load R1,A / Load R2,B / Add R3,R1,R2 / Store R3,C
-- 2.3 Memory addressing :
Little Endian : byte order puts the byte whose address is "x...x000" at the least-significant position in the double word (the little end).
Such as : 7 6 5 4 3 2 1 0
Big Endian : byte order puts the byte whose address is "x...x000" at the most-significant position in the double word (the big end).
Such as : 0 1 2 3 4 5 6 7
-- Addresses mode :
Register : Add R4,R3 / Regs[R4]←Regs[R4]+ Regs[R3]
Immediate : Add R4,#3 / Regs[R4]←Regs[R4]+3
Displacement : Add R4,100(R1) / Regs[R4]←Regs[R4] + Mem[100+Regs[R1]]
Register indirect : Add R4,(R1) / Regs[R4]←Regs[R4] + Mem[Regs[R1]]
Indexed : Add R3,(R1 + R2) / Regs[R3]←Regs[R3] +Mem[Regs[R1]+Regs[R2]]
Direct or absolute : Add R1,(1001) / Regs[R1]←Regs[R1] + Mem[1001]
Memory indirect : Add R1,@(R3) / Regs[R1]←Regs[R1] + Mem[Mem[Regs[R3]]]
Autoincrement : Add R1,(R2)+ / Regs[R1]←Regs[R1] + Mem[Regs[R2]] / Regs[R2]←Regs[R2]+d
Autodecrement : Add R1,–(R2) / Regs[R2]←Regs[R2]–d / Regs[R1]←Regs[R1] + Mem[Regs[R2]]
Scaled : Add R1,100(R2)[R3] / Regs[R1]← Regs[R1]+ Mem[100+Regs[R2] + Regs[R3]*d]
-- 2.8 SIMD :
Single-Instruction Multiple-Data, also called Vector Instructions ...
SISD : one instruction and one piece of data => one result,
which is inefficient when the same operation must run over a whole array, e.g. C[8] = A[8] + B[8];
SIMD is single-instruction, multiple-data-stream : the same operation processes many data elements at once, saving processing time.
-- paired-single operations :
Most graphics multimedia applications use 32-bit floating-point operations. Some computers double the peak performance of single-precision floating-point operations by allowing a single instruction to launch two 32-bit operations on operands found side by side in a double-precision register. Just as in the prior case, the two partitions must be insulated so that operations on one half cannot affect the other. Such floating-point operations are called paired-single operations. For example, such an operation might be used for graphical transformations of vertices. This doubling in performance is typically accomplished by doubling the number of floating-point units, making it more expensive than just suppressing carries in integer adders.
-- DSP architectures use Saturating Arithmetic :
If the result is too large to be represented, it is set to the largest positive or most negative representable number, depending on the sign of the result. In contrast, two's complement arithmetic can add a small positive number to a large positive number and end up with a negative result. DSP algorithms rely on saturating arithmetic and would be incorrect if run on a computer without it.
-- 2.9 Instruction for Control Flow (4 Types) :
1. Conditional branches
2. Jumps
3. Procedure call
4. Procedure return
-- Register indirect jumps permit any addressing mode to supply the target address; they are useful for :
1. Case and Switch statements
2. Virtual functions or methods in OO languages like C++ or Java
3. Higher-order functions or function pointers in languages like C or C++
4. Dynamically shared libraries, which allow a library to be loaded and linked at run time
-- Procedure Invocation Options :
There are two basic conventions in use to save registers: either at the call site or inside the procedure being called.
Caller saving means that the calling procedure must save the registers that it wants preserved for access after the call, and
thus the called procedure need not worry about registers. Callee saving is the opposite: the called procedure must save the registers it wants to use, leaving the caller unrestrained.
-- 2.10 Encoding an instruction set :
To balance several competing forces :
1. As many registers and addressing modes as possible
2. Average instruction size and hence average program size
3. Instructions encoded into lengths that will be easy to handle in a pipelined implementation
-- 2.11 Graph coloring :
1. Register allocation algorithms
2. Construct a graph representing the possible candidates for allocation
3. How to use a limited set of colors so that no two adjacent nodes in a dependence graph receive the same color
4. Heuristic algorithms that work well in practice
-- Traditional vector computers added strided addressing and gather/scatter addressing to increase the number of programs that can be vectorized. Strided addressing skips a fixed number of words between each access, so sequential addressing is often called unit stride addressing. Gather and scatter find their addresses in another vector register: think of it as register indirect addressing for vector computers. In contrast, from a vector perspective, these short-vector SIMD computers support only unit strided accesses: memory accesses load or store all elements at once from a single wide memory location. Since the data for multimedia applications are often streams that start and end in memory, strided and gather/scatter addressing modes are essential to successful vectorization.
-- 2.12 MIPS emphasizes :
1. Simple load-store instruction set
2. Design for pipeline efficiency
3. Efficiency as a compiler target
-- Q2.18 Copy propagation for X = Y + Z; Y = X + Z ; W = Y - X;
Consider one line of code at a time :
1. X = Y + Z ; both operands here are given values, not computed by the program, so copy propagation does not transform this instruction
2. Y = X + Z ; here X is a computed value, so substitute the expression X = Y + Z
   = Y + Z + Z ; now no operand depends on an earlier computation
3. W = Y - X ; both operands are computed values
   = (Y + Z + Z) - (Y + Z) = Z
Copy propagation turns the work of instruction 2 from one addition into two, and turns instruction 3 from a subtraction into a simple copy of Z; whether the transformation saves work overall is a trade-off.
The lesson for optimizing compilers : to get the best compiled result, the compiler must be capable of sophisticated trade-off analysis across its optimization passes.
-- Q3. Assume that values A, B, and C reside in memory.
instruction opcodes are 8 bits / memory addresses are 64 bits / register addresses are 6 bits
For each of the architectures of Figure 2.2, how many addresses (or names) are in each instruction for the code to compute C=A+B and what is the total code size?

             Stack            Accumulator      Reg-Memory          Load/Store
             Push A      1    Load A      1    Load R1,A      1    Load R1,A      1
             Push B      1    Add B       1    Add R3,R1,B    1    Load R2,B      1
             Add         0    Store C     1    Store R3,C     1    Add R3,R1,R2   0
             Pop C       1                                         Store R3,C     1
opcodes      4 x 8 = 32       3 x 8 = 24       3 x 8 = 24          4 x 8 = 32
reg. ops     -                -                4 x 6 = 24          6 x 6 = 36
mem addr.    3 x 64 = 192     3 x 64 = 192     3 x 64 = 192        3 x 64 = 192
Code size    224 bits         216 bits         240 bits            260 bits
--
.End.