Hello,
In this post, I describe how I am deepening my understanding of computers so that I can learn how to optimize software properly. Specifically, I write about learning to write assembly code on the x86_64 and aarch64 platforms for a lab in my software optimization class.
To complete this lab, I performed the following tasks:
- Build and run the three C versions of the program for x86_64, and take a look at the differences in the code.
- Use the objdump -d command to dump (print) the object code (machine code) of each binary and disassemble it into assembly. Find the section and take a look at the code. Notice the total amount of code.
- Review, build and run the x86_64 assembly language programs. Take a look at the code using objdump -d objectfile and compare it to the source code. Notice the absence of other code (compared to the C binary, which had a lot of extra code).
- Build and run the three C versions of the program for aarch64. Verify that you can disassemble the object code in the ELF binary using objdump -d objectfile, and take a look at the code.
- Review, build and run the aarch64 assembly language programs. Take a look at the code using objdump -d objectfile and compare it to the source code.
- Write a loop from 0 to 9 on x86_64 and aarch64.
- Extend the code to loop from 00 to 30, printing each value as a 2-digit decimal number, on x86_64 and aarch64.
How I used a Makefile
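Since this lab required creating, building, running and reviewing many files, I decided to put everything into a Makefile.
In doing this, I learned that one Makefile can run the Makefiles in other folders.
I did this by adding a target to the main Makefile whose command is “cd /route/to/makefile && make all”. The two commands have to be joined with && on one line, because make runs each recipe line in a separate shell, so a cd on its own line would not affect the next line.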
In the attached folders you can see the Makefile I created.
Task 1
The three C programs all print “Hello World!”, but each one does it in a different way:
Program 1: Uses printf()
Program 2: Uses write()
Program 3: Uses syscall()
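For comparison, here is a minimal sketch of the three approaches side by side. It is not the lab's actual source; it assumes Linux with glibc (for syscall() and SYS_write), and the function names hello_printf, hello_write and hello_syscall are hypothetical.

```c
#define _GNU_SOURCE          /* make sure syscall() is declared in <unistd.h> */
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>     /* SYS_write */
#include <unistd.h>          /* write(), syscall(), STDOUT_FILENO */

static const char MSG[] = "Hello World!\n";

/* Version 1: formatted, buffered output through the C standard library. */
int hello_printf(void) {
    int n = printf("%s", MSG);
    fflush(stdout);          /* flush now so this line is not printed after the raw writes */
    return n;
}

/* Version 2: the write() wrapper, unbuffered, straight to file descriptor 1. */
long hello_write(void) {
    return (long)write(STDOUT_FILENO, MSG, strlen(MSG));
}

/* Version 3: the generic syscall() wrapper, naming the system call by its number. */
long hello_syscall(void) {
    return syscall(SYS_write, STDOUT_FILENO, MSG, strlen(MSG));
}
```

Calling the three functions in order from a main prints the line three times, and each one returns the number of bytes written (13). Only the first goes through printf()'s formatting and buffering; the other two hand the bytes to the kernel directly.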
Task 2
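After reviewing the objdump output, I can see that program 1 produces the least code, at 8 lines, even though it calls printf(), which has the most overhead of the three functions: printf() has to process its format string and buffer the output before the text is finally passed to a system call. That extra work happens inside the C library, so it does not show up in these line counts. Program 2 calls write(), which should have less overhead, and uses 12 lines of code. Finally, program 3 also uses 12 lines of code, and since it calls syscall(), a thin wrapper around the system call itself, it has very little overhead.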
Task 3
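Since we are now assembling the program directly from assembly source, none of the extra code that the C toolchain adds (such as the runtime startup code and the stubs used to call library functions) is present. This cuts the program down drastically: the whole objdump output is now only 11 lines of code.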
Task 4
Here are the line counts of the disassembled code for the three C programs on aarch64. The results are similar to x86_64:
Program 1: 10 lines
Program 2: 12 lines
Program 3: 12 lines
Task 5
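Surprisingly, the aarch64 Hello World assembly program produced exactly the same amount of code as the x86_64 version: 11 lines of disassembly.
Something interesting I noticed in the disassembly is that all the numbers are shown in hexadecimal. The values themselves have not changed; objdump simply prints immediate values in hex, so, for example, the sys_exit number 93 appears as 0x5d.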
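To show that only the notation changes, here is a small sketch that prints a few constants from these programs in both decimal and hex. The helper names format_hex and print_lab_consts are hypothetical, and the syscall numbers listed are the standard Linux ones for each architecture.

```c
#include <assert.h>
#include <stdio.h>

/* A few constants from the lab programs, written in decimal in the source code. */
struct named_const {
    const char *name;
    unsigned value;
};

static const struct named_const LAB_CONSTS[] = {
    {"ASCII '0'", 48},
    {"x86_64 write", 1},
    {"x86_64 exit", 60},
    {"aarch64 write", 64},
    {"aarch64 exit", 93},
};

/* Format value with a 0x prefix, the way a disassembler shows immediates.
   Returns the number of characters written (not counting the NUL). */
int format_hex(unsigned value, char *buf, size_t size) {
    return snprintf(buf, size, "0x%x", value);
}

/* Print each constant in decimal and in hex: the same value in two notations. */
void print_lab_consts(void) {
    char hex[16];
    for (size_t i = 0; i < sizeof LAB_CONSTS / sizeof LAB_CONSTS[0]; i++) {
        int n = format_hex(LAB_CONSTS[i].value, hex, sizeof hex);
        assert(n > 0 && (size_t)n < sizeof hex);   /* the buffer was large enough */
        printf("%-14s %3u = %s\n", LAB_CONSTS[i].name, LAB_CONSTS[i].value, hex);
    }
}
```

For example, 60 becomes 0x3c and 93 becomes 0x5d, which is the form these numbers take in the disassembly.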
Task 6
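Here are my loops from 0 to 9 on x86_64 and aarch64. Each iteration adds 48 (the ASCII code for '0') to the loop index to get the digit character, stores it in the message at offset 6, and prints the message with the write system call.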
/* x86 */
.text
.globl _start
start = 0 /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 10 /* loop exits when the index hits this number (loop condition is i<max) */
_start:
    mov $start,%r15 /* loop index */
loop:
    mov $48,%r14 /* ASCII code of '0' */
    add %r15,%r14 /* r14 = ASCII digit for the current index */
    movb %r14b,msg+6 /* overwrite the digit placeholder in the message */
    mov $len,%rdx /* message length */
    mov $msg,%rsi /* message address */
    mov $1,%rdi /* file descriptor 1 = stdout */
    mov $1,%rax /* syscall sys_write */
    syscall
    inc %r15 /* increment index */
    cmp $max,%r15 /* see if we're done */
    jne loop /* loop if we're not */
    mov $0,%rdi /* exit status */
    mov $60,%rax /* syscall sys_exit */
    syscall
.data
msg: .ascii "Loop:  \n" /* "Loop: " plus a placeholder at offset 6, then the newline */
len = . - msg
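/* aarch64 */
.text
.globl _start
start = 0 /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 10 /* loop exits when the index hits this number (loop condition is i<max) */
_start:
    mov x19, start /* loop index (x19 instead of x30, which is the link register) */
    mov x20, max /* loop limit */
    ldr x28, =msg /* address of the message buffer */
loop:
    add x21, x19, 48 /* x21 = ASCII digit for the current index ('0' is 48) */
    strb w21, [x28, 6] /* overwrite the digit placeholder in the message */
    mov x0, 1 /* file descriptor 1 = stdout */
    mov x1, x28 /* message address */
    mov x2, len /* message length */
    mov x8, 64 /* syscall sys_write */
    svc 0
    add x19, x19, 1 /* increment index */
    cmp x19, x20 /* see if we're done */
    b.ne loop /* loop if we're not */
    mov x0, 0 /* exit status (otherwise x0 still holds write's return value) */
    mov x8, 93 /* syscall sys_exit */
    svc 0
.data
msg: .ascii "Loop:  \n" /* "Loop: " plus a placeholder at offset 6, then the newline */
len = . - msg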
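The logic of the 0 to 9 loop can be sketched in C as follows. This is an illustrative translation, not code from the lab; format_loop_line and loop_0_to_9 are hypothetical names, and it assumes a POSIX system for write().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fill buf (at least 9 bytes) with "Loop: N\n" for a single
   digit N, mirroring the assembly: copy the message, then overwrite offset 6. */
size_t format_loop_line(int i, char *buf) {
    static const char msg[] = "Loop:  \n";   /* offset 6 is the digit placeholder */
    assert(i >= 0 && i <= 9);                 /* one digit only, like the 0-9 loop */
    memcpy(buf, msg, sizeof msg);             /* also copies the terminating NUL */
    buf[6] = (char)('0' + i);                 /* same idea as adding 48 to the index */
    return sizeof msg - 1;                    /* 8 bytes, matching len = . - msg */
}

/* Hypothetical driver: one write() per iteration, like the assembly loop.
   Returns 0 on success, -1 if a write fails or is short. */
int loop_0_to_9(void) {
    char line[9];
    for (int i = 0; i < 10; i++) {            /* start = 0, max = 10, condition i < max */
        size_t len = format_loop_line(i, line);
        if (write(STDOUT_FILENO, line, len) != (ssize_t)len)
            return -1;
    }
    return 0;
}
```

Calling loop_0_to_9() prints "Loop: 0" through "Loop: 9", one per line, which is the output both assembly versions should produce.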
Task 7
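Here are my loops from 0 to 30 on x86_64 and aarch64, with the leading zero removed. Each iteration divides the index by 10: the quotient is the tens digit and the remainder is the ones digit (the x86_64 div instruction produces both, while aarch64 computes the remainder with msub after udiv). If the tens digit is 0, the code skips storing it, so numbers below 10 are printed with a blank instead of a leading zero.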
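/* x86 */
.text
.globl _start
start = 0 /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 31 /* loop exits when the index hits this number (loop condition is i<max) */
_start:
    mov $start,%r15 /* loop index */
loop:
    mov $48,%r13 /* ASCII '0', base for the tens digit */
    mov $48,%r14 /* ASCII '0', base for the ones digit */
    mov $0,%rdx /* div divides rdx:rax, so clear rdx first */
    mov %r15,%rax /* dividend = loop index */
    mov $10,%r12 /* divisor */
    div %r12 /* rax = index / 10 (tens), rdx = index % 10 (ones) */
    add %rax,%r13 /* r13 = ASCII tens digit */
    add %rdx,%r14 /* r14 = ASCII ones digit */
    cmp $48,%r13 /* is the tens digit '0'? */
    je continue /* if so, leave the placeholder blank (no leading zero) */
    movb %r13b,msg+6 /* store the tens digit */
continue:
    movb %r14b,msg+7 /* store the ones digit */
    mov $msg,%rsi /* message address */
    mov $1,%rdi /* file descriptor 1 = stdout */
    mov $1,%rax /* syscall sys_write */
    mov $len,%rdx /* message length */
    syscall
    inc %r15 /* increment index */
    cmp $max,%r15 /* see if we're done */
    jne loop /* loop if we're not */
    mov $0,%rdi /* exit status */
    mov $60,%rax /* syscall sys_exit */
    syscall
.data
msg: .ascii "Loop:   \n" /* "Loop: " plus placeholders at offsets 6 and 7, then the newline */
len = . - msg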
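/* aarch64 */
.text
.globl _start
start = 0 /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 31 /* loop exits when the index hits this number (loop condition is i<max) */
_start:
    mov x19, start /* loop index (x19 instead of x30, which is the link register) */
    mov x20, max /* loop limit */
    mov x21, 10 /* divisor for splitting the index into digits */
    ldr x28, =msg /* address of the message buffer */
loop:
    udiv x22, x19, x21 /* x22 = index / 10 (tens digit) */
    msub x23, x22, x21, x19 /* x23 = index - x22 * 10 (ones digit) */
    add x22, x22, 48 /* convert both digits to ASCII ('0' is 48) */
    add x23, x23, 48
    cmp x22, 48 /* is the tens digit '0'? */
    b.eq continue /* if so, leave the placeholder blank (no leading zero) */
    strb w22, [x28, 6] /* store the tens digit */
continue:
    strb w23, [x28, 7] /* store the ones digit */
    mov x0, 1 /* file descriptor 1 = stdout */
    mov x1, x28 /* message address */
    mov x2, len /* message length */
    mov x8, 64 /* syscall sys_write */
    svc 0
    add x19, x19, 1 /* increment index */
    cmp x19, x20 /* see if we're done */
    b.ne loop /* loop if we're not */
    mov x0, 0 /* exit status (otherwise x0 still holds write's return value) */
    mov x8, 93 /* syscall sys_exit */
    svc 0
.data
msg: .ascii "Loop:   \n" /* "Loop: " plus placeholders at offsets 6 and 7, then the newline */
len = . - msg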
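The same two-digit logic can be sketched in C to check the expected output. This is an illustrative translation rather than lab code; format_loop_line2 and loop_0_to_30 are hypothetical names, and it assumes a POSIX system for write().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fill buf (at least 10 bytes) with "Loop: TO\n", where T is
   the tens digit (left blank when it is 0) and O is the ones digit. */
size_t format_loop_line2(unsigned i, char *buf) {
    static const char msg[] = "Loop:   \n";  /* offsets 6 and 7 are placeholders */
    assert(i < 100);                          /* at most two digits */
    unsigned tens = i / 10;                   /* quotient, like rax after div / x22 after udiv */
    unsigned ones = i - tens * 10;            /* remainder, like rdx after div / x23 after msub */
    memcpy(buf, msg, sizeof msg);             /* fresh copy of the message, NUL included */
    if (tens != 0)                            /* suppress the leading zero */
        buf[6] = (char)('0' + tens);
    buf[7] = (char)('0' + ones);
    return sizeof msg - 1;                    /* 9 bytes, matching len = . - msg */
}

/* Hypothetical driver: prints indices 0 to 30 with one write() each.
   Returns 0 on success, -1 if a write fails or is short. */
int loop_0_to_30(void) {
    char line[10];
    for (unsigned i = 0; i < 31; i++) {       /* start = 0, max = 31, condition i < max */
        size_t len = format_loop_line2(i, line);
        if (write(STDOUT_FILENO, line, len) != (ssize_t)len)
            return -1;
    }
    return 0;
}
```

One design difference: this sketch copies a fresh message on every iteration, while the assembly reuses the same buffer and never clears offset 6. That is safe there only because the index only increases, so once a nonzero tens digit has been stored it is always overwritten with the next one.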