Linux 32-bit Binary Exploitation – Assembly Basics Part I


Hello everyone, and welcome to a new series on binary exploitation. This is the first part, and there are many more parts to come. In this series I will start with assembly basics and the concepts required for basic binary exploitation, explained in layman's terms. So, if you are new to assembly, binary exploitation or buffer overflows, you are very welcome here, because all the basics required for binary exploitation are explained in detail in this series. I have put a lot of thought into making this as easy as possible while covering the most important basic concepts needed to learn assembly and get started with binary exploitation. If you would rather go directly to binary exploitation, see Part II: Linux 32-bit binary exploitation.





This Series consists of 32-bit Assembly Basics, Concepts, Binary Exploitation, Buffer Overflow – Return to libc exploitation.

Contents:

1.   What is Assembly
2.   Why Assembly
3.   Decimal
4.   Binary
  • Binary to Decimal Conversion
  • Decimal to Binary Conversion
5.   Hexadecimal
  • Hexadecimal to Decimal Conversion
  • Decimal to Hexadecimal Conversion
6.   Segment & Offset
7.   Data Types in Assembly
8.   Registers
a.    General Purpose Registers
b.    Segment Registers
c.    Stack Registers
d.    Special Purpose Registers
9.   Structure of Assembly Program
10. Linux System Calls
11. Executing an Assembly Program
12. Writing a Hello World Program in Assembly language

Before jumping directly into binary exploitation, some basics are important, so let's go through them first.

Binary Exploitation?

You need to understand the basics of assembly, registers, binary and hexadecimal. I will explain the registers and assembly basics required for this tutorial. The binary I am going to exploit in this series is an intentionally vulnerable binary, vulnerable to a buffer overflow – return to libc attack.

What is Assembly?


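Assembly is a low-level programming language. Programs written in assembly are translated into machine code by an assembler (they are assembled rather than compiled). Each assembly language is designed for one specific computer architecture, and each assembler has its own syntax for it. This series uses 32-bit x86 assembly in the syntax of the GNU assembler (as).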

Why Assembly?

1.   If something crashes on Windows/Linux, the error you get back usually includes the location or action that caused it. If you have to fix that error, knowing assembly is often the only way to troubleshoot low-level memory problems.
2.   If you need precise control over what your program is doing, a high-level language will not always give you that level of control.
3.   Even a highly optimizing high-level language compiler is still a general-purpose compiler, so the code it produces is general code.  For a specific task, carefully hand-optimized assembly can run faster, although beating a modern compiler takes real effort.
4.   The main reason for me: the programming languages you already know, like Python, Java and C++, give you a limited set of functions and features, but in assembly you are limited only by the hardware you own. You can play around with memory and CPU instructions to a great extent, which is pretty much fun.


Topics to Know Before Getting into Assembly

1)   Decimal:  The decimal system is a base 10 system, meaning that it uses 10 digits (0-9) to write every number. Each position in a number is worth a power of 10.

Example: Let's take 275

              Hundreds    Tens      Units
Digit         2           7         5
Explanation   2x10^2      7x10^1    5x10^0
Value         200         70        5

So, the output is 200+70+5 = 275.
Let's take another example, 3456:

              Thousands   Hundreds   Tens      Units
Digit         3           4          5         6
Explanation   3x10^3      4x10^2     5x10^1    6x10^0
Value         3000        400        50        6

So, the output is 3000+400+50+6 = 3456.
Well, that's how the decimal system works. I guess you have no doubts about this, so let's move on to the next one.

2)   Binary: The binary system is a base 2 system. It consists of only 2 values, 0 and 1. Computers work in binary internally, so you should understand how a binary value is converted.

Binary to Decimal Conversion:
Multiply each binary digit by 2 raised to the power of its position, counting positions from 0 at the rightmost digit, then add up the results.
Let's take the binary value 11001 and convert it to decimal:

Digit     Power of 2    Total
1   x     2^4           16
1   x     2^3           8
0   x     2^2           0
0   x     2^1           0
1   x     2^0           1

Now add the totals -> 16+8+0+0+1 = 25.
25 is the decimal value of the binary number 11001. That's how you do it. It might look complicated at first glance, but if you try it once, you will get it in an instant.
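
To check this with the tools used later in this post, here is a minimal sketch that lets the GNU assembler do the conversion. It assumes your version of as accepts the 0b prefix for binary literals, and it uses the exit system call (explained in the Linux System Calls section) only to hand the value back to the shell. The file name bin2dec.s is made up for this example.

```asm
# bin2dec.s (hypothetical file name)
.text
.global _start
_start:
        movl $1, %eax          # system call number 1 = exit
        movl $0b11001, %ebx    # exit status, written as the binary value 11001
        int $0x80              # ask the kernel to run the system call
```

After assembling and linking it (see "Executing an Assembly Program"), run ./bin2dec and then echo $? to print the exit status; it should show 25. Exit statuses only hold values from 0 to 255, so this trick works for small numbers only.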

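Decimal to Binary Conversion:  This is much easier than converting binary to decimal. All you need to do is keep dividing the number by 2 and write down the remainder of each division.
Let's take the number 275
275/2 = 137, remainder 1
137/2 = 68, remainder 1
68/2 = 34, remainder 0
34/2 = 17, remainder 0
17/2 = 8, remainder 1
8/2 = 4, remainder 0
4/2 = 2, remainder 0
2/2 = 1, remainder 0
1/2 = 0, remainder 1
Now read the remainders from bottom to top. So, the binary value of decimal 275 is 100010011.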
 

Points to Note:

·   Divide the number by 2; if it divides evenly the remainder is 0, otherwise it is 1
·   Repeat with the quotient until the quotient is 0
·   Usually 1 represents TRUE, and 0 FALSE
·   001001100 is equal to 1001100. Zeros at the start of a value add nothing, so you can drop them XD
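
The same divide-by-2 procedure can be sketched in assembly. This is only an illustration, not the one right way to do it: the names bits and next_bit are made up for the example, the buffer is sized for exactly the 9 binary digits of 275, and the registers and system calls it uses are explained later in this post.

```asm
# dec2bin.s (hypothetical file name)
.data
bits:   .ascii "000000000\n"      # room for 9 binary digits plus a newline

.text
.global _start
_start:
        movl $275, %eax           # number to convert
        movl $2, %ecx             # divisor
        movl $8, %edi             # index of the rightmost digit in bits
next_bit:
        xorl %edx, %edx           # divl divides EDX:EAX, so clear EDX first
        divl %ecx                 # EAX = quotient, EDX = remainder (0 or 1)
        addl $0x30, %edx          # turn 0/1 into the ASCII character '0'/'1'
        movb %dl, bits(%edi)      # store the digit, filling from right to left
        decl %edi                 # move one position to the left
        testl %eax, %eax          # repeat until the quotient is 0
        jnz next_bit

        movl $4, %eax             # write(1, bits, 10)
        movl $1, %ebx
        movl $bits, %ecx
        movl $10, %edx
        int $0x80

        movl $1, %eax             # exit(0)
        movl $0, %ebx
        int $0x80
```

Built as a 32-bit program and run, this should print 100010011. A number that needs more than 9 binary digits would need a bigger buffer and starting index.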

3)  Hexadecimal: Hexadecimal is a base 16 system. One hexadecimal digit represents exactly 4 bits, so a byte (8 bits) is always 2 hex digits and a 32-bit value is always 8 hex digits. Memory sizes are powers of 2 (8, 16, 32, 64, 128, 256, 512 and so on), which makes hexadecimal a perfect fit for writing down memory addresses and values. Also, hex is just the short form of hexadecimal, so don't think they are different.

You need to remember these before getting into hexadecimal conversion

Hex    Decimal    Binary
0      0          0
1      1          1
2      2          10
3      3          11
4      4          100
5      5          101
6      6          110
7      7          111
8      8          1000
9      9          1001
A      10         1010
B      11         1011
C      12         1100
D      13         1101
E      14         1110
F      15         1111

Hexadecimal to Decimal Conversion:
Let's take the hexadecimal value "D80" as an example and convert it into a decimal value.

D80       16^position    Decimal digit x 16^position    Total
D   x     16^2           13 x 256                       3328
8   x     16^1           8 x 16                         128
0   x     16^0           0 x 1                          0

So, the total is 3328+128+0 = 3456.  So, D80 is the hexadecimal value of the decimal number 3456.

Decimal to Hexadecimal Conversion:
 Let's take the decimal value 3456 and convert it to hexadecimal. Divide by 16 repeatedly (keeping only the whole-number part) and note each remainder. The hex digits come out in reverse order: the first remainder is the rightmost digit.

3456/16 = 216
216*16 = 3456
3456-3456 = 0. The remainder is 0, and the hexadecimal digit for decimal 0 is 0

216/16 = 13
13*16 = 208
216-208 = 8. The remainder is 8, and the hexadecimal digit for decimal 8 is 8

13/16 = 0
0*16 = 0
13-0 = 13. The remainder is 13, and the hexadecimal digit for decimal 13 is D

The quotient is now 0, so we stop. Reading the remainders in reverse order gives D80, the hexadecimal value of decimal 3456. Hope you understood this; if not, you can drop a comment below.

Note:
  • Hex = Hexadecimal
  • In MASM-style (Intel syntax) assemblers, common on Windows, hex is mostly written with an h suffix: 0D80h
  • In Unix environments (C and the GNU assembler), hex is written with a 0x prefix: 0xD80
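
As a quick sanity check of the D80 example, here is a small sketch that writes the same number once in hex and once in decimal and lets the CPU subtract them. The file name is made up, and the exit system call it relies on is explained in the Linux System Calls section.

```asm
# hexcheck.s (hypothetical file name)
.text
.global _start
_start:
        movl $0xD80, %ebx      # the value written in hexadecimal
        subl $3456, %ebx       # subtract the same value written in decimal
        movl $1, %eax          # system call number 1 = exit
        int $0x80              # the exit status is whatever is left in EBX
```

Running the program and then echo $? should print 0, because 0xD80 and 3456 are the same number and the subtraction leaves nothing behind.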

Segment & Offset:
Everything on your computer is connected through a series of wires called the BUS.  The early 16-bit x86 processors worked with 16-bit values, so a single 16-bit address could only reach 65536 bytes (64 KB) of memory, numbered 0 to 65535 (16 bits = 1111111111111111 = 65535).
That was plenty back then, but today that's not quite enough.  So, designers came up with a way to send 20 bits over the address bus, thus allowing for a total of 1 MB of memory.
Memory is divided into blocks of bytes called Segments, and a byte is accessed by specifying its Offset within a segment.  So, whenever the processor wants to access data, it combines a 16-bit Segment number with a 16-bit Offset: physical address = Segment x 16 + Offset.
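
To make the segment and offset idea concrete, here is a rough sketch of how the old 16-bit processors turned a segment:offset pair into a 20-bit address (segment times 16, plus offset). The values 0x1234 and 0x0010 are made up for the example, and the program only does the arithmetic in ordinary registers; it does not touch the real segment registers.

```asm
# segoff.s (hypothetical file name)
.text
.global _start
_start:
        movl $0x1234, %eax      # segment number (example value)
        shll $4, %eax           # segment x 16 = 0x12340 (shift left by 4 bits)
        addl $0x0010, %eax      # add the offset: 0x12340 + 0x10 = 0x12350
        xorl %ebx, %ebx         # exit status 0 unless the check below fails
        cmpl $0x12350, %eax     # is the result the address we expect?
        je done
        movl $1, %ebx           # exit status 1 would mean a mismatch
done:
        movl $1, %eax           # system call number 1 = exit
        int $0x80
```

Running it and then echo $? should print 0. Multiplying by 16 is the same as appending one hex digit 0 to the segment number, which is why the shift by 4 bits works.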

Before you get into assembly programming, you need to understand the data types and registers in assembly. Registers are the most important things in assembly: almost every instruction reads or writes a register, so without them there is no processing. For that reason I will explain the data types in assembly and list the types of registers with a very brief explanation.

Let’s get into Assembly Basics now,

Bits are the smallest unit of data on a computer. Each bit can only represent 2 values, 1 and 0.  A single bit is too small to be useful on its own, so we got the nibble. A nibble is a collection of 4 bits, which is exactly one hex digit. The most important unit of data used by your computer is the byte.  A byte is the smallest unit that your processor can address.  It is made up of 8 bits, or 2 nibbles. A word is simply 2 bytes, or 16 bits. Originally a word matched the 16-bit registers and data bus of the early x86 CPUs.  Today most computers are 32-bit or 64-bit, but most people were used to 1 word = 16 bits, so it was kept that way.

Data Types in Assembly:

Byte                   8 bits
Word                   16 bits (2 Bytes)
Double Word (Dword)    32 bits (2 Words)
Quad Word (Qword)      64 bits (2 Dwords)

 

Registers in Assembly:

A processor contains small storage areas called registers.  They are far too small to store files; instead they hold the values the CPU is working on while the program is running.



Registers can be divided into following categories: 

1)   General Purpose Registers:  On the original 16-bit x86 processors, all general-purpose registers are 16 bit and can be broken up into two 8-bit registers.  For example, AX can be broken up into AL (low byte) and AH (high byte).
·         AX – Accumulator:
  • Made up of: AH, AL
  • Common uses: Math operations, I/O operations, INT 21h (DOS)
·         BX – Base:
  • Made up of: BH, BL
  • Common uses: Base or Pointer
·         CX – Counter
  • Made up of: CH, CL
  • Common uses: Loops and Repeats
·         DX – Data
  • Made up of: DH, DL
  • Common uses: Various data, character output

When the 32-bit x86 processors (starting with the 80386) came out, they extended these registers to EAX, EBX, ECX, and EDX. The E stands for Extended, and that's just what they are, 32-bit extensions of the originals: AX is the lower 16 bits of EAX.
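
Here is a small sketch of how the pieces of one register overlap. The value 0x12345678 is just an example, and the exit system call (covered in the Linux System Calls section) is used only to show one byte of it to the shell.

```asm
# regparts.s (hypothetical file name)
.text
.global _start
_start:
        movl $0x12345678, %eax   # EAX = 0x12345678
                                 # AX  = 0x5678 (lower 16 bits of EAX)
                                 # AH  = 0x56   (upper byte of AX)
                                 # AL  = 0x78   (lower byte of AX)
        movzbl %ah, %ebx         # copy AH into EBX, filling the rest with zeros
        movl $1, %eax            # system call number 1 = exit
        int $0x80                # exit status = 0x56
```

Running it and then echo $? should print 86, which is 0x56 in decimal. Changing %ah to %al in the movzbl line should give 120 (0x78) instead.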

2)  Segment Registers:
CS - Code Segment.  Points to the memory block that stores code
DS - Data Segment.  Points to the memory block that stores data
ES - Extra Segment.  Commonly used for video stuff
SS - Stack Segment.  Points to the memory block that holds the stack, where the processor stores return addresses from routines

3) Stack Registers:
  • BP - Base pointer.  Used in conjunction with SP for stack operations (EBP on 32-bit)
  • SP - Stack Pointer.  Holds the address of the top of the stack (ESP on 32-bit)

4)  Special Purpose Registers:
IP - Instruction Pointer (EIP on 32-bit).  Holds the offset of the next instruction to be executed
Flags - These are a bit different from all other registers.  Each flag is only 1 bit in size, and all of them live together in a single flags register (EFLAGS on 32-bit).  A flag is either 1 (true), or 0 (false).  There are several flags including the Carry flag, Overflow flag, Parity flag, Direction flag, and more.  You usually don't assign values to these manually.  The value is set automatically depending on the result of the previous instruction.
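
A minimal sketch of a flag in action: the subtraction below produces 0, so the CPU sets the Zero flag on its own, and the conditional jump that follows simply reads that flag. The label name not_zero and the numbers are made up for the example.

```asm
# zeroflag.s (hypothetical file name)
.text
.global _start
_start:
        movl $0, %ebx       # exit status 0 = "Zero flag was not set"
        movl $7, %eax
        subl $7, %eax       # 7 - 7 = 0, so the CPU sets the Zero flag (ZF)
        jnz not_zero        # jump only if ZF is clear (result was not zero)
        movl $1, %ebx       # reached only when ZF is set
not_zero:
        movl $1, %eax       # system call number 1 = exit (mov leaves flags alone)
        int $0x80
```

Running it and then echo $? should print 1, showing that the flag was set by the subl instruction without us ever writing to it directly.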

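Memory Segments in Assembly:

text = where the assembly instructions are stored

data & bss = where variables are stored

heap = area of memory where a program can allocate and manipulate data dynamically at runtime

Stack = holds return addresses and local variables; the code that manages it is generated by the compiler. It sits at the high-address end of the process memory and grows toward lower addresses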

Structure of Assembly Program:

.data --> all initialized data -- Strings
.bss --> all un-initialized data
.text --> Program instructions   -- Executable code
          .global _start          --> makes the _start symbol visible to the linker
          _start:                 --> start of a program; the entry point, like main() in C


Data Types in .DATA segment

.byte = 1 byte
.ascii = string (not null terminated)
.asciz = Null Terminated String
.int = 32-bit integer
.short = 16-bit integer
.float = single precision floating point number
.double = double precision floating point number
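
A short sketch of these directives in use. All the label names are made up for the example; the program lets the assembler add up how many bytes the data takes and returns that total as its exit status (the exit system call is explained in the Linux System Calls section).

```asm
# datasize.s (hypothetical file name)
.data
my_byte:   .byte  0x41            # 1 byte
my_short:  .short 0x1234          # 2 bytes
my_int:    .int   0x12345678      # 4 bytes
my_text:   .asciz "Hi"            # 3 bytes: 'H', 'i' and the terminating null
data_size = . - my_byte           # current position minus the first label

.text
.global _start
_start:
        movl $1, %eax             # system call number 1 = exit
        movl $data_size, %ebx     # exit status = total size of the data above
        int $0x80
```

Running it and then echo $? should print 10 (1 + 2 + 4 + 3). Swapping .asciz for .ascii should drop the total to 9, since .ascii does not add the null byte.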


Data types in .BSS Segment

.comm -- declares common memory area
.lcomm - declares local common memory area

This space is created at runtime; whatever you define here does not occupy any space inside the executable that the assembler and linker create.
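
As a sketch of what the .bss space is good for, the program below reserves a 64-byte buffer with .lcomm, reads a line from the keyboard into it and prints it back. The name buffer and the size 64 are arbitrary choices for the example, and the read and write system calls it uses are explained in the sections that follow.

```asm
# echo.s (hypothetical file name)
.bss
        .lcomm buffer, 64         # 64 uninitialized bytes, reserved at runtime

.text
.global _start
_start:
        movl $3, %eax             # system call number 3 = read
        movl $0, %ebx             # file descriptor 0 = stdin
        movl $buffer, %ecx        # where to store the input
        movl $64, %edx            # read at most 64 bytes
        int $0x80                 # EAX now holds the number of bytes read

        movl %eax, %edx           # write back exactly as many bytes as were read
        movl $4, %eax             # system call number 4 = write
        movl $1, %ebx             # file descriptor 1 = stdout
        movl $buffer, %ecx
        int $0x80

        movl $1, %eax             # system call number 1 = exit
        movl $0, %ebx
        int $0x80
```

This sketch does no error handling: if read fails, EAX holds a negative error code and the write call will simply fail as well.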


Linux System Calls:

The next important concept required to understand 32-bit assembly in Linux is Linux system calls. A system call is how a user program asks the kernel to do something on its behalf, such as reading input or printing output.

Before you start programming in assembly, you need to understand how Linux system calls work, as we will be using them a lot. We can use these system calls to run programs, read and write files, and so on. In higher-level languages, library functions make these requests to the kernel for you; in assembly we invoke the system calls directly. Sys calls are helpful in buffer overflow exploitation as well.
Examples: exit(), read(), write() etc.

Arguments to syscall: whenever you invoke a Linux system call, you need to load the appropriate registers with the arguments the system call requires.
EAX - System call number
EBX - First Argument
ECX - Second Argument
EDX - Third Argument
ESI -  Fourth Argument
EDI -  Fifth Argument
On current kernels a sixth argument goes in EBP. Some older calls that need more arguments instead take a pointer to a structure containing the arguments. The return value of the system call comes back in EAX.

System calls are invoked by processes using a software interrupt - INT 0x80
When the interrupt is raised, the kernel calls the system call interrupt handler, which takes all the arguments and performs the requested operation based on the system call number.

Assembly Program to Execute System call:

.text
.global _start

_start:

      movl $1, %eax
      movl $0, %ebx
      int $0x80
         

Defining a system call:

exit(0) --> is the sys call used to exit a program. Explanation of the above program:

1. The sys call number for exit() is 1, so load EAX with 1. The mov instruction loads the value 1 into the eax register (%eax)

movl $1, %eax
2. "Status" is, let's say, "0" - EBX must be loaded with 0
movl $0, %ebx
3. Calling the syscall - Raising the interrupt 0x80
int $0x80

Executing an Assembly Program:


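1) On Unix-like systems, an assembly program for the GNU assembler is in most cases saved with the extension ".s". So, always save your assembly program with an extension of .s
2) Assemble the program into an object file using the GNU assembler
as -o program.o program.s
3) Use the linker to make it into an executable
ld -o program program.o

Note: on a 64-bit Linux system, as and ld produce 64-bit output by default. To build a 32-bit binary, use
as --32 -o program.o program.s
ld -m elf_i386 -o program program.o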

 

Writing a Hello World Program in Assembly language:


Writing a hello world program in assembly is not as easy as in other programming languages. You need a good understanding of the write and exit system calls. So, let me explain how to write a Hello World program in assembly.

Step 1: write() syscall to print the "Hello world" message
Step 2: use exit() to exit the program
So, how do you write some data in assembly? You need to understand the arguments the write syscall expects.

This is what write needs in order to output data:


write() takes 3 arguments:
·         File descriptor: the file or stream it needs to write to
·         Buffer: pointer to the memory where the data to be written is stored
·         Count: number of bytes to write, starting from the beginning of the buffer

So, how do we achieve this?

There are file descriptor numbers for all the standard streams, and there are 3 standard streams in total: standard input, standard output and standard error. System calls are numbered in the same way, and the sys call number for write() is 4.
Here are the commonly used file descriptor numbers
  1)   stdin, file descriptor 0
  2)   stdout, file descriptor 1
  3)   stderr, file descriptor 2

Explanation

As explained above, write() needs a syscall number, a file descriptor, a buffer and a count. All these 4 must be passed for successful execution.

1)   We need to call the write syscall to write something. The sys call number for write() is 4, so store 4 in EAX
2)   As we want the data to show up as output on the terminal, we need to use STDOUT. The file descriptor for STDOUT is 1, so store 1 in EBX
3)   The data to be written is the buffer, so Buf = pointer to a memory location containing the "Hello World" string. Store the address of "Hello World" in ECX
4)   The size of the string should be given as the count, so pass 11, which is the size of "Hello World" (including the space), in EDX.




Hello World Program in Assembly:

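.data

HelloWorldString:
         .ascii "Hello World"           # 11 bytes, no terminating null

.text
.global _start
_start:

# load all the arguments for write(1, HelloWorldString, 11)

       movl $4, %eax                    # syscall number 4 = write
       movl $1, %ebx                    # file descriptor 1 = stdout
       movl $HelloWorldString, %ecx     # address of the buffer
       movl $11, %edx                   # number of bytes to write
       int $0x80                        # call the kernel

# exit the program with exit(0)

       movl $1, %eax                    # syscall number 1 = exit
       movl $0, %ebx                    # exit status 0
       int $0x80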

Executing:

Save the file as hello.s 
as -o hello.o hello.s 
ld -o hello hello.o 
./hello          
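
Counting the bytes of the string by hand is easy to get wrong, so here is a variation, as a sketch only, that lets the assembler compute the length and also prints a newline at the end. The symbol name HelloWorldLen and the file name are made up for this example.

```asm
# hello2.s (hypothetical file name)
.data
HelloWorldString:
        .ascii "Hello World\n"
HelloWorldLen = . - HelloWorldString    # length in bytes, computed by the assembler

.text
.global _start
_start:
        movl $4, %eax                   # syscall number 4 = write
        movl $1, %ebx                   # file descriptor 1 = stdout
        movl $HelloWorldString, %ecx    # address of the buffer
        movl $HelloWorldLen, %edx       # number of bytes (12 with the newline)
        int $0x80

        movl $1, %eax                   # syscall number 1 = exit
        movl $0, %ebx                   # exit status 0
        int $0x80
```

It is assembled, linked and run the same way as hello.s. If you change the text of the message, the length follows along without any other edits.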

That's it for this post. As we are done with at least the basics, I think you will get at least a vague idea of what is going on in the debugger after reading this whole article. In the next post of this series, I will explain Linux 32-bit binary exploitation with an example. So, stay tuned, and if you have any feedback, please comment below.







Bhanu Namikaze

Bhanu Namikaze is an Ethical Hacker, Security Analyst, Blogger, Web Developer and a Mechanical Engineer. He Enjoys writing articles, Blogging, Debugging Errors and Capture the Flags. Enjoy Learning; There is Nothing Like Absolute Defeat - Try and try until you Succeed.
