Linking an ELF Relocatable Executable

I have been investigating recently how to use GNU LD to make an ELF Relocatable Executable.  Apparently the very term is slightly controversial as the strictest interpretation of the ELF specification says that such a thing doesn't exist.

From a pragmatic standpoint, such a file should be well-defined and readily useful.  Any executable to be loaded into a flat memory model (i.e. a non-virtual memory system, such as DOS, AmigaOS, etc.) or an executable designed to be loaded into an address space with other executables (such as kernel modules, drivers, etc.) would have to be relocatable.

What do I mean by "executable"?  As far as Atomicity goes at present, for the file to be executable it needs to have no unresolved symbols and has to have a defined entry point.  To have no unresolved symbols, I may be able to invoke a linker flag to mandate this, but for the moment it is sufficient to just write the code correctly.  To have an entry point, the "entry" field of the ELF File Header has to be populated with a meaningful value, OR the symbol table has to be present with the address of a symbol such as "_start".
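As a rough illustration of that "OR", here is a minimal sketch of how a loader might pick the entry point. The function name find_entry and its parameters are hypothetical; the structure types come from the standard <elf.h> header. It assumes the string table is NUL-terminated, and that a zero "entry" field means "not set", which is a convention chosen for this sketch rather than something the ELF specification promises.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: prefer the header's e_entry field and fall back
 * to a defined "_start" symbol. Returns 0 on success, -1 if neither is
 * available. In a relocatable (ET_REL) file the symbol value returned
 * is an offset within the symbol's section, not a final address. */
static int find_entry(const Elf32_Ehdr *ehdr,
                      const Elf32_Sym *syms, size_t nsyms,
                      const char *strtab, size_t strtab_size,
                      Elf32_Addr *entry_out)
{
    assert(ehdr && entry_out);

    if (ehdr->e_entry != 0) {            /* assumption: 0 means "unset" */
        *entry_out = ehdr->e_entry;
        return 0;
    }
    for (size_t i = 0; i < nsyms; i++) {
        if (syms[i].st_shndx == SHN_UNDEF)
            continue;                    /* undefined symbols cannot be entry points */
        if (syms[i].st_name >= strtab_size)
            continue;                    /* name offset outside the string table */
        if (strcmp(strtab + syms[i].st_name, "_start") == 0) {
            *entry_out = syms[i].st_value;
            return 0;
        }
    }
    return -1;
}
```

A loader would call this once after reading the file header, the symbol table and its string table, and refuse to run the file if it returns -1.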

What do I mean by "relocatable"?  When the linker creates the final file, it doesn't know where in memory the file will be loaded, so it includes what is called a relocation table.  This table lists locations within the executable, along with information on how each of those locations needs to be updated with the correct address once the file is loaded.
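To make that concrete, here is a minimal sketch of a loader walking one section's relocation table on i386. The name apply_rel and the sym_addr array (final symbol addresses, worked out by the caller) are hypothetical. Only the two most common i386 relocation types are handled, and the host is assumed to have the same byte order as the target.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: patch one loaded section using its REL table.
 * sec_addr is the address the section now occupies; sym_addr[i] is the
 * final address of symbol i. i386 uses REL entries, so the addend A is
 * the value already stored at the patched location.
 * Returns 0, or -1 on a bad entry or an unsupported relocation type. */
static int apply_rel(uint8_t *sec, size_t sec_size, Elf32_Addr sec_addr,
                     const Elf32_Rel *rels, size_t nrels,
                     const Elf32_Addr *sym_addr, size_t nsyms)
{
    assert(sec && rels && sym_addr);

    for (size_t i = 0; i < nrels; i++) {
        Elf32_Addr off = rels[i].r_offset;            /* offset within the section */
        Elf32_Word sym = ELF32_R_SYM(rels[i].r_info); /* symbol table index */
        uint32_t word;

        if (sec_size < sizeof word || off > sec_size - sizeof word || sym >= nsyms)
            return -1;
        memcpy(&word, sec + off, sizeof word);        /* read implicit addend A */

        switch (ELF32_R_TYPE(rels[i].r_info)) {
        case R_386_32:                                /* S + A */
            word += sym_addr[sym];
            break;
        case R_386_PC32:                              /* S + A - P */
            word += sym_addr[sym] - (sec_addr + off);
            break;
        default:
            return -1;
        }
        memcpy(sec + off, &word, sizeof word);
    }
    return 0;
}
```

The memcpy calls are there because relocated words are not guaranteed to be aligned, as the byte offsets in a typical disassembly show.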

When I invoke ld with no special options, it appears to choose an arbitrary memory location, and links the executable as if it is to be loaded at that memory address.  It includes the entry point address, but no relocation table.  I've been trying to find a combination of runes to pass to ld to get it to create something which can approximate an ELF relocatable executable.

From a quick perusal of the internet, two options keep cropping up as candidates: --relocatable and --emit-relocs.  The first of these appears to provide a useful relocation table, but excludes the entry point.  The second has a relocation table and an entry point, but the executable is still linked for an arbitrary memory address.

Let's use objdump to examine each of these in detail.

Source Code


This is our test program.  It's been designed to have relocations from the TEXT (code) segment to the DATA segment and back again.

char lData[20] = "ABCDEFG";
void (*lMain)();

void main()
{
 asm("int $0x50" : : "a" (lData) );
}

void _start()
{
 lMain = main;
 lMain();
}

Test 1 : Default LD


We compile this with gcc, and link it with ld as normal.
gcc -c main.c -o main.o
ld -o helloworld main.o

Then we use objdump to examine the relevant parts of the file.
objdump helloworld -fhtrS

helloworld:     file format elf32-i386
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x08048080

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000025  08048074  08048074  00000074  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .eh_frame     00000058  0804809c  0804809c  0000009c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .data         00000014  080490f4  080490f4  000000f4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000004  08049108  08049108  00000108  2**2
                  ALLOC
  4 .comment      00000011  00000000  00000000  00000108  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
08048074 l    d  .text  00000000 .text
0804809c l    d  .eh_frame      00000000 .eh_frame
080490f4 l    d  .data  00000000 .data
08049108 l    d  .bss   00000000 .bss
00000000 l    d  .comment       00000000 .comment
00000000 l    df *ABS*  00000000 main.c
00000000 l    df *ABS*  00000000
080490f4 g     O .data  00000014 lData
08049108 g     O .bss   00000004 lMain
08048080 g     F .text  00000019 _start
08049108 g       .bss   00000000 __bss_start
08048074 g     F .text  0000000c main
08049108 g       .data  00000000 _edata
0804910c g       .bss   00000000 _end



Disassembly of section .text:

08048074 <main>:
 8048074: 55                    push   %ebp
 8048075: 89 e5                 mov    %esp,%ebp
 8048077: b8 f4 90 04 08        mov    $0x80490f4,%eax
 804807c: cd 50                 int    $0x50
 804807e: 5d                    pop    %ebp
 804807f: c3                    ret

08048080 <_start>:
 8048080: 55                    push   %ebp
 8048081: 89 e5                 mov    %esp,%ebp
 8048083: 83 ec 08              sub    $0x8,%esp
 8048086: c7 05 08 91 04 08 74  movl   $0x8048074,0x8049108
 804808d: 80 04 08
 8048090: a1 08 91 04 08        mov    0x8049108,%eax
 8048095: ff d0                 call   *%eax
 8048097: c9                    leave
 8048098: c3                    ret


As we can see, all the symbols are resolved and the file has a start address.  However, the file is linked for a specific address (0x8048074 for the .text section) and contains no relocations, so that's no good.

Test 2 : --relocatable


Let's try the first of our options, --relocatable.
ld -o helloworld main.o --relocatable
objdump helloworld -fhtrS

helloworld:     file format elf32-i386
architecture: i386, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x00000000

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000025  00000000  00000000  00000034  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .eh_frame     00000058  00000000  00000000  0000005c  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  2 .data         00000014  00000000  00000000  000000b4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000000  00000000  00000000  000000c8  2**2
                  ALLOC
  4 .comment      00000012  00000000  00000000  000000c8  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
00000000 l    d  .text  00000000 .text
00000000 l    d  .eh_frame      00000000 .eh_frame
00000000 l    d  .data  00000000 .data
00000000 l    d  .bss   00000000 .bss
00000000 l    d  .comment       00000000 .comment
00000000 l    df *ABS*  00000000 main.c
00000000 l    df *ABS*  00000000
00000000 g     O .data  00000014 lData
00000004       O *COM*  00000004 lMain
0000000c g     F .text  00000019 _start
00000000 g     F .text  0000000c main



Disassembly of section .text:

00000000 <main>:
   0: 55                    push   %ebp
   1: 89 e5                 mov    %esp,%ebp
   3: b8 00 00 00 00        mov    $0x0,%eax
                        4: R_386_32 lData
   8: cd 50                 int    $0x50
   a: 5d                    pop    %ebp
   b: c3                    ret

0000000c <_start>:
   c: 55                    push   %ebp
   d: 89 e5                 mov    %esp,%ebp
   f: 83 ec 08              sub    $0x8,%esp
  12: c7 05 00 00 00 00 00  movl   $0x0,0x0
  19: 00 00 00
                        14: R_386_32 lMain
                        18: R_386_32 main
  1c: a1 00 00 00 00        mov    0x0,%eax
                        1d: R_386_32 lMain
  21: ff d0                 call   *%eax
  23: c9                    leave
  24: c3                    ret

This "partial link" has no start address, but it does contain the _start symbol with an offset relative to the beginning of the .text section, so we could use this, but it's not ideal.  It hasn't been linked to an arbitrary address, and it includes relocation information (displayed inline in the disassembly).  Incidentally, adding an explicit "-entry=_start" to the command to tell ld which symbol is the start symbol had no effect.
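Since every symbol value in this file is only an offset into its section, a loader would have to turn those offsets into addresses itself once it has decided where each section goes. A minimal sketch of that step follows. The name resolve_symbol and the section_base array (indexed by ELF section header index and filled in by the loader) are hypothetical. Note that lMain shows up under *COM* above with .bss at size zero, so the loader would also have to reserve space for common symbols; this sketch only reports them.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: compute the run-time address of one symbol from
 * a relocatable (ET_REL) file. section_base[n] is the address at which
 * the loader placed the section with header index n.
 * Returns 0 on success, -1 if the symbol is undefined or its section
 * index is out of range, -2 if it is a common symbol that still needs
 * st_size bytes allocated (st_value holds the required alignment). */
static int resolve_symbol(const Elf32_Sym *sym,
                          const Elf32_Addr *section_base, size_t nsections,
                          Elf32_Addr *out)
{
    assert(sym && section_base && out);

    switch (sym->st_shndx) {
    case SHN_UNDEF:
        return -1;                      /* unresolved: not executable */
    case SHN_ABS:
        *out = sym->st_value;           /* absolute values never move */
        return 0;
    case SHN_COMMON:
        return -2;                      /* caller must allocate storage */
    default:
        if (sym->st_shndx >= nsections)
            return -1;
        *out = section_base[sym->st_shndx] + sym->st_value;
        return 0;
    }
}
```

With .text placed at, say, 0x00400000, the _start symbol above (offset 0xc) would resolve to 0x0040000c, which could then stand in for the missing start address.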

Test 3 : --emit-relocs


We'll try the next option.
ld -o helloworld main.o --emit-relocs
objdump helloworld -fhtrS

helloworld:     file format elf32-i386
architecture: i386, flags 0x00000113:
HAS_RELOC, EXEC_P, HAS_SYMS, D_PAGED
start address 0x08048080

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000025  08048074  08048074  00000074  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .eh_frame     00000058  0804809c  0804809c  0000009c  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  2 .data         00000014  080490f4  080490f4  000000f4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000004  08049108  08049108  00000108  2**2
                  ALLOC
  4 .comment      00000011  00000000  00000000  00000108  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
08048074 l    d  .text  00000000 .text
0804809c l    d  .eh_frame      00000000 .eh_frame
080490f4 l    d  .data  00000000 .data
08049108 l    d  .bss   00000000 .bss
00000000 l    d  .comment       00000000 .comment
00000000 l    df *ABS*  00000000 main.c
00000000 l    df *ABS*  00000000
080490f4 g     O .data  00000014 lData
08049108 g     O .bss   00000004 lMain
08048080 g     F .text  00000019 _start
08049108 g       .bss   00000000 __bss_start
08048074 g     F .text  0000000c main
08049108 g       .data  00000000 _edata
0804910c g       .bss   00000000 _end



Disassembly of section .text:

08048074 <main>:
 8048074: 55                    push   %ebp
 8048075: 89 e5                 mov    %esp,%ebp
 8048077: b8 f4 90 04 08        mov    $0x80490f4,%eax
                        8048078: R_386_32 lData
 804807c: cd 50                 int    $0x50
 804807e: 5d                    pop    %ebp
 804807f: c3                    ret

08048080 <_start>:
 8048080: 55                    push   %ebp
 8048081: 89 e5                 mov    %esp,%ebp
 8048083: 83 ec 08              sub    $0x8,%esp
 8048086: c7 05 08 91 04 08 74  movl   $0x8048074,0x8049108
 804808d: 80 04 08
                        8048088: R_386_32 lMain
                        804808c: R_386_32 main
 8048090: a1 08 91 04 08        mov    0x8049108,%eax
                        8048091: R_386_32 lMain
 8048095: ff d0                 call   *%eax
 8048097: c9                    leave
 8048098: c3                    ret

Ok, so the executable has a start address, and has relocation entries, but it has still been linked to an arbitrary address.  This may work, but I'd have to investigate further to see what effect linking to an address has had, and whether the relocations contain enough information to "undo" that linking and successfully load the executable to a different address.
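One plausible way to "undo" the linking, at least for R_386_32 entries, is to shift every relocated word by the difference between the address the file was linked for and the address it actually ends up at. The sketch below assumes the whole image is moved as a single block so that the distances between sections are preserved, that no relocation refers to an absolute symbol, and that the host byte order matches the target. The name rebase_image and its parameters are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: rebase an already-linked image. image holds the
 * bytes that the linker placed starting at link_base; load_base is
 * where that block really lives. In an executable each r_offset is a
 * virtual address, so it is converted to a buffer offset first.
 * Returns 0, or -1 on a bad entry or an unsupported relocation type. */
static int rebase_image(uint8_t *image, size_t image_size,
                        Elf32_Addr link_base, Elf32_Addr load_base,
                        const Elf32_Rel *rels, size_t nrels)
{
    assert(image && rels);
    uint32_t delta = load_base - link_base;   /* wraps modulo 2^32 by design */

    for (size_t i = 0; i < nrels; i++) {
        unsigned type = ELF32_R_TYPE(rels[i].r_info);
        uint32_t word;
        size_t off;

        if (type == R_386_NONE || type == R_386_PC32)
            continue;       /* relative distances survive a block move */
        if (type != R_386_32 || rels[i].r_offset < link_base)
            return -1;
        off = rels[i].r_offset - link_base;
        if (image_size < sizeof word || off > image_size - sizeof word)
            return -1;
        memcpy(&word, image + off, sizeof word);  /* holds S + A as linked */
        word += delta;
        memcpy(image + off, &word, sizeof word);
    }
    return 0;
}
```

The start address from the file header would need the same delta added before jumping to it.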

Conclusion


In the absence of any other mysterious options to coerce ld to do what I want, it seems that both of these options can potentially resolve the issue, but both feel hacky in their implementation.  I either have to find the start address via a symbol rather than the "start address" field, or I have to undo some of what the linker has done in order to load the file.

I'll probably implement one of these for the time being and modify the ELF loader to account for it, all the while wondering if I'm missing something obvious.
