From a pragmatic standpoint, such a file should be well-defined and readily useful. Any executable to be loaded into a flat memory model (i.e. a non-virtual memory system, such as DOS, AmigaOS, etc.) or an executable designed to be loaded into an address space with other executables (such as kernel modules, drivers, etc.) would have to be relocatable.
What do I mean by "executable"? As far as Atomicity goes at present, for the file to be executable it needs to have no unresolved symbols and has to have a defined entry point. To have no unresolved symbols, I may be able to invoke a linker flag to mandate this, but for the moment it is sufficient to just write the code correctly. To have an entry point, the "entry" field of the ELF File Header has to be populated with a meaningful value, OR the symbol table has to be present with the address of a symbol such as "_start".
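As a rough illustration of the first of those two checks, here is a minimal sketch that pulls the entry field out of a 32-bit little-endian ELF header held in memory. The function name elf32_get_entry is hypothetical and not part of any existing loader; the field offsets are those of the standard ELF32 header, and a value of zero is treated as "no entry point".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: returns 1 and stores e_entry in *entry if the image
 * looks like a 32-bit little-endian ELF file with a non-zero entry point.
 * Returns 0 otherwise.
 */
int elf32_get_entry(const unsigned char *image, size_t len, uint32_t *entry)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    /* The ELF32 file header is 52 bytes long. */
    if (len < 52 || memcmp(image, magic, 4) != 0)
        return 0;

    /* e_ident[4] = ELFCLASS32 (1), e_ident[5] = ELFDATA2LSB (1). */
    if (image[4] != 1 || image[5] != 1)
        return 0;

    /* e_entry lives at byte offset 24 in the ELF32 header. */
    *entry = read_le32(image + 24);
    return *entry != 0;
}
```

If this returns 0 for an otherwise valid file, the loader would have to fall back to looking up a symbol such as "_start" in the symbol table.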
What do I mean by "relocatable"? When the linker creates the final file, it doesn't know where in memory the file will be loaded, so it includes what is called a relocation table. This table contains references to locations within the executable along with information on how those memory locations need to be updated with the correct address when the file is loaded.
When I invoke ld with no special options, it appears to choose an arbitrary memory location, and links the executable as if it is to be loaded at that memory address. It includes the entry point address, but no relocation table. I've been trying to find a combination of runes to pass to ld to get it to create something which can approximate an ELF relocatable executable.
From a quick peruse of the internet, two options keep cropping up as potentials: --relocatable and --emit-relocs. The first of these appears to provide a useful relocation table, but excludes the entry point. The second of these has a relocation table and an entry point, but the executable is still linked for an arbitrary memory address.
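To make that concrete, here is a minimal sketch of how a loader might walk such a table for i386. The struct reloc layout and the apply_relocs name are simplifications invented for illustration (a real ELF entry stores a symbol index rather than a resolved address). The two calculations are the standard i386 ones: R_386_32 stores S + A and R_386_PC32 stores S + A - P, where S is the symbol's final address, A is the addend already sitting in the patched word, and P is the address of the word being patched.

```c
#include <stddef.h>
#include <stdint.h>

enum { R_386_32 = 1, R_386_PC32 = 2 };   /* i386 relocation type numbers */

/* Hypothetical, simplified relocation entry with the symbol pre-resolved. */
struct reloc {
    uint32_t offset;    /* byte offset of the word to patch within the image */
    uint32_t type;      /* R_386_32 or R_386_PC32 */
    uint32_t sym_addr;  /* final address of the referenced symbol (S) */
};

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void write_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Patch every relocated word in place. Returns 0 on success, -1 on error. */
int apply_relocs(unsigned char *image, size_t len, uint32_t load_addr,
                 const struct reloc *relocs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        const struct reloc *r = &relocs[i];
        if (r->offset > len || len - r->offset < 4)
            return -1;                          /* patch would overrun image */

        unsigned char *p = image + r->offset;
        uint32_t addend = read_le32(p);         /* A: stored in place */
        uint32_t place = load_addr + r->offset; /* P: address being patched */

        switch (r->type) {
        case R_386_32:
            write_le32(p, r->sym_addr + addend);
            break;
        case R_386_PC32:
            write_le32(p, r->sym_addr + addend - place);
            break;
        default:
            return -1;                          /* type not handled here */
        }
    }
    return 0;
}
```

The test program below only produces R_386_32 entries, so the PC-relative case is included purely for comparison.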
Let's use objdump to examine each of these in detail.
Source Code
This is our test program. It's been designed to have relocations from the TEXT (code) segment to the DATA segment and back again.
char lData[20] = "ABCDEFG";
void (*lMain)();

void main()
{
    asm("int $0x50" : : "a" (lData) );
}

void _start()
{
    lMain = main;
    lMain();
}
Test 1 : Default LD
We compile this with gcc, and link it with ld as normal.
gcc -c main.c -o main.o
ld -o helloworld main.o
Then we use objdump to examine the relevant parts of the file.
objdump helloworld -fhtrS
helloworld:     file format elf32-i386
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x08048080

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000025  08048074  08048074  00000074  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .eh_frame     00000058  0804809c  0804809c  0000009c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .data         00000014  080490f4  080490f4  000000f4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000004  08049108  08049108  00000108  2**2
                  ALLOC
  4 .comment      00000011  00000000  00000000  00000108  2**0
                  CONTENTS, READONLY

SYMBOL TABLE:
08048074 l    d  .text      00000000 .text
0804809c l    d  .eh_frame  00000000 .eh_frame
080490f4 l    d  .data      00000000 .data
08049108 l    d  .bss       00000000 .bss
00000000 l    d  .comment   00000000 .comment
00000000 l    df *ABS*      00000000 main.c
00000000 l    df *ABS*      00000000
080490f4 g     O .data      00000014 lData
08049108 g     O .bss       00000004 lMain
08048080 g     F .text      00000019 _start
08049108 g       .bss       00000000 __bss_start
08048074 g     F .text      0000000c main
08049108 g       .data      00000000 _edata
0804910c g       .bss       00000000 _end

Disassembly of section .text:

08048074 <main>:
 8048074:  55                      push   %ebp
 8048075:  89 e5                   mov    %esp,%ebp
 8048077:  b8 f4 90 04 08          mov    $0x80490f4,%eax
 804807c:  cd 50                   int    $0x50
 804807e:  5d                      pop    %ebp
 804807f:  c3                      ret

08048080 <_start>:
 8048080:  55                      push   %ebp
 8048081:  89 e5                   mov    %esp,%ebp
 8048083:  83 ec 08                sub    $0x8,%esp
 8048086:  c7 05 08 91 04 08 74    movl   $0x8048074,0x8049108
 804808d:  80 04 08
 8048090:  a1 08 91 04 08          mov    0x8049108,%eax
 8048095:  ff d0                   call   *%eax
 8048097:  c9                      leave
 8048098:  c3                      ret
As we can see, all the symbols are resolved and the file has a start address ... except the file is linked for a specific address (0x8048074 for the .text section) and contains no relocations, so that's no good.
Test 2 : --relocatable
Let's try the first of our options, --relocatable.
ld -o helloworld main.o --relocatable
objdump helloworld -fhtrS
helloworld:     file format elf32-i386
architecture: i386, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x00000000

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000025  00000000  00000000  00000034  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .eh_frame     00000058  00000000  00000000  0000005c  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  2 .data         00000014  00000000  00000000  000000b4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000000  00000000  00000000  000000c8  2**2
                  ALLOC
  4 .comment      00000012  00000000  00000000  000000c8  2**0
                  CONTENTS, READONLY

SYMBOL TABLE:
00000000 l    d  .text      00000000 .text
00000000 l    d  .eh_frame  00000000 .eh_frame
00000000 l    d  .data      00000000 .data
00000000 l    d  .bss       00000000 .bss
00000000 l    d  .comment   00000000 .comment
00000000 l    df *ABS*      00000000 main.c
00000000 l    df *ABS*      00000000
00000000 g     O .data      00000014 lData
00000004       O *COM*      00000004 lMain
0000000c g     F .text      00000019 _start
00000000 g     F .text      0000000c main

Disassembly of section .text:

00000000 <main>:
   0:  55                      push   %ebp
   1:  89 e5                   mov    %esp,%ebp
   3:  b8 00 00 00 00          mov    $0x0,%eax
           4: R_386_32  lData
   8:  cd 50                   int    $0x50
   a:  5d                      pop    %ebp
   b:  c3                      ret

0000000c <_start>:
   c:  55                      push   %ebp
   d:  89 e5                   mov    %esp,%ebp
   f:  83 ec 08                sub    $0x8,%esp
  12:  c7 05 00 00 00 00 00    movl   $0x0,0x0
  19:  00 00 00
           14: R_386_32  lMain
           18: R_386_32  main
  1c:  a1 00 00 00 00          mov    0x0,%eax
           1d: R_386_32  lMain
  21:  ff d0                   call   *%eax
  23:  c9                      leave
  24:  c3                      ret
This "partial link" has no start address, but does contain the _start symbol with a relative offset from the beginning of the .text section, so we could use this, but it's not ideal. It hasn't been linked to an arbitrary address, and includes relocation information (displayed inline in the disassembly). Incidentally, adding an explicit "-entry=_start" to the command to tell ld which is the start symbol had no effect.
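Finding the entry point by symbol could look something like the sketch below. The struct mirrors the field order of a standard 32-bit ELF symbol table entry, but the names sym32 and find_symbol are hypothetical, and the sketch assumes the symbol table and its string table have already been located and loaded into memory. In a relocatable file the value returned is an offset within the symbol's section, so the loader would still need to add the address at which it placed .text.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field order as a 32-bit ELF symbol table entry (16 bytes). */
struct sym32 {
    uint32_t      st_name;   /* offset of the name in the string table */
    uint32_t      st_value;  /* section offset (relocatable) or address */
    uint32_t      st_size;
    unsigned char st_info;
    unsigned char st_other;
    uint16_t      st_shndx;  /* index of the section the symbol lives in */
};

/*
 * Hypothetical helper: search the symbol table for an exact name match.
 * Returns 1 and stores st_value in *value if found, 0 otherwise.
 */
int find_symbol(const struct sym32 *symtab, size_t nsyms,
                const char *strtab, size_t strtab_len,
                const char *name, uint32_t *value)
{
    size_t name_len = strlen(name);

    for (size_t i = 0; i < nsyms; i++) {
        uint32_t off = symtab[i].st_name;

        /* Skip entries whose name could not fit inside the string table. */
        if (off >= strtab_len || strtab_len - off <= name_len)
            continue;

        /* Compare including the terminating NUL so prefixes don't match. */
        if (memcmp(strtab + off, name, name_len + 1) == 0) {
            *value = symtab[i].st_value;
            return 1;
        }
    }
    return 0;
}
```

For the --relocatable output above, looking up "_start" this way would give 0xc, the offset shown in the symbol table.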
Test 3 : --emit-relocs
We'll try the next option.
ld -o helloworld main.o --emit-relocs
objdump helloworld -fhtrS
helloworld:     file format elf32-i386
architecture: i386, flags 0x00000113:
HAS_RELOC, EXEC_P, HAS_SYMS, D_PAGED
start address 0x08048080

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000025  08048074  08048074  00000074  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .eh_frame     00000058  0804809c  0804809c  0000009c  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  2 .data         00000014  080490f4  080490f4  000000f4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000004  08049108  08049108  00000108  2**2
                  ALLOC
  4 .comment      00000011  00000000  00000000  00000108  2**0
                  CONTENTS, READONLY

SYMBOL TABLE:
08048074 l    d  .text      00000000 .text
0804809c l    d  .eh_frame  00000000 .eh_frame
080490f4 l    d  .data      00000000 .data
08049108 l    d  .bss       00000000 .bss
00000000 l    d  .comment   00000000 .comment
00000000 l    df *ABS*      00000000 main.c
00000000 l    df *ABS*      00000000
080490f4 g     O .data      00000014 lData
08049108 g     O .bss       00000004 lMain
08048080 g     F .text      00000019 _start
08049108 g       .bss       00000000 __bss_start
08048074 g     F .text      0000000c main
08049108 g       .data      00000000 _edata
0804910c g       .bss       00000000 _end

Disassembly of section .text:

08048074 <main>:
 8048074:  55                      push   %ebp
 8048075:  89 e5                   mov    %esp,%ebp
 8048077:  b8 f4 90 04 08          mov    $0x80490f4,%eax
           8048078: R_386_32  lData
 804807c:  cd 50                   int    $0x50
 804807e:  5d                      pop    %ebp
 804807f:  c3                      ret

08048080 <_start>:
 8048080:  55                      push   %ebp
 8048081:  89 e5                   mov    %esp,%ebp
 8048083:  83 ec 08                sub    $0x8,%esp
 8048086:  c7 05 08 91 04 08 74    movl   $0x8048074,0x8049108
 804808d:  80 04 08
           8048088: R_386_32  lMain
           804808c: R_386_32  main
 8048090:  a1 08 91 04 08          mov    0x8049108,%eax
           8048091: R_386_32  lMain
 8048095:  ff d0                   call   *%eax
 8048097:  c9                      leave
 8048098:  c3                      ret
Ok, so the executable has a start address, and has relocation entries, but it has still been linked to an arbitrary address. This may work, but I'd have to investigate further to see what effect linking to an address has had, and whether the relocations contain enough information to "undo" that linking and successfully load the executable to a different address.
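One way the "undo" might work, sketched under some untested assumptions: the whole image is moved as a single block so that .text and .data keep their relative positions, and every relocation is an absolute R_386_32 like the ones in the dump above. In that case each relocated word already holds a valid address for the linked location, and shifting it by the difference between the new and the linked base should be enough. The rebase_abs32 name and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void write_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/*
 * Hypothetical helper: image holds `len` bytes that were linked to start at
 * linked_base and are now being placed at new_base. reloc_vmas lists the
 * linked addresses of the R_386_32 words to fix up.
 * Returns 0 on success, -1 if a relocation falls outside the image.
 */
int rebase_abs32(unsigned char *image, size_t len,
                 uint32_t linked_base, uint32_t new_base,
                 const uint32_t *reloc_vmas, size_t count)
{
    /* Unsigned wrap-around makes this correct for moves in either direction. */
    uint32_t delta = new_base - linked_base;

    for (size_t i = 0; i < count; i++) {
        if (reloc_vmas[i] < linked_base)
            return -1;

        uint32_t off = reloc_vmas[i] - linked_base;
        if (off > len || len - off < 4)
            return -1;

        /* Slide the already-resolved absolute address by the same delta. */
        write_le32(image + off, read_le32(image + off) + delta);
    }
    return 0;
}
```

Taking main from the dump as an example, the word at 0x8048078 holds 0x080490f4. Moving the image so that 0x08048074 lands at 0x00400074 would turn that word into 0x004010f4. The entry point from the header would need the same delta added before jumping to it.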
Conclusion
In the absence of any other mysterious options to coerce ld to do what I want, it seems that both of these options can potentially resolve the issue, but both feel hacky in their implementation. I either have to find the start address by a symbol rather than the "start address" field, or I have to undo some of what the linker has done in order to load the file.
I'll probably implement one of these for the time being and modify the ELF loader to account for it, all the while wondering if I'm missing something obvious.