From a pragmatic standpoint, such a file should be well-defined and readily useful. Any executable to be loaded into a flat memory model (i.e. a non-virtual memory system, such as DOS, AmigaOS, etc.) or an executable designed to be loaded into an address space with other executables (such as kernel modules, drivers, etc.) would have to be relocatable.
What do I mean by "executable"? As far as Atomicity goes at present, for the file to be executable it must have no unresolved symbols and a defined entry point. I may be able to pass a linker flag to enforce the first requirement, but for the moment it is sufficient to just write the code correctly. For the second, either the "entry" field (e_entry) of the ELF file header has to hold a meaningful value, OR a symbol table has to be present containing the address of a symbol such as "_start".
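As a rough illustration of the entry-point requirement, here is how a loader might look for one: take e_entry from the file header if it is non-zero, otherwise scan the symbol table for a defined "_start". This is a minimal sketch, not Atomicity's actual loader; find_entry is a hypothetical name, and the code assumes the whole ELF32 file is already in memory and well-formed (there is no bounds checking).

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: find the entry point of an in-memory ELF32 image.
   Uses e_entry if it is non-zero, otherwise falls back to the value of a
   defined "_start" symbol in any SHT_SYMTAB section.
   Assumes a complete, well-formed image; no bounds checking is done.
   Returns 1 and stores the value in *entry on success, 0 otherwise. */
static int find_entry(const uint8_t *image, uint32_t *entry)
{
    const Elf32_Ehdr *eh = (const Elf32_Ehdr *)image;
    if (eh->e_entry != 0) {
        *entry = eh->e_entry;
        return 1;
    }

    const Elf32_Shdr *sh = (const Elf32_Shdr *)(image + eh->e_shoff);
    for (int i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_SYMTAB)
            continue;
        const Elf32_Sym *syms = (const Elf32_Sym *)(image + sh[i].sh_offset);
        /* sh_link of a symbol table names its string table section */
        const char *names = (const char *)(image + sh[sh[i].sh_link].sh_offset);
        size_t count = sh[i].sh_size / sizeof(Elf32_Sym);
        for (size_t j = 0; j < count; j++) {
            if (syms[j].st_shndx != SHN_UNDEF &&
                strcmp(names + syms[j].st_name, "_start") == 0) {
                /* In a fully linked file this is an address; in an ET_REL
                   file it is an offset into section st_shndx. */
                *entry = syms[j].st_value;
                return 1;
            }
        }
    }
    return 0;
}
```

The symbol-table fallback matters for partially linked files, whose header carries no entry point; there the value found is section-relative and still has to be added to wherever that section was loaded.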
What do I mean by "relocatable"? When the linker creates the final file, it doesn't know where in memory the file will be loaded, so it includes what is called a relocation table. This table contains references to locations within the executable, along with information on how each of those locations needs to be updated with the correct address when the file is loaded.
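To make "how those locations need to be updated" concrete: on i386, ld emits REL relocations, where the addend is stored in the field being patched rather than in the relocation entry. The two types that matter for code like the test program are R_386_32 (store S + A, an absolute address) and R_386_PC32 (store S + A - P, a PC-relative offset), where S is the symbol's address, A the addend and P the address of the field. The sketch below applies a single entry; apply_rel_386 is a hypothetical helper, not part of any library, and it assumes the symbol has already been resolved.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: apply one i386 REL-style relocation, whose addend
   is stored in the field itself.
   where  - pointer to the 32-bit field inside the loaded image
   P      - run-time address of that field
   S      - resolved run-time address of the referenced symbol
   r_info - the relocation's r_info (only the type is used here)
   Assumes a little-endian host, like the i386 target. Returns 0, or -1 for
   relocation types this sketch does not handle. */
static int apply_rel_386(uint8_t *where, uint32_t P, uint32_t S, uint32_t r_info)
{
    uint32_t A, value;
    memcpy(&A, where, sizeof A);              /* implicit addend */
    switch (ELF32_R_TYPE(r_info)) {
    case R_386_32:   value = S + A;     break; /* absolute address */
    case R_386_PC32: value = S + A - P; break; /* PC-relative offset */
    default:         return -1;
    }
    memcpy(where, &value, sizeof value);
    return 0;
}
```

For example, the "4: R_386_32 lData" entry in the partial link further down would be applied with where pointing at byte 4 of the loaded .text, P set to that byte's run-time address, and S set to wherever lData ended up.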
When I invoke ld with no special options, it links the executable as if it were to be loaded at a fixed address: ld's default base for i386 ELF executables, 0x08048000, which is arbitrary as far as my loader is concerned. The output includes the entry point address, but no relocation table. I've been trying to find a combination of runes to pass to ld to get it to create something that approximates an ELF relocatable executable.
A quick perusal of the internet turns up two options again and again: --relocatable and --emit-relocs. The first of these appears to provide a useful relocation table, but excludes the entry point. The second has a relocation table and an entry point, but the executable is still linked for an arbitrary memory address.
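Whichever option wins, the ELF loader will need to tell these kinds of output apart. The properties being compared (file type, entry point, presence of relocation sections) can all be read from the ELF headers. The fields in this hypothetical summarize() sketch roughly correspond to objdump's EXEC_P and HAS_RELOC flags and its "start address" line; it assumes a complete, well-formed ELF32 image in memory.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical summary of the properties compared in the tests. */
struct elf_summary {
    int is_exec;       /* e_type == ET_EXEC (roughly objdump's EXEC_P) */
    int is_rel;        /* e_type == ET_REL: an object file or partial link */
    uint32_t entry;    /* e_entry; 0 when no entry point is recorded */
    int n_reloc_secs;  /* number of SHT_REL / SHT_RELA sections */
};

/* Read the summary straight from the headers of an in-memory ELF32 image.
   No bounds checking: the image is assumed complete and well-formed. */
static struct elf_summary summarize(const uint8_t *image)
{
    const Elf32_Ehdr *eh = (const Elf32_Ehdr *)image;
    const Elf32_Shdr *sh = (const Elf32_Shdr *)(image + eh->e_shoff);
    struct elf_summary s = {
        .is_exec = eh->e_type == ET_EXEC,
        .is_rel = eh->e_type == ET_REL,
        .entry = eh->e_entry,
        .n_reloc_secs = 0,
    };
    for (int i = 0; i < eh->e_shnum; i++)
        if (sh[i].sh_type == SHT_REL || sh[i].sh_type == SHT_RELA)
            s.n_reloc_secs++;
    return s;
}
```

On the outputs examined below, one would expect Test 1 to show an entry point but no relocation sections, Test 2 relocation sections but no entry point, and Test 3 both.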
Let's use objdump to examine each of these in detail.
Source Code
This is our test program. It's been designed to have relocations from the TEXT (code) segment to the DATA segment and back again.
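char lData[20] = "ABCDEFG";   /* initialised data, placed in .data */
void (*lMain)();              /* uninitialised pointer, placed in .bss (COMMON in the partial link) */

void main()
{
    /* Load the address of lData into %eax (a .text -> .data reference),
       then raise interrupt 0x50. */
    asm("int $0x50" : : "a" (lData) );
}

void _start()
{
    lMain = main;   /* store the address of main (in .text) in lMain */
    lMain();        /* call main indirectly through that pointer */
}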
Test 1 : Default LD
We compile this with gcc, and link it with ld as normal.
gcc -c main.c -o main.o
ld -o helloworld main.o
Then we use objdump to examine the relevant parts of the file.
objdump helloworld -fhtrS
helloworld: file format elf32-i386
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x08048080
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000025  08048074  08048074  00000074  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .eh_frame     00000058  0804809c  0804809c  0000009c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .data         00000014  080490f4  080490f4  000000f4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000004  08049108  08049108  00000108  2**2
                  ALLOC
  4 .comment      00000011  00000000  00000000  00000108  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
08048074 l d .text 00000000 .text
0804809c l d .eh_frame 00000000 .eh_frame
080490f4 l d .data 00000000 .data
08049108 l d .bss 00000000 .bss
00000000 l d .comment 00000000 .comment
00000000 l df *ABS* 00000000 main.c
00000000 l df *ABS* 00000000
080490f4 g O .data 00000014 lData
08049108 g O .bss 00000004 lMain
08048080 g F .text 00000019 _start
08049108 g .bss 00000000 __bss_start
08048074 g F .text 0000000c main
08049108 g .data 00000000 _edata
0804910c g .bss 00000000 _end
Disassembly of section .text:
08048074 <main>:
8048074: 55 push %ebp
8048075: 89 e5 mov %esp,%ebp
8048077: b8 f4 90 04 08 mov $0x80490f4,%eax
804807c: cd 50 int $0x50
804807e: 5d pop %ebp
804807f: c3 ret
08048080 <_start>:
8048080: 55 push %ebp
8048081: 89 e5 mov %esp,%ebp
8048083: 83 ec 08 sub $0x8,%esp
8048086: c7 05 08 91 04 08 74 movl $0x8048074,0x8049108
804808d: 80 04 08
8048090: a1 08 91 04 08 mov 0x8049108,%eax
8048095: ff d0 call *%eax
8048097: c9 leave
8048098: c3 ret
As we can see, all the symbols are resolved and the file has a start address, but it is linked for a specific address (0x8048074 for the .text section) and contains no relocations, so that's no good.
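The problem with the missing relocations is visible in the instruction bytes themselves: the absolute addresses are encoded directly, little-endian, inside the instructions. "b8 f4 90 04 08" is a mov with the immediate 0x080490f4 (lData), and the movl in _start carries both 0x08049108 (lMain) and 0x08048074 (main). Without a relocation table, a loader has no way to know that these four-byte values are addresses rather than ordinary constants. A small decoding sketch using the bytes from the dump above (read_le32 is an illustrative name):

```c
#include <stdint.h>

/* Decode a little-endian 32-bit value, the byte order i386 uses for
   immediates and displacements inside instructions. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Instruction bytes copied from the Test 1 disassembly. */
static const uint8_t mov_eax[] = { 0xb8, 0xf4, 0x90, 0x04, 0x08 };  /* mov $0x80490f4,%eax */
static const uint8_t movl_abs[] = { 0xc7, 0x05,
                                    0x08, 0x91, 0x04, 0x08,          /* destination: lMain */
                                    0x74, 0x80, 0x04, 0x08 };        /* immediate: main */
```

read_le32(mov_eax + 1) gives 0x080490f4, while read_le32(movl_abs + 2) and read_le32(movl_abs + 6) give 0x08049108 and 0x08048074.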
Test 2 : --relocatable
Let's try the first of our options, --relocatable.
ld -o helloworld main.o --relocatable
objdump helloworld -fhtrS
helloworld: file format elf32-i386
architecture: i386, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x00000000
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000025  00000000  00000000  00000034  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .eh_frame     00000058  00000000  00000000  0000005c  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  2 .data         00000014  00000000  00000000  000000b4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000000  00000000  00000000  000000c8  2**2
                  ALLOC
  4 .comment      00000012  00000000  00000000  000000c8  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
00000000 l d .text 00000000 .text
00000000 l d .eh_frame 00000000 .eh_frame
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
00000000 l d .comment 00000000 .comment
00000000 l df *ABS* 00000000 main.c
00000000 l df *ABS* 00000000
00000000 g O .data 00000014 lData
00000004 O *COM* 00000004 lMain
0000000c g F .text 00000019 _start
00000000 g F .text 0000000c main
Disassembly of section .text:
00000000 <main>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: b8 00 00 00 00 mov $0x0,%eax
4: R_386_32 lData
8: cd 50 int $0x50
a: 5d pop %ebp
b: c3 ret
0000000c <_start>:
c: 55 push %ebp
d: 89 e5 mov %esp,%ebp
f: 83 ec 08 sub $0x8,%esp
12: c7 05 00 00 00 00 00 movl $0x0,0x0
19: 00 00 00
14: R_386_32 lMain
18: R_386_32 main
1c: a1 00 00 00 00 mov 0x0,%eax
1d: R_386_32 lMain
21: ff d0 call *%eax
23: c9 leave
24: c3 ret
This "partial link" has no start address, but it does contain the _start symbol, with an offset relative to the beginning of the .text section, so we could use this, but it's not ideal. On the plus side, it hasn't been linked to an arbitrary address, and it includes relocation information (displayed inline in the disassembly). Incidentally, adding an explicit "--entry=_start" to the command to tell ld which symbol is the entry point had no effect.
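If I go the partial-link route, the loader has to do some of the linker's job itself. In an ET_REL file a defined symbol's st_value is an offset into the section named by st_shndx (so _start is 0xc bytes into .text), and lMain shows up as a COMMON symbol (*COM*), for which the ELF specification says st_value holds the required alignment and st_size the size, leaving the loader to allocate the storage. A hypothetical sketch of that resolution step, assuming the loader has recorded where it placed each section in section_base[] and hands out common storage from a simple bump pointer:

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: compute the run-time address of a symbol taken from a
   partially linked (ET_REL) file.
   section_base[i] - address at which the loader placed ELF section i
   nsections       - number of entries in section_base
   common_next     - bump pointer for COMMON storage, managed by the loader
   Call it once per symbol and cache the result: every call for a COMMON
   symbol allocates fresh storage. Returns 0 on success, -1 otherwise. */
static int rel_symbol_address(const Elf32_Sym *sym, const uint32_t *section_base,
                              uint16_t nsections, uint32_t *common_next,
                              uint32_t *out)
{
    switch (sym->st_shndx) {
    case SHN_UNDEF:
        return -1;                       /* unresolved symbol */
    case SHN_ABS:
        *out = sym->st_value;            /* absolute value, not relocated */
        return 0;
    case SHN_COMMON: {
        /* st_value is the alignment (assumed a power of two), st_size the size */
        uint32_t align = sym->st_value ? sym->st_value : 1;
        uint32_t addr = (*common_next + align - 1) & ~(align - 1);
        *common_next = addr + sym->st_size;
        *out = addr;
        return 0;
    }
    default:
        if (sym->st_shndx >= nsections)
            return -1;                   /* index outside the sections we loaded */
        *out = section_base[sym->st_shndx] + sym->st_value;  /* section-relative */
        return 0;
    }
}
```

Note that ELF section indices do not match objdump's Idx column (index 0 is the reserved null section, and relocation and symbol-table sections are numbered too), so st_shndx has to be looked up against the section headers rather than the objdump listing.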
Test 3 : --emit-relocs
We'll try the next option.
ld -o helloworld main.o --emit-relocs
objdump helloworld -fhtrS
helloworld: file format elf32-i386
architecture: i386, flags 0x00000113:
HAS_RELOC, EXEC_P, HAS_SYMS, D_PAGED
start address 0x08048080
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000025  08048074  08048074  00000074  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .eh_frame     00000058  0804809c  0804809c  0000009c  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  2 .data         00000014  080490f4  080490f4  000000f4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000004  08049108  08049108  00000108  2**2
                  ALLOC
  4 .comment      00000011  00000000  00000000  00000108  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
08048074 l d .text 00000000 .text
0804809c l d .eh_frame 00000000 .eh_frame
080490f4 l d .data 00000000 .data
08049108 l d .bss 00000000 .bss
00000000 l d .comment 00000000 .comment
00000000 l df *ABS* 00000000 main.c
00000000 l df *ABS* 00000000
080490f4 g O .data 00000014 lData
08049108 g O .bss 00000004 lMain
08048080 g F .text 00000019 _start
08049108 g .bss 00000000 __bss_start
08048074 g F .text 0000000c main
08049108 g .data 00000000 _edata
0804910c g .bss 00000000 _end
Disassembly of section .text:
08048074 <main>:
8048074: 55 push %ebp
8048075: 89 e5 mov %esp,%ebp
8048077: b8 f4 90 04 08 mov $0x80490f4,%eax
8048078: R_386_32 lData
804807c: cd 50 int $0x50
804807e: 5d pop %ebp
804807f: c3 ret
08048080 <_start>:
8048080: 55 push %ebp
8048081: 89 e5 mov %esp,%ebp
8048083: 83 ec 08 sub $0x8,%esp
8048086: c7 05 08 91 04 08 74 movl $0x8048074,0x8049108
804808d: 80 04 08
8048088: R_386_32 lMain
804808c: R_386_32 main
8048090: a1 08 91 04 08 mov 0x8049108,%eax
8048091: R_386_32 lMain
8048095: ff d0 call *%eax
8048097: c9 leave
8048098: c3 ret
OK, so the executable has a start address and relocation entries, but it has still been linked to an arbitrary address. This may work, but I'd have to investigate further to see what effect linking to an address has had, and whether the relocations contain enough information to "undo" that linking and successfully load the executable at a different address.
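One way the --emit-relocs output might be "un-linked", assuming every section is moved by the same amount and every relocation refers to something inside the image: an R_386_32 field already holds the final link-time value S + A, so moving the image by delta = load_base - link_base means adding delta to it, while an R_386_PC32 field between two parts of the image stays correct because both ends move together. This is a minimal sketch under those assumptions; rebase_386 is an illustrative name, and it further assumes that r_offset holds a virtual address (as the ELF specification says for executable files) and that the loaded segments sit contiguously, so address V lives at image[V - link_base].

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: move a fully linked i386 image that kept its
   relocations (ld --emit-relocs) from link_base to load_base.
   Assumes image[] holds the loaded segments laid out as at link time, so
   virtual address V lives at image[V - link_base]; that every section moves
   by the same delta; and that every relocation refers to something inside
   the image. Returns 0, or -1 on a bad offset or unhandled type (in which
   case earlier entries may already have been applied). */
static int rebase_386(uint8_t *image, uint32_t image_size, uint32_t link_base,
                      uint32_t load_base, const Elf32_Rel *rels, size_t nrels)
{
    uint32_t delta = load_base - link_base;  /* modular arithmetic also covers downward moves */
    for (size_t i = 0; i < nrels; i++) {
        uint32_t off = rels[i].r_offset - link_base;  /* r_offset is a vaddr here */
        if (image_size < 4 || off > image_size - 4)
            return -1;
        uint32_t value;
        switch (ELF32_R_TYPE(rels[i].r_info)) {
        case R_386_32:
            /* field already holds S + A for the link address: shift it */
            memcpy(&value, image + off, sizeof value);
            value += delta;
            memcpy(image + off, &value, sizeof value);
            break;
        case R_386_PC32:
            /* both ends move by delta, so the difference is unchanged */
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

Applied to the "8048078: R_386_32 lData" entry above with link_base 0x08048000, moving the image to 0x00100000 would turn the stored 0x080490f4 into 0x001010f4. Any relocation type other than these two makes the sketch return -1 rather than guess.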
Conclusion
In the absence of any other mysterious options to coerce ld into doing what I want, it seems that either option could resolve the issue, but both feel hacky. Either I have to find the start address via a symbol rather than the "start address" field, or I have to undo some of what the linker has done in order to load the file.
I'll probably implement one of these for the time being and modify the ELF loader to account for it, all the while wondering if I'm missing something obvious.