2014-03-29

Library Memory Mapping

I haven't posted anything for a while; there's a big project on at work which is taking up a lot of my time.

I've been thinking about libraries and how they will work, mainly how memory management can be arranged so that it works both for the library and for whatever the program itself is doing.

Having considered and investigated various approaches, I think I've come up with the best solution.

At the point the program starts executing, the kernel has set up an address space and loaded the program text (the code) at address 0, followed by the data and bss (uninitialised data) as laid out by the executable file.  The program's stack will have been placed at some arbitrarily high address, such as the top of the address space.  The kernel will also have set, and will keep track of, the "break" value (the program break), which marks the top of the heap.  This state is shown in A below.



Note that because this program was written in C, the appropriate C library has been linked into the program executable, including malloc() and the other memory management functions.  It's also possible that a program has been written purely in assembly, in which case it won't have a C library and will have some other form of memory management.

The first time that the program calls the malloc() function (or during the C library initialisation, depending on the library), the memory allocator in the program's C library will make a call to the kernel function sbrk() to move the break value up by a certain amount, maybe a few megabytes, as shown in B.  The memory allocator in the program will then use this space for any allocations needed by the program.  When this space has all been allocated, the memory allocator makes another call to sbrk() to get another part of the address space.
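To make that concrete, here is a minimal sketch of an allocator that grows the break in chunks and hands out pieces of each chunk.  It uses the POSIX-style sbrk() as a stand-in for the kernel call described above; toy_alloc() and CHUNK_SIZE are names I've made up for illustration, and a real malloc() would also need free(), alignment guarantees and a lot more bookkeeping.

```c
#define _DEFAULT_SOURCE            /* exposes sbrk() on glibc */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define CHUNK_SIZE (1024 * 1024)   /* grow the break 1 MiB at a time (arbitrary) */

static char *chunk_cur = NULL;     /* next free byte in the current chunk */
static char *chunk_end = NULL;     /* one past the last byte of the current chunk */

/* Hypothetical bump allocator: it never frees, it only shows how the
 * break is moved on first use and again when a chunk runs out. */
void *toy_alloc(size_t size)
{
    if (size == 0 || size > CHUNK_SIZE - 16)
        return NULL;                        /* outside what this sketch handles */

    size = (size + 15) & ~(size_t)15;       /* round the request up to 16 bytes */

    if (chunk_cur == NULL || (size_t)(chunk_end - chunk_cur) < size) {
        void *fresh = sbrk(CHUNK_SIZE);     /* ask the kernel to move the break up */
        if (fresh == (void *)-1)
            return NULL;                    /* the kernel refused */
        chunk_cur = fresh;
        chunk_end = chunk_cur + CHUNK_SIZE;
    }

    void *result = chunk_cur;
    chunk_cur += size;
    assert(chunk_cur <= chunk_end);
    return result;
}
```

The first call to toy_alloc() triggers the first sbrk(); later calls are served from the same chunk until it is exhausted, at which point the break is moved again and whatever was left of the old chunk is simply abandoned.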

Imagine that the program now wants to load a maths library.  It calls the kernel function openLibrary(), requesting the library.  The kernel locates the library, and moves the break value of the process up again to secure an area of the address space for the new library, and loads it.  This is shown in C.

Note again that this library has also been written in C and linked to its C library, but this C library may be different to the one in the program in such a way as to be incompatible.  This would prevent the maths library from using the malloc() implementation that is in the C library of the program.

If the newly-loaded maths library now wants to allocate some memory for itself, it will make a separate call to the kernel sbrk() function.  This returns another area of the address space, which the memory allocator in the library then uses for allocations.  This is shown in D.
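The kernel's side of this bookkeeping can be modelled in a few lines.  This is only a simulation under my own assumptions: struct address_space, sim_sbrk() and sim_open_library() are hypothetical names, the addresses are plain numbers rather than real mappings, and actually loading the library image is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define SIM_FAIL ((uintptr_t)-1)

/* Hypothetical per-process record kept by the kernel. */
struct address_space {
    uintptr_t brk;     /* current break: first address above the heap */
    uintptr_t limit;   /* lowest address reserved for the stack */
};

/* Move the break up by `increment` bytes and return the old break,
 * which is the start of the newly granted area. */
uintptr_t sim_sbrk(struct address_space *as, size_t increment)
{
    if (increment > as->limit - as->brk)
        return SIM_FAIL;               /* would run into the stack */
    uintptr_t old = as->brk;
    as->brk += increment;
    return old;
}

/* Stand-in for openLibrary(): reserve room for the library image at
 * the current break.  Copying the image into place is omitted. */
uintptr_t sim_open_library(struct address_space *as, size_t image_size)
{
    return sim_sbrk(as, image_size);
}
```

Calling sim_sbrk() for the program, then sim_open_library(), then sim_sbrk() again on behalf of the library gives three areas stacked one above the other, in the order the calls were made.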

So, if the program now needs more space for further allocations, it can make another call to sbrk(), which will return another slice of the address space pie.  This new space won't be contiguous with the previously allocated program memory space, but that's OK because the memory allocator in the program's C library doesn't need its areas to be contiguous.
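One way an allocator can cope with non-contiguous areas is to treat each sbrk() result as an independent region and chain the regions together.  The sketch below assumes a POSIX-style sbrk(); struct region, region_alloc() and region_count() are hypothetical names, and there is no free().

```c
#define _DEFAULT_SOURCE            /* exposes sbrk() on glibc */
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define REGION_SIZE (64 * 1024)    /* arbitrary size for each sbrk() request */
#define HEADER_SIZE ((sizeof(struct region) + 15) & ~(size_t)15)

/* Header stored at the start of every region this allocator owns. */
struct region {
    struct region *next;   /* previously obtained region, or NULL */
    char *cur;             /* next free byte in this region */
    char *end;             /* one past the last byte of this region */
};

static struct region *regions = NULL;

static struct region *region_new(void)
{
    char *raw = sbrk(REGION_SIZE);
    if (raw == (char *)-1)
        return NULL;
    /* Align the header in case the break was not 16-byte aligned. */
    uintptr_t start = ((uintptr_t)raw + 15) & ~(uintptr_t)15;
    struct region *r = (struct region *)start;
    r->cur = (char *)r + HEADER_SIZE;
    r->end = raw + REGION_SIZE;
    r->next = regions;     /* regions are linked, not assumed adjacent */
    regions = r;
    return r;
}

void *region_alloc(size_t size)
{
    if (size == 0 || size > REGION_SIZE - 64)
        return NULL;                       /* outside what this sketch handles */
    size = (size + 15) & ~(size_t)15;

    struct region *r;
    for (r = regions; r != NULL; r = r->next)
        if ((size_t)(r->end - r->cur) >= size)
            break;                         /* found a region with room */
    if (r == NULL && (r = region_new()) == NULL)
        return NULL;

    void *result = r->cur;
    r->cur += size;
    return result;
}

size_t region_count(void)
{
    size_t n = 0;
    for (struct region *r = regions; r != NULL; r = r->next)
        n++;
    return n;
}
```

If something else (a library, say) calls sbrk() between two region_new() calls, the two regions end up with a gap between them, and region_alloc() carries on regardless because it never assumes one region starts where the previous one ended.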

All right, I think that's enough for now.  I think it all makes sense and should work for any combination of programs and libraries, regardless of whether they're using the standard C library, a non-standard C library, or just some custom assembly for memory management, as long as they all behave and use sbrk() and openLibrary() correctly.  I know I haven't mentioned mmap(), but it should follow the same rules.

