2010-05-13

2010-05-13 - Debugging the Filesystem

Continuing the investigation of the corrupting filesystem.  The first hurdle it fails is identifying the sector number of the start of the root directory: this is being read out as 6336 on a volume that only has 15 MB of space.  Upon further investigation, the driver believes that there are 197 FATs on the disk and 32 sectors of reserved space.  Looking at the on-disk structure by hand reveals the expected 2 FATs and only 1 sector of reserved data.  I believe the problem is that something is writing outside its allocated memory and trashing the FAT context object.  This could be tricky to track down.
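For reference, the root directory on a FAT12/16 volume starts immediately after the reserved sectors and all copies of the FAT.  A minimal sketch of that arithmetic in C is below.  The function name is made up for illustration, and the sectors-per-FAT figure of 32 used in the note afterwards is an assumption: it is simply the value that makes the driver's numbers add up to 6336, not something read from the disk.

```c
#include <stdint.h>

/* First sector of the root directory on a FAT12/16 volume:
 * the reserved sectors come first, followed by every copy of the FAT.
 * (Hypothetical helper, not the driver's real function.) */
uint32_t root_dir_start_sector(uint32_t reserved_sectors,
                               uint32_t num_fats,
                               uint32_t sectors_per_fat)
{
    return reserved_sectors + num_fats * sectors_per_fat;
}
```

With the trashed values (32 reserved sectors, 197 FATs) and an assumed 32 sectors per FAT this gives 32 + 197 * 32 = 6336, matching what the driver reported.  With the on-disk values (1 reserved sector, 2 FATs) the same FAT size would put the root directory at sector 65.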

I'm going to reimplement the ShowContext function through the DOS layer.  This will allow me to examine the context at each stage of the process and see at what point the values are being corrupted.

ShowContext() function reimplemented so that it's accessible from the front end via the DOS layer ("ShowContext HD1:" ... Win!).  The values in it are definitely wrong for both HD0: and HD1: (the two hard drive partitions).  So, either not enough space is being allocated for the context and it is then being overwritten, or it's being overwritten by some misbehaving code, which can only be somewhere in the MOUNT command.

After a modification to the ShowMem command so that it shows the exact allocated block at a given location (if one exists), and a lot more poking, it seems the problem is somewhere within the list code.  Adding the third entry to the DosDevice list seems to cause some memory corruption.  I'll tweak the mountlist first to see if I can confirm this, then look into why.

AH HA!  In the FAT context creation code, it allocates space for the FAT context and the boot sector (kept around for posterity), and it uses the BDD to determine the amount of space it needs to allocate for the block.  However, the hard drive is returning a block size of zero because that seems to be what is coming from the drive.  The FAT code doesn't account for this, so it allocates 0 bytes and then writes into that block, which the next operation promptly overwrites.  Let's fix this.
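One way to guard against this kind of bug is to never trust a reported block size of zero when sizing the allocation.  The sketch below is only an illustration of that idea: the struct, the function name, the 512-byte fallback and the exact size formula are all assumptions, not the actual FAT code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed default sector size, used only when the device reports zero. */
#define FALLBACK_SECTOR_SIZE 512u

/* Hypothetical stand-in for the fixed part of the FAT context. */
struct fat_context_header {
    uint32_t reserved_sectors;
    uint32_t num_fats;
    uint32_t root_dir_sector;
};

/* Bytes to allocate for the context plus a copy of the boot sector.
 * A reported block size of zero is replaced with the fallback, so the
 * allocation is never smaller than what will be written into it. */
size_t fat_context_alloc_size(uint32_t reported_block_size)
{
    uint32_t block_size = reported_block_size;
    if (block_size == 0)
        block_size = FALLBACK_SECTOR_SIZE;
    return sizeof(struct fat_context_header) + block_size;
}
```

Whether to fall back to a default or to refuse to mount the device is a design choice; the fallback keeps the drive usable while the real cause of the zero block size is tracked down.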

\o/  It works properly now.  There are a few oddities, such as it only seems to use every other cluster for subsequent saves, but it seems to be behaving.

Identified a minor problem in the Dos subsystem where searching for "HD:" would find the first device whose name starts with HD instead of specifically "HD:".  This was easily fixed.  Also made a small modification to provide a global variable called CurrentDirectory, which is used if no device name is identified in the given filename.
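The difference between a prefix match and an exact match on the device name can be sketched as follows.  This is a guess at the shape of the fix rather than the real Dos subsystem code: the function name is hypothetical, and the "HD0:" value given to CurrentDirectory here is only a placeholder.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder default; the real value of CurrentDirectory is not known. */
const char *CurrentDirectory = "HD0:";

/* Returns 1 if `path` refers to exactly the device `device` (e.g. "HD:").
 * The device part is everything up to and including the first colon.
 * If `path` has no device part, CurrentDirectory is used instead. */
int path_uses_device(const char *path, const char *device)
{
    const char *colon = strchr(path, ':');
    if (colon == NULL) {
        path = CurrentDirectory;        /* fall back to the default */
        colon = strchr(path, ':');
        if (colon == NULL)
            return 0;                   /* default has no device either */
    }
    size_t len = (size_t)(colon - path) + 1;
    /* Comparing lengths as well as characters rules out prefix matches. */
    return strlen(device) == len && strncmp(path, device, len) == 0;
}
```

With this check, "HD0:file" no longer matches a search for "HD:", while a bare "file.txt" is looked up on whatever device CurrentDirectory names.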

Moving on to look again at the modules now, it would be nice to be able to create a program that can be loaded, executed, then flushed, even if only for a demo.  This will also be useful as having the kernel itself depend on libraries will cause no end of trouble, so I can make the loaded modules (executables) do that instead.  So how about HelloWorld.exe?

Ok, so LD isn't playing ball.  Firstly it linked the two .o files into an MZ executable, then it linked the two files into a COFF file, but didn't actually tie up the references or combine the sections to produce a single COFF file.
