On May 20, 2012, at 5:13 AM, Bakul Shah <ba...@bitblocks.com> wrote:

> 
> How often would you flush to disk? You still need to worry about the order
> of writing metadata.
> 

That's the nice thing: it's so simple that I don't have to worry about order. You write the new blocks and, once all of them have reached the disk without errors, you write a new super with the address of the new root. If you fail before the super is written, you have the old tree; otherwise you get the new one. The super keeps two copies of its data, in case you fail in the middle of writing it; if they don't match, you use the older one.
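In rough C, the commit path is just this (a minimal sketch; Super, Disksuper, writeblocks and writesuper are names I'm making up here, not the actual code):

#include <stdint.h>
#include <string.h>

typedef struct Super Super;
typedef struct Disksuper Disksuper;

struct Super {
	uint64_t root;		/* address of the root of the new tree */
	uint64_t gen;		/* generation, to tell which copy is older */
};

struct Disksuper {
	Super copy[2];		/* the super keeps two copies of its data */
};

int writeblocks(void);		/* flush every new block of the cow tree */
int writesuper(Disksuper*);	/* write the super itself */

int
commit(Disksuper *s, uint64_t newroot)
{
	/* 1. write all new blocks; until the super is written,
	 *    the old tree is still the one reachable from disk */
	if(writeblocks() < 0)
		return -1;

	/* 2. only then write the super pointing at the new root */
	s->copy[0].root = s->copy[1].root = newroot;
	s->copy[0].gen++;
	s->copy[1].gen++;
	return writesuper(s);
}

/* after a crash: if the two copies disagree, the failure happened
 * while writing the super, so keep the older copy */
Super*
loadsuper(Disksuper *s)
{
	if(memcmp(&s->copy[0], &s->copy[1], sizeof(Super)) != 0)
		return s->copy[0].gen <= s->copy[1].gen ? &s->copy[0] : &s->copy[1];
	return &s->copy[0];
}

The only ordering constraint is that the super goes last.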

The tmr proc writes new versions of the tree periodically, and also whenever the number of dirty (memory-addressed, actually) blocks exceeds some value.
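Something like this, as a sketch (Fs, fssync, Syncival and Maxdirty are invented names and values, just to show the policy):

enum {
	Syncival = 5000,	/* ms between periodic tree writes (assumed) */
	Maxdirty = 2048,	/* dirty-block threshold (assumed) */
};

typedef struct Fs Fs;
struct Fs {
	int ndirty;	/* blocks modified since the last tree write */
};

void fssync(Fs*);	/* write new blocks, then a new super */
void sleepms(int);

void
tmrproc(Fs *fs)
{
	for(;;){
		sleepms(Syncival);
		if(fs->ndirty > 0)
			fssync(fs);
	}
}

/* whoever dirties a block can trigger an early write of the tree */
void
blockdirtied(Fs *fs)
{
	if(++fs->ndirty > Maxdirty)
		fssync(fs);
}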


> You do have to keep track of free disk blocks. On disk. So a linked list
> would require you visit every freed block.
> 


There's no linked list either :)
There's a mark per block, an epoch number, so you have one block holding the marks for each block group. All blocks given addresses on disk use the current value for the mark, or epoch. When you want to collect more free blocks, you traverse the tree and update the mark for the blocks you reach. After that, you increment the value of the mark that is used to recognize free blocks.

Of course you could fail at any time and, again, if that happens before writing the new super, the only thing that happens is that you'll have to mark the blocks again (the mark process is pruned when it finds that a visited block already has the correct mark value).
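A minimal sketch of that mark-and-prune pass, with invented names (Block, Nptr, mark, isfree) and the tree walked in memory, just to show the idea:

#include <stddef.h>
#include <stdint.h>

enum { Nptr = 16 };		/* pointers per tree block (assumed) */

typedef struct Block Block;
struct Block {
	uint64_t mark;		/* epoch stored for this block */
	Block *ptr[Nptr];	/* children in the on-disk tree */
};

/*
 * Mark every block reachable from b with the current epoch.  If a
 * block already carries that mark (e.g. because a crash interrupted
 * a previous pass), the walk is pruned there.
 */
void
mark(Block *b, uint64_t epoch)
{
	int i;

	if(b == NULL || b->mark == epoch)
		return;		/* already visited: prune */
	b->mark = epoch;
	for(i = 0; i < Nptr; i++)
		mark(b->ptr[i], epoch);
}

/*
 * Once a pass is complete (and a new super written), the epoch used
 * to recognize free blocks is advanced: a block whose mark is older
 * than that of the last complete pass was not reached, so it is free.
 */
int
isfree(Block *b, uint64_t lastpass)
{
	return b->mark < lastpass;
}

Because a block that already carries the current mark stops the walk, repeating the pass after a crash only re-does the part that had not been marked yet.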

This was actually the code I had in place to double-check that the previous implementation of free-block management was correct, but I noticed that it was both doing the job and not disturbing normal operation of the file system, so I removed the management code and declared this debug check the actual algorithm.


> I think an incore FS the easy case but you will have to face
> the issues of corner/boundary cases, various error conditions
> and efficiency when dealing with real disks. These things are
> what introduce complexity and bugs. "Soft updates" in FFS took
> quite a while to shake out bugs.  zfs took a long time.  Hammer
> fs of DragonFly took a while. Pretty much every FS design has
> taken a while to be rock solid.  Far longer than the original
> estimates of the designers I think.
> 


Yes, that's why I improved it mostly by removing features and simplifying algorithms instead of using clever ones. It was not easy to do that; it went through three or four rewrites. Funny that it was a lot easier to write the first, much more complex, version of it.

There are no soft updates now, because the log makes them unneeded. And the complex part of log management, which is reclaiming unused blocks, is also gone, thanks to the naive, yet effective, algorithm used now.

But you are right. I spent most of the time fixing problems with disks that were full or had faults injected at the worst times. Operation in non-boundary cases was always fine, even with the fancier implementation I used before.

Regarding memory and processors, the code is aggressive and tries to use everything, because the machine will be devoted just to serving files.
