On Thu, 2006-11-23 at 17:31 -0500, Douglas Tutty wrote:
> The question is how does the file system know that a write has made it
> to disk. E.g. if the file system is atomic transaction oriented, how
> can the file system know that a commit has been committed if the drive
> lies?
It's hard to know for sure, especially if the server is under abnormal load, the inodes are 100% in use, and all that's left is dirty paging. This seems to be where the problem happens most often. I've been following this thread and thought I'd do a bit of experimenting to see which of the two best recovers itself.

Here's my worst-case scenario (and test bed):

Debian Sarge under Xen, one 40 GB LVM-backed partition (jfs) (#1)
Debian Sarge under Xen, one 40 GB LVM-backed partition (ext3) (#2)

Both LVM-backed VBDs live on separate 400 GB SATA drives, on a standard on-board SATA controller (4 port, no RAID). Both systems have a small 512 MB ext2 root FS as a control. The 40 GB partition was mounted on /datahell.

Both systems have 2 GB RAM and 2 CPUs (test conducted on a dual Opteron); test machine 1 has cpu0 on core 0 and cpu2 on core 1, test machine 2 has cpu0 on core 1 and cpu2 on core 0. So now we have, for all intents and purposes, two machines with a single dual-core Opteron in each.

Here was the test: untar about 12 GB worth of files on both drives. These files consist of some old backup CDs, shareware CDs... just thousands and thousands of files. I then ran a shell script that caused 'updatedb' to fork a few hundred times in the background on each server; it kept forking until /proc/loadavg got to be about 70.0. Once that happened, I paused both VMs, issued a sysrq to sync disks, and destroyed them in memory. This simulated an out-of-control box where the admin was able to effect a shutdown where disks synced (not just push reset).

Booted them up again:

ext3 spent 30 minutes in a fsck; some data was lost.
jfs spent 5 minutes; no data was lost.
The ext2 root FS didn't have any issues, but then nothing was being written to it during the experiment.

Experiment #2: fresh 20 GB partitions just like before. Same experiment, only this time I didn't sync disks. I just destroyed the VMs in memory (same as pulling out the power plug) and rebooted.
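For anyone wanting to reproduce the load step: I didn't include the script in this mail, but a minimal sketch of its assumed shape (the target of 70 and the 1-second pacing are my choices here) would be:

```shell
#!/bin/sh
# Sketch of the load generator: keep forking updatedb in the
# background until the 1-minute load average reaches ~70.
TARGET=70
while :; do
    # first field of /proc/loadavg is the 1-minute load average
    load=$(cut -d' ' -f1 /proc/loadavg)
    # compare only the integer part against the target
    [ "${load%%.*}" -ge "$TARGET" ] && break
    updatedb &
    sleep 1
done
```

Each updatedb walks the whole filesystem, so a few hundred of them in parallel is plenty to exhaust clean pages and push inode usage up.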
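The dom0 side of the crash, for reference, was just the standard xm tools; a sketch (domain names are made up, and the exact pause/sysrq ordering matched what I described above):

```shell
# Pause both guests at peak load
xm pause fs-test-jfs
xm pause fs-test-ext3
# Send SysRq 's' (emergency sync) to flush disks in each guest
xm sysrq fs-test-jfs s
xm sysrq fs-test-ext3 s
# Destroy in memory -- for experiment #2, skip the sysrq step and
# this is equivalent to pulling the power plug
xm destroy fs-test-jfs
xm destroy fs-test-ext3
```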
ext3 fixed a couple of inodes and came back pretty quickly.
The jfs drive wasn't able to be mounted.
Again, the ext2 root fs had no issues, but we weren't expecting any; the ext2 rootfs was used just as a control (and to boot). /var was moved to the second drive (where slocate's DB lives).

The end result is, it's going to depend on how the file system manages to allocate inodes ahead of itself, and at what point in time your system runs out of clean pages to grab. JFS seems to do well *only* if you're able to sync disks so it can write those inodes; it leaves quite a bit of data in memory. However, it's much happier about flushing its inode cache and syncing even if all that is available is dirty paging. ext3 seems more likely to recover from its journal in the event you can't sync disks, but syncing it with maxed/bloated inodes (reaching into dirty pages) seems to break it.

It's really application specific, I guess. If you have the luxury of being able to anticipate what the world will do to your public services once you plug the Internet into a server, the choice is a little easier, but there is no magic bullet :)

ext3 seems more likely to come back to life after an unattended crash (where nobody was there to try and slow down the skid). JFS seems like the winner if your system doesn't often get abused, or if you have the ability to monitor it closely and intercede should you see dirty paging (swap) and inodes running high. Note that because JFS seems to use much more memory to allocate its inodes, this may lead to your applications needing swap sooner than they would with ext3.

Six of one, half a dozen of the other, really... but hopefully my little experiment helps someone decide which one is best to use :) I had a few systems set up for an ocfs2 stress test and figured I'd take advantage of it for this. I was in no way measuring I/O performance, just how well the file systems came back to life after bad things happened.

Best,
-Tim

> Doug.
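If you do go the "monitor closely and intercede" route with JFS, something along these lines would do as a watchdog; this is only a sketch, and the thresholds (256 MB dirty, 90% inodes) are arbitrary numbers of mine, not anything I tested:

```shell
#!/bin/sh
# Warn when dirty page-cache size or inode usage gets high, so an
# admin can sync/quiesce before the box ends up in the state above.
DIRTY_KB_LIMIT=262144    # warn above 256 MB of dirty pages (assumed)
INODE_PCT_LIMIT=90       # warn above 90% inode usage (assumed)

# Dirty: line in /proc/meminfo is in kB
dirty=$(awk '/^Dirty:/ {print $2}' /proc/meminfo)
[ "$dirty" -gt "$DIRTY_KB_LIMIT" ] && echo "warning: ${dirty} kB dirty"

# df -iP: column 5 is IUse%, column 6 is the mount point
df -iP | awk -v lim="$INODE_PCT_LIMIT" \
    'NR > 1 { pct = $5; sub(/%/, "", pct);
              if (pct+0 > lim) print "warning: " $6 " at " pct "% inodes" }'
```

Run it from cron every minute or so and mail yourself the output.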

