On Wed, May 16, 2007 at 11:25:15AM -0700, Andrew Morton wrote:
> On Wed, 16 May 2007 13:11:56 -0400
> Chris Mason <[EMAIL PROTECTED]> wrote:
>
> > At least on ext3, it may help to sort the blocks under io for
> > flushing...it may not. A bigger log would definitely help, but I would
> > say the mkfs defaults should be reasonable for a workload this simple.
> >
> > (data=writeback was used for my ext3 numbers).
>
> When ext3 runs out of journal space it needs to sync lots of metadata out
> to the fs so that its space in the journal can be reclaimed. That metadata
> is of course splattered all over the place so it's seekstorm time.
>
> The filesystem does take some care to place the metadata blocks "close" to
> the data blocks. But of course if we're writing all the pagecache and then
> we later separately go back and write the metadata then that would screw
> things up.

Just to clarify, in the initial stage where kernel trees are created,
benchmark doesn't call sync. So all the writeback is through the normal
async mechanisms.

> I put some code in there which will place indirect blocks under I/O at
> the same time as their data blocks, so everything _should_ go out in a
> nice slurp (see write_boundary_block()). The first thing to do here
> is to check that write_boundary_block() didn't get broken.

write_boundary_block should get called from pdflush and the IO done by
pdflush seems to be pretty sequential. But, in this phase the
vast majority of the files are small (95% are less than 46k).

> If that's still working then the problem will _probably_ be directory
> writeout. Possibly inodes, but they should be well-laid-out.
>
> Were you using dir_index? That might be screwing things up.

Yes, dir_index. A quick test of mkfs.ext3 -O ^dir_index seems to still
have the problem. Even though the inodes are well laid out, is the
order they get written sane? Looks like ext3 is just walking a list of
bh/jh, maybe we can just sort the silly thing?

-chris
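
For illustration only, here is a rough userspace sketch of the "sort it" idea above. It is not ext3 or jbd code: struct pending_buf, sort_for_writeout() and seek_distance() are hypothetical names, and the flat array stands in for whatever bh/jh list the journal actually walks. It only shows that ordering pending buffers by block number before submission reduces the total distance between consecutive writes.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one journaled buffer waiting for writeout. */
struct pending_buf {
    unsigned long long blocknr; /* on-disk block number */
    int id;                     /* arbitrary tag, unused by the sort */
};

/* qsort comparator: ascending block number, no overflow-prone subtraction. */
static int cmp_blocknr(const void *a, const void *b)
{
    const struct pending_buf *x = a;
    const struct pending_buf *y = b;

    if (x->blocknr < y->blocknr)
        return -1;
    if (x->blocknr > y->blocknr)
        return 1;
    return 0;
}

/* Sort the pending buffers so they would be submitted in block order. */
void sort_for_writeout(struct pending_buf *bufs, size_t n)
{
    qsort(bufs, n, sizeof(*bufs), cmp_blocknr);
}

/* Sum of block-number gaps between consecutive buffers: a crude seek proxy. */
unsigned long long seek_distance(const struct pending_buf *bufs, size_t n)
{
    unsigned long long total = 0;

    for (size_t i = 1; i < n; i++) {
        unsigned long long prev = bufs[i - 1].blocknr;
        unsigned long long cur = bufs[i].blocknr;

        total += prev > cur ? prev - cur : cur - prev;
    }
    return total;
}
```

In the kernel the buffers live on linked lists rather than in an array, so a real change would more likely sort the list in place or batch entries into a small array first. That part is a guess, not something tested here.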