On Tuesday January 9, [EMAIL PROTECTED] wrote: > > Actually, ext3 doesn't work that way. The atime update will go into the > "running transaction", which is an instance of journal_t which is separate > from the committing transaction.
Hmm... fair enough. start_this_handle (which is called eventually from ext3_dirty_inode) seems to wait for a few different things, and I jumped to some conclusions. > > But there are situations (ie; journal free-space exhaustion) where things > can go synchronous. They're more likely to occur during metadata storms > though, and perhaps indicate an undersized journal. > > But yeah, overall point agreed with. Thanks. > > So this patch removes the minimum of '5' and introduces a new tunable > > 'vm.dirty_kb' which sets an upper limit in Kibibytes. > > kibibytes? We're feeding the kernel catfood now? :-) > > and journal commits). The symptoms are: > > While generating constant write traffic on a machine with > 20Gig > > of RAM, performing assorted read-only operations can sometimes > > produces a pause of 10s of seconds. > > The pause can be removed by: > > - mounting noatime > > - mounting data=writeback > > - setting vm.dirty_kb to 1000 with this patch. > > Could be IO scheduler borkage, could be ext3 borkage. A well-timed sysrq-T > will tell us, and is worth doing (please). > > Does increasing the journal size help? No, that was tried. > > It would be better if we can avoid creating the second global variable. Is > it not possible to remove dirty_ratio? Make everything work off > vm_dirty_kb and do arithmetricks at the /proc/sys/vm/dirty_ratio interface? Uhmmm... not sure what you are thinking. I guess we could teach vm.dirty_ratio to take a floating point number (but does sysctl understand that?) so we could set it to 0.01 or similar, but that is missing the point in a way. We don't really want to set a small ratio. We want to set a small maximum number. It could make lots of sense to have two numbers. A ratio that wins on a small memory machine and a fixed number that wins on a large memory machine. Different trade offs are more significant in the different cases. > > We should perform the same conversion to dirty_background_ratio, I suspect. > I didn't add a fixed limit for dirty_background_ratio as it seemed reasonable to assume that (dirty_background_ratio / dirty_ratio) was a meaningful value, and just multiplied the final 'dirty' figure by this ration to get the 'background' figure. > And these guys should be `long', not `int'. Otherwise things will go > pearshaped at 2 tabbybytes. I don't think so. You would need to have blindingly fast storage before there would be any interest in vm_dirty_kb getting anything close to t*bytes. But I guess we can make it 'unsigned long' if it helps. Thanks, NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/