On Jan 3, 2011, at 5:17 PM, Christopher Smith wrote:

> On Mon, Jan 3, 2011 at 11:40 AM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
>
>> It's not immediately clear to me the size of the benefit versus the costs.
>> Two cases where one normally thinks about direct I/O are:
>> 1) The usage scenario is a cache anti-pattern. This will be true for some
>> Hadoop use cases (MapReduce), not true for some others.
>>    - http://www.jeffshafer.com/publications/papers/shafer_ispass10.pdf
>> 2) The application manages its own cache. Not applicable.
>>
>> Atom processors, which you mention below, will just exacerbate (1) due to
>> the small cache size.
>
> Actually, assuming you thrash the cache anyway, having a smaller cache can
> often be a good thing. ;-)
Assuming no other thread wants to use that poor cache you are thrashing ;)

>> All-in-all, doing this specialization such that you don't hurt the general
>> case is going to be tough.
>
> For the Hadoop case, the advantages of O_DIRECT would seem to be
> comparatively petty compared to using O_APPEND and/or MMAP (yes, I realize
> this is not quite the same as what you are proposing, but it seems close
> enough for most cases). Your best case for a win is when you have
> reasonably random access to a file, and then something else that would
> benefit from more

Actually, our particular site would greatly benefit from O_DIRECT - we have non-MapReduce clients with a highly non-repetitive, random read I/O pattern and actively managed application-level read-ahead (note: because we're almost guaranteed to wait for a disk seek, as 2PB of SSDs are a touch pricey, the latency overheads of Java are not actually too important). The OS page cache is mostly useless for us, as the working set size is on the order of a few hundred TB.

However, I wouldn't actively clamor for O_DIRECT support; I could probably do wonders with an HDFS equivalent of fadvise (a sketch of what that hint looks like at the syscall level follows below). I really don't want to get into the business of managing buffering in my application code any more than we already do.

Brian

PS - if there are bored folks wanting to do something beneficial to high-performance HDFS, I'd note that it is currently tough to get >1Gbps performance from a single Hadoop client transferring multiple files. However, HP Labs had a clever approach: http://www.hpl.hp.com/techreports/2009/HPL-2009-345.pdf . I'd love to see a generic, easy-to-use API to do this.
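For context on the quoted point that this specialization is tough to do without hurting the general case: O_DIRECT bypasses the OS page cache, but it requires the user buffer, file offset, and transfer length to be sector-aligned, which is one reason it is awkward to support generically. A minimal C sketch of a direct read at the POSIX layer follows; the path and sizes are illustrative only and are not taken from HDFS.

#define _GNU_SOURCE           /* O_DIRECT is a Linux extension */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const size_t alignment = 4096;      /* assumed sector/page size */
    const size_t length    = 1 << 20;   /* 1 MiB read, a multiple of the alignment */

    /* Hypothetical block file name, for illustration only. */
    int fd = open("/data/blk_0001", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    /* O_DIRECT requires an aligned user buffer. */
    void *buf;
    if (posix_memalign(&buf, alignment, length) != 0) {
        close(fd);
        return 1;
    }

    /* The file offset must also be aligned; the read bypasses the page cache. */
    ssize_t n = pread(fd, buf, length, 0);
    if (n < 0)
        perror("pread");

    free(buf);
    close(fd);
    return 0;
}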
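The "HDFS equivalent of fadvise" mentioned above refers to posix_fadvise-style hints, which let an application describe its access pattern to the kernel and drop already-consumed regions from the page cache without taking over buffering itself. A minimal C sketch of those hints, again with a made-up file name and region size:

#define _XOPEN_SOURCE 600     /* for posix_fadvise */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical block file name, for illustration only. */
    int fd = open("/data/blk_0001", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Tell the kernel the access pattern is random, so it skips the
     * aggressive read-ahead that would only pollute the page cache. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);

    /* ... application-level read-ahead issues its own pread() calls ... */

    /* Once a region has been consumed, advise the kernel to drop it so a
     * working set far larger than RAM does not evict other users' data. */
    posix_fadvise(fd, 0, 64 * 1024 * 1024, POSIX_FADV_DONTNEED);

    close(fd);
    return 0;
}

The appeal over O_DIRECT, in line with the message above, is that the application only advises the kernel about its access pattern; it does not have to manage aligned buffers or take ownership of buffering in its own code.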