On Mon, Jun 23, 2014 at 3:03 PM, Mark Nelson <mark.nel...@inktank.com> wrote:
> Well, for random IO you often can't do much coalescing.  You have to bite
> the bullet and either parallelize things or reduce per-op latency.  Ceph
> already handles parallelism very well.  You just throw more disks at the
> problem and, so long as there are enough client requests, it more or less
> just scales (limited by things like network bisection bandwidth or other
> complications).  On the latency side, spinning disks aren't fast enough
> for Ceph's extra latency overhead to matter much, but with SSDs the story
> is different.  That's why we are very interested in reducing latency.
>
> Regarding journals: journal writes are always sequential (even for random
> IO!), but they are O_DIRECT, so they skip the Linux buffer cache.  If you
> have hardware that is fast at writing small sequential IO (say a
> controller with writeback cache, or an SSD), you can do journal writes
> very quickly.  For bursts of small random IO, performance can be quite
> good.  The downside is that you can hit the journal limits very quickly,
> meaning you have to flush and wait for the underlying filestore to catch
> up.  This results in performance that starts out super fast, then stalls
> once the journal limits are hit, goes back to super fast for a bit, then
> stalls again, etc.  This is less than ideal given the way CRUSH
> distributes data across OSDs.  The alternative is setting a soft limit on
> how much data is in the journal and flushing smaller amounts of data more
> quickly to limit the spiky behaviour.  On the whole, that can be good, but
> it limits the burst potential and also limits the amount of data that
> could potentially be coalesced in the journal.

Mark,

What settings are you suggesting for setting a soft limit on journal size
and flushing smaller amounts of data?  Something like this?

filestore_queue_max_bytes: 10485760
filestore_queue_committing_max_bytes: 10485760
journal_max_write_bytes: 10485760
journal_queue_max_bytes: 10485760
ms_dispatch_throttle_bytes: 10485760
objecter_inflight_op_bytes: 10485760

(see "Small bytes" in
http://ceph.com/community/ceph-bobtail-jbod-performance-tuning; the same
settings are written out as a ceph.conf sketch below)

> Luckily with RBD you can (when applicable) coalesce on the client with RBD
> cache instead, which is arguably better anyway, since you can send bigger
> IOs to the OSDs earlier in the write path.  So long as you are OK with
> what RBD cache does and does not guarantee, it's definitely worth enabling
> imho.

Thanks,
Jake
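For reference, here is the same proposal written out as a ceph.conf
fragment.  Treat it as a sketch only: whether these are the knobs Mark has
in mind for the soft limit is exactly the open question above, the
10485760 (10 MB) values come from the bobtail tuning article rather than
from this thread, the [global]/[client] section placement is an
assumption, and the rbd cache option names are the standard client-side
cache settings rather than anything Mark spelled out.

    [global]
        # proposed soft limits on queued/journaled data (10 MB each),
        # matching the list in the question above
        filestore queue max bytes            = 10485760
        filestore queue committing max bytes = 10485760
        journal max write bytes              = 10485760
        journal queue max bytes              = 10485760
        ms dispatch throttle bytes           = 10485760
        objecter inflight op bytes           = 10485760

    [client]
        # client-side coalescing via RBD cache, per Mark's last paragraph
        rbd cache = true
        rbd cache writethrough until flush = true

Both halves trade peak burst speed for steadier behaviour: the throttles
shrink the stalls by flushing earlier, while RBD cache moves the coalescing
to the client so that bigger IOs reach the OSDs earlier in the write path.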