On Mon, Jun 23, 2014 at 3:03 PM, Mark Nelson <mark.nel...@inktank.com>
wrote:

> Well, for random IO you often can't do much coalescing.  You have to bite
> the bullet and either parallelize things or reduce per-op latency.  Ceph
> already handles parallelism very well.  You just throw more disks at the
> problem and so long as there are enough client requests it more or less
> just scales (limited by things like network bisection bandwidth or other
> complications).  On the latency side, spinning disks aren't fast enough for
> Ceph's extra latency overhead to matter much, but with SSDs the story is
> different.  That's why we are very interested in reducing latency.
>
> Regarding journals:  Journal writes are always sequential (even for random
> IO!), but are O_DIRECT so they'll skip the Linux buffer cache.  If you have
> hardware that is fast at writing sequential small IO (say a controller with
> WB cache or an SSD), you can do journal writes very quickly.  For bursts of
> small random IO, performance can be quite good.  The downside is that you
> can hit journal limits very quickly, meaning you have to flush and wait for
> the underlying filestore to catch up. This results in performance that
> starts out super fast, then stalls once the journal limits are hit, back to
> super fast again for a bit, then another stall, etc.  This is less than
> ideal given the way CRUSH distributes data across OSDs.  The alternative is
> setting a soft limit on how much data is in the journal and flushing
> smaller amounts of data more quickly to limit the spiky behaviour.  On the
> whole, that can be good but limits the burst potential and also limits the
> amount of data that could potentially be coalesced in the journal.
>

Mark,

What settings are you suggesting for putting a soft limit on the journal and
flushing smaller amounts of data more frequently?

Something like this?
filestore_queue_max_bytes: 10485760
filestore_queue_committing_max_bytes: 10485760
journal_max_write_bytes: 10485760
journal_queue_max_bytes: 10485760
ms_dispatch_throttle_bytes: 10485760
objecter_inflight_op_bytes: 10485760

(see "Small bytes" in
http://ceph.com/community/ceph-bobtail-jbod-performance-tuning)
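
Or were you thinking of the filestore sync interval knobs instead? My
(possibly wrong) understanding is that these control how often the journal
is flushed out to the filestore, so lowering the max from its stock 5
seconds would mean smaller, more frequent flushes. Something like:

filestore_min_sync_interval: 0.01
filestore_max_sync_interval: 1

(The values above are just a guess on my part, not something I've tested.)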


>
> Luckily with RBD you can (when applicable) coalesce on the client with RBD
> cache instead, which is arguably better anyway since you can send bigger
> IOs to the OSDs earlier in the write path.  So long as you are ok with what
> RBD cache does and does not guarantee, it's definitely worth enabling imho.
>
>
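(For anyone else following along: if I'm reading the docs right, enabling
it on the client side should just be a matter of something like the
following in ceph.conf, where the byte values are the stock defaults:

rbd_cache: true
rbd_cache_size: 33554432
rbd_cache_max_dirty: 25165824
rbd_cache_target_dirty: 16777216
rbd_cache_writethrough_until_flush: true

i.e. a 32 MB cache per image that starts writing back at 16 MB dirty,
throttles at 24 MB dirty, and stays in writethrough mode until the guest
issues its first flush. Corrections welcome if I've got that wrong.)
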
Thanks,

Jake