>> Obviously the ssds could be used as journal devices, but I'm not really
>> convinced whether this is worthwhile when all nodes have 1GB of hardware
>> writeback cache (writes to journal and data areas on the same spindle have
>> time to coalesce in the cache and minimise seek time hurt). Any advice on
>> this?

All writes need to hit the journal before being written to the data
volume, so colocating the two on one spindle will impact your overall
throughput and cause seeking; a hardware cache will only help with the
latter (unless you use btrfs).
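
If you do end up putting journals on the SSDs, the ceph-deploy form is
roughly the following - a sketch only, with hypothetical host and
device names (data disk on the spinner, journal partition on the SSD):

    # prepare an OSD with its data on /dev/sdb and its journal on an
    # SSD partition (/dev/sdf1); host and device names are placeholders
    ceph-deploy osd prepare node1:/dev/sdb:/dev/sdf1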

>> I think the timing should work that we'll be deploying with Firefly and so
>> have Ceph cache pool tiering as an option, but I'm also evaluating Bcache
>> versus Tier to act as node-local block cache device. Does anybody have real
>> or anecdotal evidence about which approach has better performance?
> New idea that is dependent on failure behaviour of the cache tier...

The problem with this type of configuration is that it ties a VM to a
specific hypervisor. In theory it should be faster because you avoid
the network round trips to the cache tier, which means higher IOPS.
Large sequential workloads, on the other hand, may achieve higher
throughput by parallelizing across many OSDs in a cache tier, whereas
local flash is limited to the throughput of a single device.
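
For reference, wiring up bcache on a node looks roughly like this - a
sketch with placeholder device names, where the OSD's filesystem would
then live on the resulting bcache device rather than the raw spinner:

    # format the SSD as a cache device and the spinner as a backing device
    make-bcache -C /dev/sdf
    make-bcache -B /dev/sdb
    # register both with the kernel
    echo /dev/sdf > /sys/fs/bcache/register
    echo /dev/sdb > /sys/fs/bcache/register
    # attach the backing device to the cache set (UUID from bcache-super-show)
    echo <cset-uuid> > /sys/block/bcache0/bcache/attach
    # the OSD's filesystem then goes on /dev/bcache0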

> Carve the ssds 4-ways: each with 3 partitions for journals servicing the
> backing data pool and a fourth larger partition serving a write-around cache
> tier with only 1 object copy. Thus both reads and writes hit ssd but the ssd
> capacity is not halved by replication for availability.
>
> ...The crux is how the current implementation behaves in the face of cache
> tier OSD failures?

Cache tiers are durable by way of replication or erasure coding; OSDs
will remap degraded placement groups and backfill as appropriate. With
a single-replica cache pool the loss of an OSD becomes a real concern -
in the case of RBD that means losing arbitrary chunks of your block
devices, which is bad news. If you want host independence, durability
and speed, your best bet is a replicated cache pool (2-3x).
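
For reference, setting up a 3x replicated writeback cache tier in front
of an existing data pool looks something like the following on Firefly
(pool names and PG counts are placeholders, not a recommendation):

    # create the cache pool and keep 3 copies of each object
    ceph osd pool create cachepool 512 512
    ceph osd pool set cachepool size 3
    # Firefly cache tiering wants a hit set for promotion decisions
    ceph osd pool set cachepool hit_set_type bloom
    # attach it as a writeback tier over the backing data pool
    ceph osd tier add datapool cachepool
    ceph osd tier cache-mode cachepool writeback
    ceph osd tier set-overlay datapool cachepool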

-- 

Kyle
