>> Obviously the SSDs could be used as journal devices, but I'm not really
>> convinced whether this is worthwhile when all nodes have 1GB of hardware
>> writeback cache (writes to the journal and data areas on the same spindle
>> have time to coalesce in the cache and minimise seek-time hurt). Any
>> advice on this?
All writes need to be written to the journal before being written to the
data volume, so journalling on the same spindle will impact your overall
throughput and cause seeking. A hardware cache will only help with the
latter (unless you use btrfs).

>> I think the timing should work out so that we'll be deploying with
>> Firefly, and so have Ceph cache pool tiering as an option, but I'm also
>> evaluating Bcache versus Tier to act as a node-local block cache device.
>> Does anybody have real or anecdotal evidence about which approach has
>> better performance?

> New idea that is dependent on the failure behaviour of the cache tier...

The problem with this type of configuration is that it ties a VM to a
specific hypervisor. In theory it should be faster, because you avoid the
network latency of round trips to the cache tier, resulting in higher IOPS.
Large sequential workloads, on the other hand, may achieve higher
throughput by parallelizing across many OSDs in a cache tier, whereas local
flash is limited to the throughput of a single device.

> Carve the SSDs 4-ways: each with 3 partitions for journals servicing the
> backing data pool, and a fourth, larger partition serving a write-around
> cache tier with only 1 object copy. Thus both reads and writes hit SSD,
> but the SSD capacity is not halved by replication for availability.
>
> ...The crux is how the current implementation behaves in the face of
> cache tier OSD failures?

Cache tiers are made durable by replication or erasure coding; OSDs will
remap degraded placement groups and backfill as appropriate. With
single-replica cache pools, the loss of an OSD becomes a real concern; in
the case of RBD this means losing arbitrary chunk(s) of your block devices,
which is bad news. If you want host independence, durability and speed,
your best bet is a replicated cache pool (2-3x).

--
Kyle
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
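For reference, the 4-way carve proposed in the quoted message could be
sketched roughly as below. This is an untested outline, not a recipe: the
device name /dev/sdX, the pool names, the PG count and the partition sizes
are all placeholder assumptions, and note that Firefly's shipped cache
modes are writeback and readonly, so the "write-around" behaviour described
above does not map to a literal mode name.

```shell
# Partition one SSD: three journal partitions plus a larger cache partition.
# /dev/sdX and the 10G journal size are placeholders; adjust to your hardware.
sgdisk -n 1:0:+10G /dev/sdX   # journal for spinning OSD 1
sgdisk -n 2:0:+10G /dev/sdX   # journal for spinning OSD 2
sgdisk -n 3:0:+10G /dev/sdX   # journal for spinning OSD 3
sgdisk -n 4:0:0    /dev/sdX   # remainder: backs an OSD for the cache pool

# Create a single-replica cache pool and attach it to the backing pool.
# A CRUSH rule restricting the cache pool to the SSD-backed OSDs is
# assumed and not shown here.
ceph osd pool create ssd-cache 128
ceph osd pool set ssd-cache size 1            # 1 object copy, as proposed
ceph osd tier add rbd ssd-cache
ceph osd tier cache-mode ssd-cache writeback
ceph osd tier set-overlay rbd ssd-cache
ceph osd pool set ssd-cache hit_set_type bloom
```

As noted above, with size 1 the loss of any cache OSD means losing whatever
objects it held; bumping size to 2-3 restores durability at the cost of
capacity.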
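The capacity trade-off in the proposal comes down to simple division:
usable cache capacity is raw capacity over the replica count. A tiny
illustration with assumed numbers (four hypothetical 400 GB cache
partitions):

```shell
# Illustrative arithmetic only; the 4 x 400 GB figures are assumptions.
raw_gb=$((4 * 400))
echo "size=1: $((raw_gb / 1)) GB usable; any single cache-OSD loss drops data"
echo "size=2: $((raw_gb / 2)) GB usable; survives one cache-OSD failure"
```

So the single-copy scheme doubles usable SSD capacity relative to 2x
replication, which is exactly what is being traded against the failure
behaviour described above.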