Hi Kyle,

Thanks for the response. Further comments/queries...
> Date: Wed, 16 Apr 2014 06:53:41 -0700
> From: Kyle Bader <kyle.ba...@gmail.com>
> Subject: Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache
>
> >> Obviously the ssds could be used as journal devices, but I'm not really
> >> convinced whether this is worthwhile when all nodes have 1GB of hardware
> >> writeback cache (writes to journal and data areas on the same spindle have
> >> time to coalesce in the cache and minimise seek time hurt). Any advice on
> >> this?
>
> All writes need to be written to the journal before being written to
> the data volume, so it's going to impact your overall throughput and
> cause seeking; a hardware cache will only help with the latter (unless
> you use btrfs).

Right, good point. So, back-of-envelope calculations for throughput
scenarios based on our hardware, assuming 150MB/s r/w for the spindles and
450/350MB/s r/w for the ssds, and pretending there are no controller
bottlenecks etc. (see the P.S. for this arithmetic in reusable form):

1 OSD node without ssd journals (journal and data share each spindle,
hence divide by 2):
    9 * 150 / 2 = 675MB/s write throughput

1 OSD node with ssd journals:
    min(9 * 150, 3 * 350) = 1050MB/s write throughput

Aggregates for 12 OSD nodes: ~8GB/s versus ~12.5GB/s.

So in the general naive case it seems like a no-brainer: we should use SSD
journals. But then we don't require even 8GB/s most of the time...

> >> I think the timing should work that we'll be deploying with Firefly and so
> >> have Ceph cache pool tiering as an option, but I'm also evaluating Bcache
> >> versus Tier to act as node-local block cache device. Does anybody have real
> >> or anecdotal evidence about which approach has better performance?
> >
> > New idea that is dependent on failure behaviour of the cache tier...
>
> The problem with this type of configuration is it ties a VM to a
> specific hypervisor; in theory it should be faster because you don't
> have network latency from round trips to the cache tier, resulting in
> higher iops. Large sequential workloads may achieve higher throughput
> by parallelizing across many OSDs in a cache tier, whereas local flash
> would be limited to single device throughput.

Ah, I was ambiguous. When I said node-local I meant OSD-local. So I'm
really looking at:

    2-copy write-back object ssd cache-pool
        versus
    OSD write-back ssd block-cache (e.g. Bcache)
        versus
    1-copy write-around object cache-pool & ssd journal

(I've put a rough sketch of how that third option might be set up a bit
further down.)

> > Carve the ssds 4-ways: each with 3 partitions for journals servicing the
> > backing data pool and a fourth larger partition serving a write-around cache
> > tier with only 1 object copy. Thus both reads and writes hit ssd but the ssd
> > capacity is not halved by replication for availability.
> >
> > ...The crux is how the current implementation behaves in the face of cache
> > tier OSD failures?
>
> Cache tiers are durable by way of replication or erasure coding, OSDs
> will remap degraded placement groups and backfill as appropriate. With
> single replica cache pools loss of OSDs becomes a real concern, in the
> case of RBD this means losing arbitrary chunk(s) of your block devices
> - bad news. If you want host independence, durability and speed, your
> best bet is a replicated cache pool (2-3x).

This is undoubtedly true for a write-back cache-tier.
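To make that third option concrete, I imagine the cache-pool half of it
would be created roughly like this (a sketch only - the pool name, PG
counts and crush ruleset number are made up, and I'm assuming Firefly's
'readonly' cache mode is the closest match to "write-around"; part of what
I'm asking is whether that assumption holds):

    # 1-copy cache pool on the ssd partitions, in front of an existing
    # backing pool (here called 'rbd'); assumes crush ruleset 4 selects
    # only the ssd OSDs
    ceph osd pool create ssd-cache 1024 1024
    ceph osd pool set ssd-cache crush_ruleset 4
    ceph osd pool set ssd-cache size 1

    # attach it to the backing pool as a read cache
    ceph osd tier add rbd ssd-cache
    ceph osd tier cache-mode ssd-cache readonly

The three journal partitions per ssd would then just be pointed at by each
OSD's "osd journal" setting in the usual way.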
But in the scenario I'm suggesting, a write-around cache, that needn't be
bad news - if a cache-tier OSD is lost then the cache just gets smaller and
some cached objects are unceremoniously flushed out of it. The next read on
those objects should simply miss and bring them back into the now smaller
cache.

The thing I'm trying to avoid with the above is double read-caching of
objects (so as to get more aggregate read cache). I assume the standard
wisdom with write-back cache-tiering is that the backing data pool
shouldn't bother with ssd journals?

--
Cheers,
~Blairo
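P.S. In case anyone wants to plug their own device counts and speeds into
the back-of-envelope arithmetic above, here it is in reusable form - just a
rough sketch, and the figures are the assumptions from this thread rather
than measured numbers:

    # assumed: 9 spindles @ 150MB/s, 3 ssds @ 350MB/s write, 12 OSD nodes
    SPINDLES=9; SPINDLE_MBS=150; SSDS=3; SSD_WR_MBS=350; NODES=12

    # journals co-located on the spindles: every client write is written twice
    COLOCATED=$(( SPINDLES * SPINDLE_MBS / 2 ))

    # journals on ssd: limited by whichever side saturates first
    DISK=$(( SPINDLES * SPINDLE_MBS ))
    SSD=$(( SSDS * SSD_WR_MBS ))
    JOURNALED=$(( DISK < SSD ? DISK : SSD ))

    echo "per node:  ${COLOCATED}MB/s vs ${JOURNALED}MB/s"
    echo "aggregate: $(( COLOCATED * NODES ))MB/s vs $(( JOURNALED * NODES ))MB/s"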