On 6 November 2013 14:08, Andrey Korolyov <and...@xdel.ru> wrote:
>> We are looking at building high-density nodes for small-scale 'starter'
>> deployments for our customers (maybe 4 or 5 nodes). High density in this
>> case could mean a 2U chassis with 2x external 45-disk JBOD containers
>> attached. That's 90 3TB disks/OSDs to be managed by a single node -
>> about 243TB of potential usable space, and so (assuming up to 75%
>> fillage) maybe 182TB of potential data 'loss' in the event of a node
>> failure. On an uncongested, unused, 10Gbps network, my
>> back-of-a-beer-mat calculations say it would take about 45 hours to get
>> the cluster back into an undegraded state - that is, back to the
>> requisite number of copies of all objects.
>
> With such a large number of disks you should expect that controller
> caching will do you no good, even with 1GB controller(s) - only a tiered
> cache is a real option. Recovery will also take much longer than a
> first-glance estimate suggests, even if your calculations leave room for
> client I/O, because raw disks have very limited IOPS capacity: recovery
> will either run far longer than expected or impact regular operations.
> For S3/Swift that may be acceptable, but for VM images it is not.
Sure, but my argument was that you would never actually let that entire recovery operation complete - you'd replace the hardware, plug the disks back in, and let them catch up via log replay/backfill. That assumes, of course, that you never really expect to lose all the data on 90 disks in one go...

By tiered caching, do you mean using something like flashcache or bcache?
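For reference, the quoted back-of-a-beer-mat figure can be reproduced in a few lines of Python. This is just a sketch of the arithmetic implied by the thread, assuming the quoted numbers (243TB usable, 75% fill, 10Gbps) and binary (TiB-style) terabytes, which is roughly what makes the ~45-hour figure come out:

```python
# Back-of-the-envelope recovery time for a failed 90-disk node,
# using the figures quoted earlier in the thread.
usable_tb = 243                 # ~243 TB usable space per node (quoted)
fill = 0.75                     # assume up to 75% fillage
data_tb = usable_tb * fill      # ~182 TB that must be re-replicated

link_gbps = 10                          # uncongested, unused 10Gbps network
bytes_per_sec = link_gbps / 8 * 1e9     # ~1.25 GB/s of wire throughput

# Treat TB as TiB (2**40 bytes), as the original ~45 h estimate implies.
seconds = data_tb * 2**40 / bytes_per_sec
hours = seconds / 3600
print(f"~{data_tb:.0f} TB over {link_gbps} Gbps: about {hours:.0f} hours")
```

In practice this is a lower bound: it ignores replication fan-out, recovery throttling, disk IOPS limits, and competing client I/O, all of which stretch the wall-clock time well beyond the raw wire-speed figure.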
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com