Hi,

It seems more and more that Ceph is out in the wild: people are using it in
production, development is speeding up, etc.

Still, how does one pick a configuration that suits their needs?

For example, we wish to replace older IBM/HP SAN storage with Ceph. We know
the IOPS and bandwidth capabilities of those arrays, but there is no "Ceph
calculator" to estimate how many OSDs/hosts we would need to match the
existing storage's performance.

We have done some tests internally with up to 7 OSDs (2-3 hosts), but
increasing the OSD count in such small steps does not influence Ceph
performance enough, or linearly enough, to extrapolate to the performance we
need. I have been following the performance figures on the mailing lists and
Mark's performance tests, and I have read Inktank's reference architecture at
http://www.inktank.com/resource/ceph-reference-architecture/ (by the way,
that doc mentions another "Multi-Rack Object Storage Reference Architecture"
which I cannot find; has anyone found it?). In the end I have come to the
wild guess that starting with ~24 spinning OSDs on 2-3 hosts should match the
performance we need to start with. At this point it would be helpful to have
some estimation tool or a reliable reference to confirm that this estimate is
realistic :).
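
For what it's worth, the kind of back-of-envelope sanity check I have in mind
looks like the sketch below. All the numbers in it are assumptions on my part
(roughly 150 IOPS per 7.2k RPM spinner, replica count 2, journal on the same
disk as the data), not measured figures:

# Back-of-envelope sanity check for the "~24 spinning OSDs" guess.
# All inputs are assumptions, not measurements.

osd_count = 24        # planned spinning OSDs
iops_per_disk = 150   # rough figure for a 7.2k RPM SATA drive (assumption)
replica_count = 2     # pool size (assumption)
journal_factor = 2    # extra write when the journal shares the data disk (assumption)

# Aggregate reads are spread over all disks, so the read ceiling is
# roughly the sum of the per-disk figures.
est_read_iops = osd_count * iops_per_disk                                       # ~3600

# Each client write turns into replica_count * journal_factor disk writes,
# so the usable write ceiling shrinks accordingly.
est_write_iops = osd_count * iops_per_disk // (replica_count * journal_factor)  # ~900

print("estimated read IOPS ceiling:  %d" % est_read_iops)
print("estimated write IOPS ceiling: %d" % est_write_iops)

Whether such ceilings really match what the old SANs deliver is exactly the
kind of question a calculator or a shared reference list would help answer.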

Sure, Ceph is a dynamic creature and many parameters influence the resulting
performance (SSD vs. spinning HDD, network, filesystems, journals, replication
count, etc.), but still, when people face the question "can we do it with
Ceph?", some methodology or a tool like a "Ceph calculator" could help
estimate the needed HW and, consequently, the expected investment. Admins
sometimes need to convince management of a certain solution, right? :)

I have been thinking about how to close this information gap and I see two
complementary solutions:
1) Create a publicly available reference list of Ceph configurations +
performance results from real-life clusters, where people can add their own
setups and compare. A standardized approach to running the tests just has to
be in place, so that we compare apples to apples. For example, people could
specify their OSD host count, OSD count, OSD filesystem, OSD server model,
CPU, RAM, network interfaces (type, speed), Ceph version, replica count and
the like, plus provide standardized performance test results for their
cluster, such as rados bench and fio runs with 4K/4M block sizes,
random/sequential, read/write (a sketch of such a test wrapper follows right
after this list).
Others could then look for a matching working configuration and compare it to
their own clusters. This would give people who are starting out real examples
to go on, and help existing Ceph users look for possible tuning.

2) Develop a theoretical Ceph calculator or formula where one can specify the
needed performance characteristics (IOPS, bandwidth, size) and the planned HW
parameters (if available), and get an estimated Ceph configuration (needed
hosts, CPUs, RAM, OSDs, network). It should take into account HDD count and
size, the minimum IOPS per HDD, network latency, RAM, replica count, the
connection type to Ceph (direct via the kernel client, userland, via an
FC/iSCSI proxy, etc.) and other influencing parameters. There will always be
a place for advanced know-how tuning; this would just be for rough estimates
to get started (a sketch of what such a calculator's core could look like
also follows below).
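
To make point 1 concrete, here is a minimal sketch of what a standardized
test wrapper could look like, purely as an illustration of the idea. It
assumes a pre-created test pool named "bench", a writable file on a mapped
RBD image at a made-up path, and that rados and fio are installed; the pool
name, path and parameter matrix are placeholder choices of mine, not a
proposed standard:

#!/usr/bin/env python
# Minimal sketch of a standardized benchmark runner for the proposed
# public reference list. The pool name, file path and parameter matrix
# are placeholder assumptions, not a finished standard.

import json
import platform
import subprocess

POOL = "bench"                      # assumed pre-created test pool
FIO_FILE = "/mnt/rbd-test/fio.dat"  # assumed file on a mounted RBD image
RUNTIME = "60"                      # seconds per test

def run(cmd):
    """Run a command and return its raw output so it can be archived as-is."""
    print("running: " + " ".join(cmd))
    return subprocess.check_output(cmd).decode("utf-8", "replace")

results = {"host": platform.node(),
           "ceph_version": run(["ceph", "--version"]),
           "tests": {}}

# Cluster-level tests with rados bench (4 MB objects by default);
# --no-cleanup leaves the objects in place so the seq read test can use them.
results["tests"]["rados_4m_write"] = run(
    ["rados", "bench", "-p", POOL, RUNTIME, "write", "--no-cleanup"])
results["tests"]["rados_4m_seq_read"] = run(
    ["rados", "bench", "-p", POOL, RUNTIME, "seq"])

# Client-level 4K random read/write tests with fio against an RBD-backed file.
for rw in ("randread", "randwrite"):
    results["tests"]["fio_4k_" + rw] = run(
        ["fio", "--name=" + rw, "--filename=" + FIO_FILE, "--size=1G",
         "--bs=4k", "--rw=" + rw, "--ioengine=libaio", "--direct=1",
         "--runtime=" + RUNTIME, "--time_based"])

# Together with the hardware/software description fields listed above,
# this file is what would get submitted to the shared reference list.
with open("ceph-bench-results.json", "w") as f:
    json.dump(results, f, indent=2)

The exact parameter matrix (block sizes, queue depths, runtimes) is precisely
what would need to be agreed on for the list to stay comparable.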
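
And for point 2, a minimal sketch of what the core of such a calculator could
look like, covering only the spinning-disk IOPS dimension. The per-disk IOPS
figure, the journal penalty and the headroom factor are assumptions of mine
and would need refinement; bandwidth, latency, CPU, RAM and network would
each need terms of their own:

# Minimal sketch of a "Ceph calculator" core: given target IOPS figures,
# estimate how many spinning OSDs would be needed. All constants are
# rough assumptions, not authoritative numbers.

import math

def estimate_osd_count(target_write_iops,
                       target_read_iops,
                       iops_per_disk=150,         # rough 7.2k RPM SATA figure (assumption)
                       replica_count=2,
                       journal_on_data_disk=True,
                       headroom=1.3):             # 30% safety margin (assumption)
    """Estimate the number of spinning OSDs needed for the given targets."""
    # Each client write is multiplied by the replica count, and doubled again
    # if the journal shares the data disk.
    write_amplification = replica_count * (2 if journal_on_data_disk else 1)

    osds_for_writes = target_write_iops * write_amplification / float(iops_per_disk)
    osds_for_reads = target_read_iops / float(iops_per_disk)

    return int(math.ceil(max(osds_for_writes, osds_for_reads) * headroom))

# Example with purely hypothetical targets read off an old SAN's spec sheet:
print(estimate_osd_count(target_write_iops=800, target_read_iops=3000))  # -> 28

Hosts, CPU, RAM and network could then be layered on top as further
constraints, but even a one-dimensional version like this would already be
more convincing to management than a wild guess.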

Both things seem to naturally belong at http://wiki.ceph.com/ and could be
hosted there, as it is the current central Ceph knowledge base, right? :)

What do you think about the chances of implementing both of these?

Ugis