I haven't had the need for the capacity or speed that many Ceph users do, but I AM insistent on reliability, and Ceph has never failed me on that point, even when I've made a wreck of my hardware and/or configuration.
I don't think it was explicitly stated, but I'm pretty sure that Ceph doesn't (at least any more) have a master command node, and that essentially any Ceph host can be used for control, although I do keep one host as my favorite place to log in to. I know of no reason at all to have asymmetric hardware for such purposes, unless that's simply using what you have lying around, or you're setting up some non-OSD nodes and so don't need storage on them.

To maybe clarify what Anthony wrote, virtually no Ceph monitors are expected to be single-instance. They're either co-operative or quorum-elected. Even access points such as NFS and RGW benefit from redundancy.

Hope that helps,

Tim

On Sun, 2025-04-13 at 12:28 -0400, Anthony D'Atri wrote:
> > On Apr 13, 2025, at 12:00 PM, Brendon Baumgartner <bren...@netcal.com> wrote:
> >
> > > On Apr 11, 2025, at 10:13, gagan tiwari <gagan.tiw...@mathisys-india.com> wrote:
> > >
> > > Hi Anthony,
> > > We will be using Samsung SSD 870 QVO 8TB disks on all OSD servers.
> >
> > I'm a newbie to Ceph. I have a 4-node cluster, and it doesn't have a lot of users, so downtime is easily scheduled for tinkering. I started with consumer SSDs (SATA/NVMe) because they were free and lying around. Performance was bad. Then just the NVMes: still bad. Then enterprise SSDs: still bad (relative to DAS, anyway).
>
> Real enterprise SSDs? Enterprise NVMe, not enterprise SATA? Sellers can lie sometimes. Also be sure to update firmware to the latest; that can make a substantial difference.
>
> Other factors include:
>
> * Enough hosts and OSDs. Three hosts with one OSD each aren't going to deliver a great experience.
> * At least 6GB of available physmem per NVMe OSD.
> * How you measure: a 1K QD1 fsync workload is going to be more demanding than a buffered 64K QD32 workload.
>
> > Each step on the journey to enterprise SSDs made things faster.
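[To illustrate Anthony's measurement point, the two workloads he contrasts can be expressed as a small fio job file. This is only a sketch: the file name, size, and runtime are placeholders, and it should be pointed at a throwaway test file, not a raw device.]

```ini
; latency-smoke.fio -- hypothetical job file; adjust filename/size/runtime
[global]
filename=fio-test.dat
size=1g
runtime=60
time_based
group_reporting

[qd1-fsync]
; 1 KiB random writes, queue depth 1, fsync after every write:
; the demanding case that exposes sync-write latency on consumer drives
rw=randwrite
bs=1k
iodepth=1
fsync=1
ioengine=psync

[qd32-buffered]
; 64 KiB buffered sequential writes at a nominal QD32: the easy case
; (with buffered I/O the queue depth is largely nominal)
stonewall
rw=write
bs=64k
iodepth=32
ioengine=libaio
```

[Run it with `fio latency-smoke.fio` and compare the completion-latency (clat) percentiles between the two jobs; the gap between them is roughly the gap Brendon describes below.]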
> > The problem with the consumer stuff is the latency. Enterprise SSDs are 0-2ms; consumer SSDs are 15-300ms. As you can see, the latency difference is significant.
>
> Some client SSDs are "DRAMless": they lack the ~1GB of onboard RAM per 1TB of capacity used as the LBA indirection table. This can be a substantial issue for enterprise workloads.
>
> > So from my experience, I would say Ceph is very slow in general compared to DAS. You need all the help you can get.
> >
> > If you want to use the consumer stuff, I would recommend making a slow tier (a second pool with a different policy). Or I suppose just expect it to be slow in general. I still have my consumer drives installed, just configured as a second tier, which is unused right now because we have an old JBOD for the second tier that is much faster.
>
> How many drives in each?
>
> > Good luck!
> >
> > _BB
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
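[One way to set up the second pool Brendon describes is with CRUSH device classes. A sketch follows; the OSD IDs, class name, rule name, pool name, and PG counts are all placeholders, and placement should be checked with `ceph osd crush tree --show-shadow` before moving any data onto the pool.]

```shell
# Re-tag the consumer drives' OSDs with a custom device class
# (osd.8 through osd.11 are hypothetical IDs for the consumer SSDs)
ceph osd crush rm-device-class osd.8 osd.9 osd.10 osd.11
ceph osd crush set-device-class slow osd.8 osd.9 osd.10 osd.11

# CRUSH rule that only selects OSDs of that class,
# replicating across hosts under the default root
ceph osd crush rule create-replicated slow-rule default host slow

# A pool for the slow tier, pinned to that rule
ceph osd pool create slowpool 64 64 replicated slow-rule
```

[Clients then choose the tier by choosing the pool; nothing lands on the consumer drives unless it is written to `slowpool`.]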