Hello,

AD> What use-case(s)? Are your pools R3, EC? Mix?
My use case is storage for virtual machines (Proxmox).

AD> I like to solve first for at least 9-10 nodes, but assuming that you're using replicated size=3 pools 5 is okay.

Yes, I am using replication=3.

AD> Conventional wisdom is that when using NVMe SSDs to offload WAL+DB from HDDs you want one NVMe SSD to back at most 10x HDD OSDs. Do you have your 1TB NVMe SSDs dedicating 250GB to each of the 4 HDDs? Or do you have them sliced smaller? If you don't have room on them for additional HDD OSDs that complicates the proposition.

Oh, that is interesting; I thought you needed about 4% of the HDD space for DB/WAL, so we sliced the SSDs into 250GB partitions.

AD> Sometimes people use PCIe to M.2 adapters to fit in additional NVMe drives, but take care to look into PCIe bifurcation etc. when selecting a card to accept more than one M.2 NVMe SSD.

Yes, we do have four very thin NVMe slots available (U.3), and we are actually using an adapter in one of them for the SSD that holds the WAL/DB for the HDDs. We were going to add another SSD for any additional HDDs we might add, because we didn't want *all* of the HDDs' WAL/DB on a single SSD: our understanding is that if that SSD fails before we can replace it, we lose every OSD that has its DB/WAL on it. At this point, losing four OSDs is a reasonable risk, but if we added four more HDDs to each node, we wouldn't want to lose eight OSDs at once. We do have monitoring on the system, so we will *hopefully* know when an SSD is about to wear out (via the SMART attribute that reports how much of its rated lifetime has been used), but of course an SSD can also just fail out of the blue.

AD> Are your NVMe drives enterprise-class?

The NVMe drives we are using for actual storage in the cluster are enterprise-class, but admittedly the SSDs we are using for the WAL/DB are not. That is actually what I meant when I said we didn't have room for more NVMe drives: we technically still have three of those "very thin" NVMe slots free (U.3 connectors, I think?), but that slot type is what we are using, with M.2 adapters, for the WAL/DB SSDs.
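For what it's worth, below is the back-of-the-envelope arithmetic I have been working from, written up as a quick Python sketch. It only uses numbers already in this thread (16TB HDDs, 250GB DB/WAL slices on a 1TB SSD, and the per-OSD thread rule of thumb quoted further down), so please treat it as a rough sketch rather than anything validated on our hardware; the variable names are just for illustration.

    # Back-of-the-envelope sizing for one of our nodes:
    # 4 x 16 TB HDD OSDs, 4 x 2 TB NVMe OSDs, 1 x 1 TB NVMe for DB/WAL.
    # All figures are decimal GB/TB, which is how the drives are sold.

    HDD_GB = 16_000          # one 16 TB HDD
    DB_SLICE_GB = 250        # current DB/WAL partition per HDD
    DB_DEV_GB = 1_000        # the 1 TB NVMe we carved up

    # How the 250 GB slice compares to the ~4% figure I mentioned above.
    slice_pct = DB_SLICE_GB / HDD_GB * 100    # ~1.6% of each HDD
    four_pct_gb = HDD_GB * 0.04               # 640 GB per HDD if we really did 4%

    # How many HDDs one 1 TB DB device covers at a given slice size.
    hdds_now = DB_DEV_GB // DB_SLICE_GB       # 4 HDDs at 250 GB each
    slice_if_8 = DB_DEV_GB / 8                # 125 GB each if 8 HDDs shared one SSD

    # CPU rule of thumb from the quoted message below:
    # ~2 threads per HDD OSD, 4-6 per NVMe OSD (taking the upper end here),
    # before counting the OS, mons, mgrs, etc.
    threads_now = 4 * 2 + 4 * 6               # 32 threads per node today
    threads_with_8_hdds = 8 * 2 + 4 * 6       # 40 threads with four more HDDs

    print(f"250 GB slice = {slice_pct:.1f}% of a 16 TB HDD; 4% would be {four_pct_gb:.0f} GB")
    print(f"1 TB DB SSD covers {hdds_now} HDDs now; 8 HDDs would shrink each slice to {slice_if_8:.0f} GB")
    print(f"OSD CPU estimate: {threads_now} threads now, {threads_with_8_hdds} with 8 HDDs per node")

If that arithmetic is roughly right, adding four HDDs per node means either a second DB/WAL SSD (which the spare U.3 slots plus adapters should allow) or shrinking each slice well below what we originally intended, on top of the failure-domain concern above. Happy to be corrected if I am slicing this wrong.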
-----Original Message-----
From: Anthony D'Atri <anthony.da...@gmail.com>
Sent: March 23, 2025 22:57
To: Alan Murrell <a...@t-net.ca>
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Question about cluster expansion

What use-case(s)? Are your pools R3, EC? Mix?

I like to solve first for at least 9-10 nodes, but assuming that you're using replicated size=3 pools 5 is okay. Part of the answer is what these nodes look like. Are they dedicated to Ceph? How much RAM and CPU?

Conventional wisdom is that when using NVMe SSDs to offload WAL+DB from HDDs you want one NVMe SSD to back at most 10x HDD OSDs. Do you have your 1TB NVMe SSDs dedicating 250GB to each of the 4 HDDs? Or do you have them sliced smaller? If you don't have room on them for additional HDD OSDs that complicates the proposition.

Sometimes people use PCIe to M.2 adapters to fit in additional NVMe drives, but take care to look into PCIe bifurcation etc. when selecting a card to accept more than one M.2 NVMe SSD.

Are your NVMe drives enterprise-class?

> On Mar 23, 2025, at 10:14 PM, Alan Murrell <a...@t-net.ca> wrote:
>
> Hello,
>
> We have a 5-node cluster where each node has the following drives:
>
> * 4 x 16TB HDD
> * 4 x 2TB NVMe
> * 1 x 1TB NVMe (for the WAL/DB for the HDDs)
>
> The nodes don't have any more room to add more NVMe drives, but they do have room to add four more HDDs. I know adding more HDDs can make the cluster faster due to the additional IOPS.
>
> So my question is this:
>
> Is it better to:
>
> * Add the additional drives/IOPS by adding an additional node
> * Add the additional drives by adding the HDDs to the existing nodes
>
> Or does it not really matter? I would prefer to add the drives to the existing nodes (ultimately maxing them out)

Please share what your nodes are like to inform suggestions. I've recently seen a cluster deployed with 8+2 EC on only 10 nodes and inadequate CPU. When things went pear-shaped it really, really wasn't pretty.

How many SAS/SATA drive bays do your nodes have for HDDs?

Like most things in tech there are disagreements, but a rule of thumb is 2x vcores / threads per HDD OSD, 4-6 for NVMe OSDs. And extra for the OS, mons, mgrs, RGWs, etc.

> , but just wondering if that affects performance as much as expanding by adding additional nodes.
>
> Thanks! :-)
>
> Sent from my mobile device. Please excuse brevity and typos.

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io