> 5x 7.68TB Data Center NVMe Read Intensive AG Drive U2 with carrier
> 10x 3.84TB Data Center NVMe Read Intensive AG Drive U2 with carrier
> Public Network: 2x25G ports as Bond0
> Cluster Network: 2x25G ports as Bond1

Did you check the read/write throughput of those NVMes? It seems like you
may be bottlenecked on the network. We deploy nodes with 4x NVMe and can
push around 80-90 Gbit/s.
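
A quick way to get those per-drive numbers, assuming fio is installed and
/dev/nvme2n1 stands in for one of the data drives (a read-only test, so it is
safe to run before the drive is handed to an OSD):

    # sequential read bandwidth of a single NVMe device, ~30 second run
    fio --name=seqread --filename=/dev/nvme2n1 --rw=read --bs=1M \
        --iodepth=32 --numjobs=1 --ioengine=libaio --direct=1 \
        --runtime=30 --time_based --group_reporting

Multiply the reported bandwidth by the number of OSD drives per node and
compare it with the roughly 50 Gbit/s a 2x25G bond can carry.
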
On Wed, Jul 9, 2025 at 11:24 PM Pripriya <pipriya1...@gmail.com> wrote:

> Hello Anthony D'Atri,
>
> Thanks for your detailed reply.
>
> 1 - Yes, we have Proxmox nodes, but as you said they don't use any shared
> storage right now and are running as single-node Proxmox.
> Now I have a task to move them into a clustered Proxmox and use shared
> storage.
> We are deploying external Ceph to avoid depending on a single node in a
> hyperconverged setup, and we also want to use Ceph for services other than
> Proxmox, so we think it's better to go with external Ceph.
>
> Sorry, I don't think I explained my Ceph cluster plan correctly.
>
> *Total 5 nodes*, where I want to colocate services:
> 3 nodes will have MON, MGR & OSD services colocated.
> 2 nodes will be primarily for the OSD service, but can be expanded to other
> services if needed, as we plan to use similar hardware specs for all nodes.
>
> What do you think we should reserve in cores and RAM per OSD and per
> service, given that we want to populate the OSDs up to full capacity in the
> longer run (24-NVMe chassis)?
>
> Sure, I will explore the Dell R7615 with the 9454, or more likely 32 cores
> (AMD EPYC 9334 2.70GHz, 32C/64T) because of cost.
>
> *RAM per node:*
> We are going with 32GB DIMMs, which will allow more capacity increase in
> the future (for now 32 x 4 = 128GB).
>
> *OSDs per node:*
> 5x 7.68TB Data Center NVMe Read Intensive AG Drive U2 with carrier
>
> OR
>
> 10x 3.84TB Data Center NVMe Read Intensive AG Drive U2 with carrier
>
> Which one is better?
>
> *Networking:*
> Public Network: 2x25G ports as Bond0
> Cluster Network: 2x25G ports as Bond1
> (Proxmox will also have 2x25G ports in a bond)
>
> On Wed, Jul 9, 2025 at 8:26 PM Alex Gorbachev <a...@iss-integration.com>
> wrote:
>
> > Completely agreeing with what Anthony wrote, and we see very good results
> > with at least 4 physical OSD nodes, managed and deployed by cephadm - you
> > will have 3 MONs and MGRs "hyperconverged" in the cephadm sense, and run
> > 3x replication for OSDs with an extra OSD host for n+1 redundancy.
> >
> > Proxmox just needs a network and keyring to talk to this cluster. You can
> > run deployment and automation functions from a VM in Proxmox that runs on
> > local storage.
> >
> > --
> > Alex Gorbachev
> > https://alextelescope.blogspot.com
> >
> > On Wed, Jul 9, 2025 at 10:28 AM Anthony D'Atri <a...@dreamsnake.net>
> > wrote:
> >
> >> > I am new to this thread and would like to get some suggestions to
> >> > build a new external Ceph cluster
> >>
> >> Why external? Many Proxmox deployments are converged. Is this an
> >> existing Proxmox cluster that currently does not use shared storage?
> >>
> >> > which will be the backend for Proxmox VMs
> >> >
> >> > I am planning to start with 5 nodes (3 MON & 2 OSD)
> >>
> >> This is not the best plan.
> >>
> >> If your data is not disposable you will want to maintain the default 3
> >> copies, which you cannot safely do on 2 OSD nodes.
> >>
> >> When deploying a very small cluster, solve first for the number of
> >> nodes. You need at least 3 OSD nodes; 4 has advantages.
> >>
> >> So in your case, go converged: OSDs on all 5 nodes, and add the
> >> mon/mgr/etc ceph orch labels to all 5 so that when a node is down a
> >> replacement may be spun up.
> >>
> >> This would also let you deploy 5 mon instances instead of 3, which is
> >> advantageous in that you can ride out 2 failures without disruption.
> >>
> >> > and I am expecting to start with ~60+ TB usable space.
> >>
> >> That would mean (3 * 60) / 0.85 = 211.76 ~ 212 TB of raw capacity; let's
> >> see how that matches your numbers below.
> >>
> >> > Estimated storage specs calculator:
> >> >
> >> > RAM: 8GB/OSD daemon, 16GB OS, 4GB for MON & MGR, 16GB for MDS
> >>
> >> I would allot more than 4GB for mon/mgr.
> >>
> >> > CPU: 2 cores/OSD, 2 cores for OS, 2 cores per service
> >>
> >> Cores or hyperthreads? Either way these numbers are low.
> >>
> >> > *Dell R7625, 5 nodes to start with*
> >>
> >> Dramatic overkill for a mon/mgr/MDS node.
> >>
> >> > - RAM: 128G (plan to increase later as needed)
> >>
> >> I suggest 32GB DIMMs to maximize potential for future expansion.
> >>
> >> > - CPU: 2x AMD EPYC 9224 2.50GHz, 24C/48T, 64M Cache (200W) DDR5-4800
> >>
> >> 96 threads total per server.
> >>
> >> > - Chassis configuration: 24x 2.5" NVMe
> >>
> >> You'll be tempted to fill those slots; each OSD past, say, 12 will
> >> decrease performance due to having to share the vcores/threads.
> >> With the above CPU choice I would go with the R7615 to save rack space,
> >> or bump up the CPU. The 9224 is the default choice on Dell's
> >> configurator but there are lots of others available. The 9454, for
> >> example, would give you enough cores to more comfortably service an
> >> eventual 24 OSDs.
> >>
> >> Alternately, consider the R7615 with, say, the 9654P. The P CPUs can't
> >> be used in a dual-socket motherboard, so they're usually a bit cheaper
> >> for the same specs.
> >>
> >> With EPYC CPUs you can get better performance by disabling IOMMU on the
> >> kernel command line via GRUB defaults.
> >>
> >> > - 2x 1.92TB Data Center NVMe Read Intensive AG Drive U2 Gen4 with
> >> >   carrier (OS disk, I need extra space)
> >>
> >> Okay, so that will limit you to 22 OSDs with the 24-bay chassis. You
> >> could provision BOSS-N1 for M.2 boot though.
> >>
> >> > - 5x 7.68TB Data Center NVMe Read Intensive AG Drive U2 Gen4 with
> >> >   carrier, 24Gbps 512e 2.5in Hot-Plug 1DWPD, AG Drive
> >>
> >> I think you have a copy/paste error there. The second line above sounds
> >> like a SAS SSD.
> >>
> >> So from what you wrote above, this would mean a total of 10x 7.68TB OSD
> >> drives. With 3x replication and the default headroom ratios these will
> >> give you about 22 TB of usable space, which is just 20 TiB.
> >>
> >> > - 2x Nvidia ConnectX-6 Lx Dual Port 10/25GbE SFP28, No Crypto, PCIe
> >> >   Low Profile
> >>
> >> I suggest bonding them and not having an optional replication network.
> >> Some people will use one port for public and the other for replication,
> >> but for multiple reasons that wouldn't be ideal.
> >>
> >> > - 1G for IPMI
> >> >
> >> > Please help me finalize these specs.
> >> >
> >> > Thanks
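
For the label-based placement Anthony describes, a minimal cephadm sketch;
the host names node1 through node5 are placeholders for the five servers and
assume the cluster was bootstrapped with cephadm:

    # mark every host as a candidate for mon and mgr daemons
    for h in node1 node2 node3 node4 node5; do
        ceph orch host label add "$h" mon
        ceph orch host label add "$h" mgr
    done

    # restrict placement to labelled hosts; cephadm defaults to 5 mons and 2 mgrs
    ceph orch apply mon --placement="label:mon"
    ceph orch apply mgr --placement="label:mgr"

With all five hosts labelled, the mon service reaches its default count of
five, which is what lets the cluster ride out two mon failures as noted
above.
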
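
For Alex's point that Proxmox just needs a network path and a keyring, a
rough sketch of the client side; the storage ID (ext-ceph), pool name, client
name, and monitor addresses are placeholders, and a Ceph client keyring is
assumed to have been created on the cluster already:

    # on a Proxmox node: the keyring lives in the cluster filesystem,
    # named after the storage ID
    mkdir -p /etc/pve/priv/ceph
    cp ceph.client.proxmox.keyring /etc/pve/priv/ceph/ext-ceph.keyring

    # register the external RBD pool as shared VM storage
    pvesm add rbd ext-ceph \
        --monhost "10.0.0.11 10.0.0.12 10.0.0.13" \
        --pool vm-pool \
        --username proxmox \
        --content images,rootdir
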
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io