The files are not small: sizes are in the GB and MB range, and only a few files will be around 2.5 KB.
Thanks,
Gagan

On Mon, Apr 21, 2025 at 8:13 PM gagan tiwari <gagan.tiw...@mathisys-india.com> wrote:

> Sorry, that was a typo. I meant 4T SSD, not 6T.
>
> On Mon, 21 Apr, 2025, 5:18 pm Anthony D'Atri, <anthony.da...@gmail.com> wrote:
>
>> On Apr 21, 2025, at 6:54 AM, gagan tiwari <gagan.tiw...@mathisys-india.com> wrote:
>>
>> Hi Anthony,
>>           Based on your inputs and further digging into the Ceph
>> documentation, I am now thinking of going for 6 OSD nodes to have a
>> k=4, m=2 EC set-up.
>>
>> Be aware that with that architecture, when you lose one drive the
>> cluster's capacity will decrease by that drive's capacity until it is
>> restored.
>>
>> As I mentioned, we need maximum usable space, and we are more concerned
>> about data safety and best read performance from the cluster. Write
>> operations will be done on a separate storage solution via NFS.
>>
>> Different data sets? Almost sounds like a task for Aerospike.
>>
>> So, with each OSD node having 22 x 4T enterprise SSDs
>>
>> No QVOs?
>>
>> we will have 88 x 6 = 528T raw space. With 4+2 EC, it will hopefully
>> provide us with 390T usable space. So that will be enough for us to
>> start with.
>>
>> 6TB sounds like mixed-use 3DWPD SSDs? If so, those are almost certainly
>> overkill. You'll be fine with read-intensive SSDs, which would be 7.6TB.
>>
>> Remember the below when planning usable space:
>>
>> * Storage vendors use base-10 units (TB) while humans mostly use base-2
>>   units (TiB). So 528 TB = 480 TiB.
>> * Ceph has nearfull, backfillfull, and full ratios. The default nearfull
>>   ratio is 85%, so you will get a warning state at roughly 408 TiB
>>   stored, OSDs will no longer accept backfill at roughly 432 TiB stored,
>>   and they will no longer accept writes at 456 TiB stored.
>> * With CephFS, files smaller than, say, 128KB will currently waste a
>>   noticeable fraction of raw capacity. How large are your files?
>>
>> So, I need to know what the data safety level will be with the above
>> set-up (i.e. 6 OSD nodes with 4+2 EC). How many OSD (disk) and node
>> failures can the above set-up withstand?
>>
>> With the above topology, you can sustain one OSD failure at a time
>> without losing data availability. You can sustain two overlapping OSD
>> failures without losing data, but it will become unavailable until
>> replication is restored.
>>
>> You can sustain one node being down and data will still be available.
>> You can sustain two nodes being down without data loss.
>>
>> Also, if we later need to add more OSD nodes to get more usable space,
>> will we need to add the same size disks (4T), or can we add nodes with
>> bigger disks (8T or 15T)?
>>
>> Above you wrote 6T but here you write 4T, which is it? Note that a
>> read-intensive enterprise SSD will be 3.84 TB, which means 3.5 TiB.
>>
>> You can mix OSD drive sizes, but be aware that with a 4,2 EC profile for
>> your bulk data you will absolutely want to add them evenly across nodes.
>> You will want every node to have the same total capacity, otherwise some
>> capacity may not be usable, because every node will need to place one
>> shard of that bulk EC data.
>>
>> ceph config set global mon_max_pg_per_osd 1000
>>
>> ^ this will help avoid certain problem scenarios when mixing drive
>> capacities.
>>
>> Besides the OSD servers, we are going to have three Dell servers with 8
>> cores and 64G RAM to run 3 monitor daemons, one on each server.
>>
>> OK. Better yet would be to also run 2 mons on the OSD servers.
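For reference, a k=4, m=2 layout with one EC shard per host, as discussed above, would be created along these lines. This is a sketch only: the profile and pool names are placeholders, and PG counts are left to the autoscaler.

    # EC 4+2 profile; crush-failure-domain=host means each of the 6 hosts
    # holds exactly one shard of every PG, which is why capacity must be
    # balanced across nodes as noted above.
    ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
    ceph osd erasure-code-profile get ec42

    # Bulk data pool for CephFS using that profile. EC pools used as
    # CephFS data pools need overwrites enabled.
    ceph osd pool create cephfs_data_ec erasure ec42
    ceph osd pool set cephfs_data_ec allow_ec_overwrites true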
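As a quick sanity check of the figures above, the unit conversion and the default fullness ratios work out roughly as follows. This assumes 528 TB raw and stock settings; redo the arithmetic once the final drive size is settled.

    # Raw capacity: 528 TB (base-10) expressed in TiB (base-2)
    echo '528 * 10^12 / 2^40' | bc -l    # ~480 TiB raw

    # EC 4+2 stores 6 shards for every 4 data shards, so usable data
    # space is at most raw * 4/6, before the headroom below.
    echo '480 * 4 / 6' | bc -l           # ~320 TiB of data

    # Default fullness thresholds, applied to raw capacity:
    echo '480 * 0.85' | bc -l            # ~408 TiB raw -> nearfull warning
    echo '480 * 0.90' | bc -l            # ~432 TiB raw -> backfill stops
    echo '480 * 0.95' | bc -l            # ~456 TiB raw -> writes blocked

    # The ratios currently in effect can be checked with:
    ceph osd dump | grep -E 'full_ratio|backfillfull_ratio|nearfull_ratio'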
>>
>> One 4-core, 64G RAM server with a high core frequency (4800 MHz) to run
>> the MDS daemon.
>>
>> Please advise.
>>
>> Thanks,
>> Gagan
>>
>> On Tue, Apr 15, 2025 at 8:14 PM Anthony D'Atri <anthony.da...@gmail.com> wrote:
>>
>>> It's a function of your use-case.
>>>
>>> > On Apr 14, 2025, at 8:41 AM, Anthony Fecarotta <anth...@linehaul.ai> wrote:
>>> >
>>> >> MDS (if you're going to CephFS vs using S3 object storage or RBD block)
>>> >
>>> > Hi Anthony,
>>> >
>>> > Can you elaborate on this remark?
>>> >
>>> > Should one choose between using CephFS vs S3 storage (as it pertains
>>> > to best practices)?
>>> >
>>> > On Proxmox, I am using both CephFS and RBD.
>>> >
>>> > Regards,
>>> > Anthony Fecarotta
>>> > Founder & President
>>> > anth...@linehaul.ai | www.linehaul.ai
>>> >
>>> > On Sun, Apr 13, 2025, 04:28 PM GMT, Anthony D'Atri
>>> > <anthony.da...@gmail.com> wrote:
>>> >>
>>> >>> On Apr 13, 2025, at 12:00 PM, Brendon Baumgartner <bren...@netcal.com> wrote:
>>> >>>
>>> >>>> On Apr 11, 2025, at 10:13, gagan tiwari
>>> >>>> <gagan.tiw...@mathisys-india.com> wrote:
>>> >>>>
>>> >>>> Hi Anthony,
>>> >>>>          We will be using Samsung SSD 870 QVO 8TB disks on
>>> >>>> all OSD servers.
>>> >>>
>>> >>> I'm a newbie to Ceph, and I have a 4-node cluster that doesn't have
>>> >>> a lot of users, so downtime is easily scheduled for tinkering. I
>>> >>> started with consumer SSDs (SATA/NVMe) because they were free and
>>> >>> lying around. Performance was bad. Then just the NVMes: still bad.
>>> >>> Then enterprise SSDs: still bad (relative to DAS, anyway).
>>> >>
>>> >> Real enterprise SSDs? Enterprise NVMe, not enterprise SATA? Sellers
>>> >> can lie sometimes. Also be sure to update firmware to the latest;
>>> >> that can make a substantial difference.
>>> >>
>>> >> Other factors include:
>>> >>
>>> >> * Enough hosts and OSDs. Three hosts with one OSD each aren't going
>>> >>   to deliver a great experience.
>>> >> * At least 6GB of available physmem per NVMe OSD.
>>> >> * How you measure - a 1K QD1 fsync workload is going to be more
>>> >>   demanding than a buffered 64K QD32 workload.
>>> >>
>>> >>> Each step on the journey to enterprise SSDs made things faster. The
>>> >>> problem with the consumer stuff is the latency. Enterprise SSDs are
>>> >>> 0-2ms; consumer SSDs are 15-300ms. As you can see, the latency
>>> >>> difference is significant.
>>> >>
>>> >> Some client SSDs are "DRAMless": they don't use roughly 1GB of
>>> >> onboard RAM per 1TB of capacity as the LBA indirection table. This
>>> >> can be a substantial issue for enterprise workloads.
>>> >>
>>> >>> So from my experience, I would say Ceph is very slow in general
>>> >>> compared to DAS. You need all the help you can get.
>>> >>>
>>> >>> If you want to use the consumer stuff, I would recommend making a
>>> >>> slow tier (a second pool with a different policy), or I suppose just
>>> >>> expecting it to be slow in general. I still have my consumer drives
>>> >>> installed, just configured as a second tier, which is unused right
>>> >>> now because we have an old JBOD for the second tier that is much
>>> >>> faster.
>>> >>
>>> >> How many drives in each?
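The two workload shapes mentioned above are easy to compare directly with fio. A rough sketch follows; the device path is a placeholder, and these runs will overwrite whatever is on the target.

    # Demanding case: ~1K random writes, queue depth 1, fsync after each write.
    # /dev/nvme0n1 is a placeholder -- this destroys data on the target device.
    fio --name=qd1-sync --filename=/dev/nvme0n1 --rw=randwrite --bs=1k \
        --iodepth=1 --fsync=1 --ioengine=libaio --direct=1 \
        --runtime=60 --time_based --group_reporting

    # Easy case: buffered 64K writes at queue depth 32.
    fio --name=qd32-buffered --filename=/dev/nvme0n1 --rw=write --bs=64k \
        --iodepth=32 --ioengine=libaio --direct=0 \
        --runtime=60 --time_based --group_reporting

Drives without power-loss protection tend to show the 15-300 ms worst-case latencies described above on the first test, while enterprise drives with PLP stay in the low-millisecond range.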
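One way to realize the "second pool with a different policy" idea above is to pin a pool to the consumer drives via a custom CRUSH device class. A sketch, assuming the consumer OSDs are osd.10 and osd.11; the class, rule, and pool names are placeholders.

    # Tag the consumer OSDs with a custom device class (clear any
    # auto-assigned class first).
    ceph osd crush rm-device-class osd.10 osd.11
    ceph osd crush set-device-class slow osd.10 osd.11

    # CRUSH rule that only selects OSDs of that class, replicated across hosts.
    ceph osd crush rule create-replicated slow_rule default host slow

    # Second-tier pool pinned to the consumer drives; the autoscaler may
    # adjust the PG count later.
    ceph osd pool create tier2 32 32 replicated slow_rule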
>>> >>> Good luck!
>>> >>>
>>> >>> _BB
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io