Primary RGW usage: 270M objects, 857TB data / 1195TB raw, EC 8+3 in the RGW data pool, fewer than 200K objects in all other pools. OSDs 366 and 367 are NVMe OSDs; the rest are 10TB disks holding data and DB, with a 2GB NVMe WAL partition. The only things on the NVMe OSDs are the RGW metadata pools. Only 2 of the servers are on BlueStore; the rest of the cluster is still on FileStore.
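The per-OSD numbers below are essentially db_used_bytes / onodes, along the lines of Wido's gist. A rough sketch of that calculation is below; the asok path and the counter names ('bluestore_onodes', 'bluestore_stored', 'db_used_bytes') are what I'd expect on a Luminous-era BlueStore OSD and may differ per release, so treat it as a sketch rather than the exact script:

#!/usr/bin/env python
# Sketch: per-object RocksDB overhead per OSD, same idea as Wido's gist.
# Counter names below are assumptions; check 'ceph daemon <asok> perf dump'
# output on your own boxes before trusting the numbers.
import glob
import json
import subprocess

for sock in glob.glob('/var/run/ceph/ceph-osd.*.asok'):
    osd_id = sock.split('.')[-2]
    perf = json.loads(subprocess.check_output(
        ['ceph', 'daemon', sock, 'perf', 'dump']))

    onodes = perf['bluestore']['bluestore_onodes']  # onode (object) count reported by the OSD
    stored = perf['bluestore']['bluestore_stored']  # logical bytes of object data stored
    db_used = perf['bluefs']['db_used_bytes']       # RocksDB space in use on the DB device

    if onodes == 0:
        continue

    print('osd.%s onodes=%d db_used_bytes=%d avg_obj_size=%d overhead_per_obj=%d'
          % (osd_id, onodes, db_used, stored / onodes, db_used / onodes))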
osd.319 onodes=164010 db_used_bytes=14433648640 avg_obj_size=23392454 overhead_per_obj=88004
osd.352 onodes=162395 db_used_bytes=12957253632 avg_obj_size=23440441 overhead_per_obj=79788
osd.357 onodes=159920 db_used_bytes=14039384064 avg_obj_size=24208736 overhead_per_obj=87790
osd.356 onodes=164420 db_used_bytes=13006536704 avg_obj_size=23155304 overhead_per_obj=79105
osd.355 onodes=164086 db_used_bytes=13021216768 avg_obj_size=23448898 overhead_per_obj=79356
osd.354 onodes=164665 db_used_bytes=13026459648 avg_obj_size=23357786 overhead_per_obj=79108
osd.353 onodes=164575 db_used_bytes=14099152896 avg_obj_size=23377114 overhead_per_obj=85670
osd.359 onodes=163922 db_used_bytes=13991149568 avg_obj_size=23397323 overhead_per_obj=85352
osd.358 onodes=164805 db_used_bytes=12706643968 avg_obj_size=23160121 overhead_per_obj=77101
osd.364 onodes=163009 db_used_bytes=14926479360 avg_obj_size=23552838 overhead_per_obj=91568
osd.365 onodes=163639 db_used_bytes=13615759360 avg_obj_size=23541130 overhead_per_obj=83206
osd.362 onodes=164505 db_used_bytes=13152288768 avg_obj_size=23324698 overhead_per_obj=79950
osd.363 onodes=164395 db_used_bytes=13104054272 avg_obj_size=23157437 overhead_per_obj=79710
osd.360 onodes=163484 db_used_bytes=14292090880 avg_obj_size=23347543 overhead_per_obj=87421
osd.361 onodes=164140 db_used_bytes=12977176576 avg_obj_size=23498778 overhead_per_obj=79061
osd.366 onodes=1516 db_used_bytes=7509901312 avg_obj_size=5743370 overhead_per_obj=4953760
osd.367 onodes=1435 db_used_bytes=7992246272 avg_obj_size=6419719 overhead_per_obj=5569509

On Tue, May 1, 2018 at 1:57 AM Wido den Hollander <w...@42on.com> wrote:
>
> On 04/30/2018 10:25 PM, Gregory Farnum wrote:
> >
> > On Thu, Apr 26, 2018 at 11:36 AM Wido den Hollander <w...@42on.com> wrote:
> >
> >     Hi,
> >
> >     I've been investigating the per-object overhead for BlueStore, as I've
> >     seen this has become a topic for a lot of people who want to store a
> >     lot of small objects in Ceph using BlueStore.
> >
> >     I've written a piece of Python code which can be run on a server
> >     running OSDs and will print the overhead:
> >
> >     https://gist.github.com/wido/b1328dd45aae07c45cb8075a24de9f1f
> >
> >     Feedback on this script is welcome, but also the output of what people
> >     are observing.
> >
> >     The results from my tests are below, but what I see is that the
> >     overhead seems to range from 10kB to 30kB per object.
> >
> >     On RBD-only clusters the overhead seems to be around 11kB, but on
> >     clusters with an RGW workload the overhead goes up to 20kB.
> >
> > This change seems implausible, as RGW always writes full objects, whereas
> > RBD will frequently write pieces of them and do overwrites.
> > I'm not sure what all knobs are available and which diagnostics
> > BlueStore exports, but is it possible you're looking at the total
> > RocksDB data store rather than the per-object overhead? The distinction
> > here being that the RocksDB instance will also store "client" (i.e. RGW)
> > omap data and xattrs, in addition to the actual BlueStore onodes.
>
> Yes, that is possible. But in the end, the number of onodes is the number
> of objects you store, and then you want to know how many bytes the RocksDB
> database uses.
>
> I do agree that RGW doesn't do partial writes and has more metadata, but
> eventually that all has to be stored.
>
> We just need to come up with some good numbers on how to size the DB.
>
> Currently I assume a 10GB:1TB ratio and that is working out, but with
> people wanting to use 12TB disks we need to drill those numbers down even
> more. Otherwise you will need a lot of SSD space to store the DB on SSD if
> you want to.
>
> Wido
>
> > -Greg
> >
> >     I know that partial overwrites and appends contribute to higher
> >     overhead on objects and I'm trying to investigate this and share my
> >     information with the community.
> >
> >     I have two use cases that want to store >2 billion objects with an
> >     average object size of 50kB (8 - 80kB), and the RocksDB overhead is
> >     likely to become a big problem.
> >
> >     Anybody willing to share the overhead they are seeing, and with what
> >     use case?
> >
> >     The more data we have on this, the better we can estimate how DBs
> >     need to be sized for BlueStore deployments.
> >
> >     Wido
> >
> >     # Cluster #1
> >     osd.25 onodes=178572 db_used_bytes=2188378112 avg_obj_size=6196529 overhead=12254
> >     osd.20 onodes=209871 db_used_bytes=2307915776 avg_obj_size=5452002 overhead=10996
> >     osd.10 onodes=195502 db_used_bytes=2395996160 avg_obj_size=6013645 overhead=12255
> >     osd.30 onodes=186172 db_used_bytes=2393899008 avg_obj_size=6359453 overhead=12858
> >     osd.1 onodes=169911 db_used_bytes=1799356416 avg_obj_size=4890883 overhead=10589
> >     osd.0 onodes=199658 db_used_bytes=2028994560 avg_obj_size=4835928 overhead=10162
> >     osd.15 onodes=204015 db_used_bytes=2384461824 avg_obj_size=5722715 overhead=11687
> >
> >     # Cluster #2
> >     osd.1 onodes=221735 db_used_bytes=2773483520 avg_obj_size=5742992 overhead_per_obj=12508
> >     osd.0 onodes=196817 db_used_bytes=2651848704 avg_obj_size=6454248 overhead_per_obj=13473
> >     osd.3 onodes=212401 db_used_bytes=2745171968 avg_obj_size=6004150 overhead_per_obj=12924
> >     osd.2 onodes=185757 db_used_bytes=3567255552 avg_obj_size=5359974 overhead_per_obj=19203
> >     osd.5 onodes=198822 db_used_bytes=3033530368 avg_obj_size=6765679 overhead_per_obj=15257
> >     osd.4 onodes=161142 db_used_bytes=2136997888 avg_obj_size=6377323 overhead_per_obj=13261
> >     osd.7 onodes=158951 db_used_bytes=1836056576 avg_obj_size=5247527 overhead_per_obj=11551
> >     osd.6 onodes=178874 db_used_bytes=2542796800 avg_obj_size=6539688 overhead_per_obj=14215
> >     osd.9 onodes=195166 db_used_bytes=2538602496 avg_obj_size=6237672 overhead_per_obj=13007
> >     osd.8 onodes=203946 db_used_bytes=3279945728 avg_obj_size=6523555 overhead_per_obj=16082
> >
> >     # Cluster 3
> >     osd.133 onodes=68558 db_used_bytes=15868100608 avg_obj_size=14743206 overhead_per_obj=231455
> >     osd.132 onodes=60164 db_used_bytes=13911457792 avg_obj_size=14539445 overhead_per_obj=231225
> >     osd.137 onodes=62259 db_used_bytes=15597568000 avg_obj_size=15138484 overhead_per_obj=250527
> >     osd.136 onodes=70361 db_used_bytes=14540603392 avg_obj_size=13729154 overhead_per_obj=206657
> >     osd.135 onodes=68003 db_used_bytes=12285116416 avg_obj_size=12877744 overhead_per_obj=180655
> >     osd.134 onodes=64962 db_used_bytes=14056161280 avg_obj_size=15923550 overhead_per_obj=216375
> >     osd.139 onodes=68016 db_used_bytes=20782776320 avg_obj_size=13619345 overhead_per_obj=305557
> >     osd.138 onodes=66209 db_used_bytes=12850298880 avg_obj_size=14593418 overhead_per_obj=194086
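To put Wido's sizing question in perspective, here is a back-of-the-envelope sketch using the figures in this thread: the ~80kB/object and ~23MB/onode come from my EC 8+3 RGW data OSDs above, the 20kB figure is the RGW overhead Wido reports, and 10GB:1TB is his current rule of thumb. Rough numbers, not a recommendation:

# Rough DB sizing sketch based on the per-OSD figures in this thread.
TiB = 1024 ** 4

overhead_per_obj = 80 * 1024     # ~80kB RocksDB overhead per onode on my RGW data OSDs
avg_onode_size = 23 * 1024 ** 2  # ~23MB per onode (EC 8+3 shards of large RGW objects)
disk = 12 * TiB                  # the 12TB data disks people want to use

onodes = disk // avg_onode_size                 # ~550K onodes if the disk fills up
db_from_overhead = onodes * overhead_per_obj    # ~42 GiB of DB for this workload
db_from_ratio = 10 * 1024 ** 3 * (disk // TiB)  # 10GB:1TB rule of thumb -> 120 GiB

print(db_from_overhead / 1024 ** 3, db_from_ratio / 1024 ** 3)

# The same disk filled with 50kB objects (the >2 billion object use case) holds
# roughly 260M onodes; even at 20kB of overhead each, that is ~5TB of DB per OSD,
# which is why the small-object case is the one that needs better numbers.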
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com