Hello
Here are my results.
On this node I have 3 OSDs (1 TB HDD): osd.1 and osd.2 each have block.db on a 90 GB SSD partition, while osd.8 has no separate block.db.
pve-hs-main[0]:~$ for i in {1,2,8} ; do echo -n "osd.$i db per object: " ; \
    expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / \
         `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.1 db per object: 20872
osd.2 db per object: 20416
osd.8 db per object: 16888
On this node I have 3 OSDs (1 TB HDD), each with a 60 GB block.db on a separate SSD.
pve-hs-2[0]:/$ for i in {3..5} ; do echo -n "osd.$i db per object: " ; \
    expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / \
         `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.3 db per object: 19053
osd.4 db per object: 18742
osd.5 db per object: 14979
On this node I have 3 OSDs (1 TB HDD) with no separate SSD.
pve-hs-3[0]:~$ for i in {0,6,7} ; do echo -n "osd.$i db per object: " ; \
    expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / \
         `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.0 db per object: 27392
osd.6 db per object: 54065
osd.7 db per object: 69986
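For anyone who wants to collect the same numbers, here is a slightly more robust variant of the loop above. It is only a sketch: it assumes the default admin-socket path /var/run/ceph/ceph-osd.<id>.asok and that jq is installed, and it calls perf dump once per OSD instead of twice.

#!/bin/bash
# Sketch: print "db bytes per onode" for every OSD with an admin socket
# on this host. Assumes the default socket path
# /var/run/ceph/ceph-osd.<id>.asok and that jq is installed.
for sock in /var/run/ceph/ceph-osd.*.asok; do
    [ -e "$sock" ] || continue                 # no sockets found on this host
    id=$(basename "$sock" .asok)               # e.g. "ceph-osd.3"
    id=${id#ceph-osd.}                         # -> "3"
    dump=$(ceph daemon "osd.$id" perf dump)    # one dump per OSD instead of two
    db=$(echo "$dump" | jq '.bluefs.db_used_bytes')
    onodes=$(echo "$dump" | jq '.bluestore.bluestore_onodes')
    if [ "$onodes" -gt 0 ] 2>/dev/null; then
        echo "osd.$id db per object: $((db / onodes))"
    else
        echo "osd.$id has no onodes yet (or the counter is missing)"
    fi
done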
Here are my ceph df and rados df outputs, in case they are useful:
pve-hs-3[0]:~$ ceph df detail
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED     OBJECTS
    8742G     6628G        2114G         24.19        187k
POOLS:
    NAME           ID     QUOTA OBJECTS     QUOTA BYTES     USED       %USED     MAX AVAIL     OBJECTS     DIRTY     READ      WRITE     RAW USED
    cephbackup      9     N/A               N/A               469G      7.38         2945G      120794      117k      759k     2899k         938G
    cephwin        13     N/A               N/A             73788M      1.21         1963G       18711     18711     1337k     1637k         216G
    cephnix        14     N/A               N/A               201G      3.31         1963G       52407     52407      791k     1781k         605G
pve-hs-3[0]:~$ rados df detail
POOL_NAME      USED     OBJECTS     CLONES     COPIES     MISSING_ON_PRIMARY     UNFOUND     DEGRADED     RD_OPS       RD        WR_OPS       WR
cephbackup     469G      120794          0     241588                      0           0            0     777872     7286M     2968926     718G
cephnix        201G       52407          0     157221                      0           0            0     810317    67057M     1824184     242G
cephwin      73788M       18711          0      56133                      0           0            0    1369792      155G     1677060     136G

total_objects    191912
total_used       2114G
total_avail      6628G
total_space      8742G
Does anyone see a pattern?
On 17/10/2017 08:54, Wido den Hollander wrote:
On 16 October 2017 at 18:14, Richard Hesketh
<richard.hesk...@rd.bbc.co.uk> wrote:
On 16/10/17 13:45, Wido den Hollander wrote:
On 26 September 2017 at 16:39, Mark Nelson <mnel...@redhat.com> wrote:
On 09/26/2017 01:10 AM, Dietmar Rieder wrote:
Thanks David,
that confirms what I was assuming. Too bad there is no
estimate/method to calculate the db partition size.
It's possible that we might be able to get ranges for certain kinds of
scenarios. Maybe if you do lots of small random writes on RBD, you can
expect a typical metadata size of X per object. Or maybe if you do lots
of large sequential object writes in RGW, it's more like Y. I think
it's probably going to be tough to make it accurate for everyone though.
So I did a quick test. I wrote 75,000 objects to a BlueStore device:
root@alpha:~# ceph daemon osd.0 perf dump|jq '.bluestore.bluestore_onodes'
75085
root@alpha:~#
I then saw the RocksDB database was 450MB in size:
root@alpha:~# ceph daemon osd.0 perf dump|jq '.bluefs.db_used_bytes'
459276288
root@alpha:~#
459276288 / 75085 = 6116
So that is about 6 KB of RocksDB data per object.
Let's say I want to store 1M objects in a single OSD; I would then need ~6 GB of DB
space.
Is this a safe assumption? Do you think that 6 KB per object is normal? Low? High?
There aren't many of these numbers out there for BlueStore right now so I'm
trying to gather some numbers.
Wido
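To make the arithmetic behind that estimate explicit, here is a minimal sketch; the 6116 bytes/object figure and the 1M-object target are simply the numbers from the paragraph above, to be replaced with your own measurements:

# Back-of-envelope DB sizing from the figures above (both inputs are
# assumptions to plug your own numbers into):
bytes_per_object=6116        # measured: db_used_bytes / bluestore_onodes
expected_objects=1000000     # planning target for one OSD
db_bytes=$((bytes_per_object * expected_objects))
echo "estimated DB size: $((db_bytes / 1024 / 1024)) MiB"   # ~5832 MiB, i.e. roughly 6 GB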
If I check for the same stats on OSDs in my production cluster I see similar
but variable values:
root@vm-ds-01:~/ceph-conf# for i in {0..9} ; do echo -n "osd.$i db per object: " ; \
    expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / \
         `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.0 db per object: 7490
osd.1 db per object: 7523
osd.2 db per object: 7378
osd.3 db per object: 7447
osd.4 db per object: 7233
osd.5 db per object: 7393
osd.6 db per object: 7074
osd.7 db per object: 7967
osd.8 db per object: 7253
osd.9 db per object: 7680
root@vm-ds-02:~# for i in {10..19} ; do echo -n "osd.$i db per object: " ; \
    expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / \
         `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.10 db per object: 5168
osd.11 db per object: 5291
osd.12 db per object: 5476
osd.13 db per object: 4978
osd.14 db per object: 5252
osd.15 db per object: 5461
osd.16 db per object: 5135
osd.17 db per object: 5126
osd.18 db per object: 9336
osd.19 db per object: 4986
root@vm-ds-03:~# for i in {20..29} ; do echo -n "osd.$i db per object: " ; \
    expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / \
         `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.20 db per object: 5115
osd.21 db per object: 4844
osd.22 db per object: 5063
osd.23 db per object: 5486
osd.24 db per object: 5228
osd.25 db per object: 4966
osd.26 db per object: 5047
osd.27 db per object: 5021
osd.28 db per object: 5321
osd.29 db per object: 5150
root@vm-ds-04:~# for i in {30..39} ; do echo -n "osd.$i db per object: " ; \
    expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / \
         `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.30 db per object: 6658
osd.31 db per object: 6445
osd.32 db per object: 6259
osd.33 db per object: 6691
osd.34 db per object: 6513
osd.35 db per object: 6628
osd.36 db per object: 6779
osd.37 db per object: 6819
osd.38 db per object: 6677
osd.39 db per object: 6689
root@vm-ds-05:~# for i in {40..49} ; do echo -n "osd.$i db per object: " ; \
    expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / \
         `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.40 db per object: 5335
osd.41 db per object: 5203
osd.42 db per object: 5552
osd.43 db per object: 5188
osd.44 db per object: 5218
osd.45 db per object: 5157
osd.46 db per object: 4956
osd.47 db per object: 5370
osd.48 db per object: 5117
osd.49 db per object: 5313
I'm not sure why there is so much variance (these nodes are basically identical), and I
think db_used_bytes includes the WAL, at least in my case, since I don't
have a separate WAL device. I'm not sure how big the WAL is relative to
the metadata, and hence how much this might throw things off, but ~6 KB/object seems
like a reasonable value to take for back-of-envelope calculations.
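One way to see how much of db_used_bytes might really be WAL or spill-over is to look at the other bluefs counters next to it. A minimal sketch, assuming this release also exposes bluefs.wal_used_bytes and bluefs.slow_used_bytes alongside the counters used above:

# Break down BlueFS space usage for one OSD (osd.0 as an example).
# wal_* and slow_* are assumed counter names; they are expected to read 0
# when there is no separate WAL device (the WAL then lives inside
# db_used_bytes) or when nothing has spilled onto the slow device.
ceph daemon osd.0 perf dump | jq '.bluefs |
    {db_total_bytes, db_used_bytes, wal_total_bytes, wal_used_bytes,
     slow_total_bytes, slow_used_bytes}'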
Yes, judging from your numbers, 6 KB/object seems reasonable. More data points are
welcome in this case.
Some input from a BlueStore dev would be helpful as well, to make sure we are not
drawing the wrong conclusions here.
Wido
[bonus hilarity]
On my all-in-one-SSD OSDs, where BlueStore reports the entire device as db space,
I get results like:
root@vm-hv-01:~# for i in {60..65} ; do echo -n "osd.$i db per object: " ; \
    expr `ceph daemon osd.$i perf dump | jq '.bluefs.db_used_bytes'` / \
         `ceph daemon osd.$i perf dump | jq '.bluestore.bluestore_onodes'` ; done
osd.60 db per object: 80273
osd.61 db per object: 68859
osd.62 db per object: 45560
osd.63 db per object: 38209
osd.64 db per object: 48258
osd.65 db per object: 50525
Rich
--
Marco Baldini
H.S. Amiata Srl
Office: 0577-779396
Mobile: 335-8765169
Web: https://www.hsamiata.it
Email: mbald...@hsamiata.it
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com