Re: [ceph-users] Bluestore WAL/DB decisions
Hi Erik,

For now I have everything on the HDDs, plus some pools on just SSDs for the
workloads that need more speed. That looked like the simplest way to start,
and I don't seem to need the IOPS yet to change this setup. However, I am
curious what kind of performance increase you will get from moving the DB/WAL
to SSD with spinners. So if you are able to, please publish some test results
from the same environment before and after your change.

Thanks,
Marc

-----Original Message-----
From: Erik McCormick [mailto:emccorm...@cirrusseven.com]
Sent: 29 March 2019 06:22
To: ceph-users
Subject: [ceph-users] Bluestore WAL/DB decisions

Hello all,

Having dug through the documentation and read mailing list threads until my
eyes rolled back in my head, I am left with a conundrum still: do I separate
the DB / WAL or not?

I had a bunch of nodes running filestore with 8 x 8TB spinning OSDs and
2 x 240 GB SSDs. I had put the OS on the first SSD, and then split the
journals across the remaining SSD space.

My initial, minimal understanding of Bluestore was that one should stick the
DB and WAL on an SSD, and if it filled up it would just spill back onto the
OSD itself, where it otherwise would have been anyway.

So now I start digging and see that the minimum recommended size is 4% of the
OSD size. For me that's ~2.6 TB of SSD. Clearly I do not have that available
to me. I've also read that it's not so much the data size that matters but
the number of objects and their size. Just looking at my current usage and
extrapolating that to my maximum capacity, I get to ~1.44 million objects per
OSD.

So the question is, do I:

1) Put everything on the OSD and forget the SSDs exist.

2) Put just the WAL on the SSDs.

3) Put the DB (and therefore the WAL) on SSD, ignore the size
recommendations, and just give each as much space as I can. Maybe 48 GB per
OSD.

4) Some scenario I haven't considered.

Is the penalty for a too-small DB on an SSD partition so severe that it's not
worth doing?

Thanks,
Erik
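For anyone who ends up going with option 3, a rough sketch of what
provisioning one OSD with a fixed-size DB partition on a shared SSD could
look like with ceph-volume (device names and the partition size here are
placeholders, not a sizing recommendation):

  # carve one DB partition per OSD out of the shared SSD, e.g. ~30 GB each
  sgdisk --new=0:0:+30G --typecode=0:8300 /dev/sdi

  # create the bluestore OSD with its DB (and therefore its WAL) on that partition
  ceph-volume lvm create --bluestore --data /dev/sda --block.db /dev/sdi1

If the DB later outgrows the partition, the overflow spills onto the slow
device, so it is worth keeping an eye on the bluefs usage counters afterwards.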
Re: [ceph-users] Blocked ops after change from filestore on HDD to bluestore on SDD
Hi,

On 28.03.19 at 20:03, c...@elchaka.de wrote:
> Hi Uwe,
>
> On 28 February 2019 11:02:09 CET, Uwe Sauter wrote:
>> On 28.02.19 at 10:42, Matthew H wrote:
>>> Have you made any changes to your ceph.conf? If so, would you mind
>>> copying them into this thread?
>>
>> No, I just deleted an OSD, replaced the HDD with an SSD and created a new
>> OSD (with bluestore). Once the cluster was healthy again, I repeated with
>> the next OSD.
>>
>> [global]
>> auth client required = cephx
>> auth cluster required = cephx
>> auth service required = cephx
>> cluster network = 169.254.42.0/24
>> fsid = 753c9bbd-74bd-4fea-8c1e-88da775c5ad4
>> keyring = /etc/pve/priv/$cluster.$name.keyring
>> public network = 169.254.42.0/24
>>
>> [mon]
>> mon allow pool delete = true
>> mon data avail crit = 5
>> mon data avail warn = 15
>>
>> [osd]
>> keyring = /var/lib/ceph/osd/ceph-$id/keyring
>> osd journal size = 5120
>> osd pool default min size = 2
>> osd pool default size = 3
>> osd max backfills = 6
>> osd recovery max active = 12
>
> I guess you should decrease these last two parameters to 1. This should
> help to avoid too much pressure on your drives...

Unlikely to help, as no recovery / backfilling is running when the situation
appears.

> Hth
> - Mehmet
>
>> [mon.px-golf-cluster]
>> host = px-golf-cluster
>> mon addr = 169.254.42.54:6789
>>
>> [mon.px-hotel-cluster]
>> host = px-hotel-cluster
>> mon addr = 169.254.42.55:6789
>>
>> [mon.px-india-cluster]
>> host = px-india-cluster
>> mon addr = 169.254.42.56:6789
>>
>>> *From:* ceph-users on behalf of Vitaliy Filippov
>>> *Sent:* Wednesday, February 27, 2019 4:21 PM
>>> *To:* Ceph Users
>>> *Subject:* Re: [ceph-users] Blocked ops after change from filestore
>>> on HDD to bluestore on SDD
>>>
>>> I think this should not lead to blocked ops in any case, even if the
>>> performance is low...
>>>
>>> --
>>> With best regards,
>>> Vitaliy Filippov
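For what it's worth, if you do want to try lower values for those two
settings without restarting anything, injecting them at runtime should work
(untested here, and it will not persist across OSD restarts unless ceph.conf
is changed as well):

  ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'

  # verify on one OSD (run on the host carrying osd.0)
  ceph daemon osd.0 config get osd_max_backfills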
[ceph-users] CephFS and many small files
Hi!

In my ongoing quest to wrap my head around Ceph, I created a CephFS (data and
metadata pool with replicated size 3, 128 PGs each). When I mount it on my
test client, I see a usable space of ~500 GB, which I guess is okay for the
raw capacity of 1.6 TiB I have in my OSDs.

I run bonnie with

  -s 0G -n 20480:1k:1:8192

i.e. I should end up with ~20 million files, each file 1k in size maximum.
After about 8 million files (about 4.7 GBytes of actual use), my cluster runs
out of space.

Is there something like a "block size" in CephFS? I've read

  http://docs.ceph.com/docs/master/cephfs/file-layouts/

and thought maybe object_size is something I can tune, but I only get

  $ setfattr -n ceph.dir.layout.object_size -v 524288 bonnie
  setfattr: bonnie: Invalid argument

Is this even the right approach? Or are "CephFS" and "many small files" such
opposing concepts that it is simply not worth the effort?

--
Jörn Clausen
Daten- und Rechenzentrum
GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel
Düsternbrookerweg 20
24105 Kiel
Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9
We have maxed out the files per directory. Ceph is trying to do an online
split, due to which OSDs are crashing. We increased split_multiple and
merge_threshold for now and are restarting OSDs. Now on these restarts the
leveldb compaction is taking a long time. Below are some of the logs.

2019-03-29 06:25:37.082055 7f3c6320a8c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2019-03-29 06:25:37.082064 7f3c6320a8c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2019-03-29 06:25:37.082079 7f3c6320a8c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features: splice is supported
2019-03-29 06:25:37.096658 7f3c6320a8c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2019-03-29 06:25:37.096703 7f3c6320a8c0 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_feature: extsize is disabled by conf
2019-03-29 06:25:37.295577 7f3c6320a8c0 1 leveldb: Recovering log #1151738
2019-03-29 06:25:37.445516 7f3c6320a8c0 1 leveldb: Delete type=0 #1151738
2019-03-29 06:25:37.445574 7f3c6320a8c0 1 leveldb: Delete type=3 #1151737
2019-03-29 07:11:50.619313 7ff6c792b700 1 leveldb: Compacting 1@3 + 12@4 files
2019-03-29 07:11:50.639795 7ff6c792b700 1 leveldb: Generated table #1029200: 7805 keys, 2141956 bytes
2019-03-29 07:11:50.649315 7ff6c792b700 1 leveldb: Generated table #1029201: 4464 keys, 1220994 bytes
2019-03-29 07:11:50.660485 7ff6c792b700 1 leveldb: Generated table #1029202: 7813 keys, 2142882 bytes
2019-03-29 07:11:50.672235 7ff6c792b700 1 leveldb: Generated table #1029203: 6283 keys, 1712810 bytes
2019-03-29 07:11:50.697949 7ff6c792b700 1 leveldb: Generated table #1029204: 7805 keys, 2142841 bytes
2019-03-29 07:11:50.714648 7ff6c792b700 1 leveldb: Generated table #1029205: 5173 keys, 1428905 bytes
2019-03-29 07:11:50.757146 7ff6c792b700 1 leveldb: Generated table #1029206: 7888 keys, 2143304 bytes
2019-03-29 07:11:50.774357 7ff6c792b700 1 leveldb: Generated table #1029207: 5168 keys, 1425634 bytes
2019-03-29 07:11:50.830276 7ff6c792b700 1 leveldb: Generated table #1029208: 7821 keys, 2146114 bytes
2019-03-29 07:11:50.849116 7ff6c792b700 1 leveldb: Generated table #1029209: 6106 keys, 1680947 bytes
2019-03-29 07:11:50.909866 7ff6c792b700 1 leveldb: Generated table #1029210: 7799 keys, 2142782 bytes
2019-03-29 07:11:50.921143 7ff6c792b700 1 leveldb: Generated table #1029211: 5737 keys, 1574963 bytes
2019-03-29 07:11:50.923357 7ff6c792b700 1 leveldb: Generated table #1029212: 1149 keys, 310202 bytes
2019-03-29 07:11:50.923388 7ff6c792b700 1 leveldb: Compacted 1@3 + 12@4 files => 22214334 bytes
2019-03-29 07:11:50.924224 7ff6c792b700 1 leveldb: compacted to: files[ 0 3 54 715 6304 24079 0 ]
2019-03-29 07:11:50.942586 7ff6c792b700 1 leveldb: Delete type=2 #1029109

Is there a way I can skip this?

in.linkedin.com/in/nikhilravindra

On Fri, Mar 29, 2019 at 11:32 AM huang jun wrote:
> Nikhil R wrote on Friday, 29 March 2019 at 13:44:
> >
> > if i comment out filestore_split_multiple = 72 and
> > filestore_merge_threshold = 480 in the ceph.conf, won't ceph take the
> > default values of 2 and 10, and we would be in more splits and crashes?
> >
> Yes, that aimed to make it clear what results in the long start time:
> leveldb compact or filestore split?
>
> > in.linkedin.com/in/nikhilravindra
> >
> > On Fri, Mar 29, 2019 at 6:55 AM huang jun wrote:
> >>
> >> It seems like the split settings result in the problem,
> >> what about commenting out those settings, then see if it still takes
> >> that long a time to restart?
> >> As a fast search in the code, these two
> >> filestore_split_multiple = 72
> >> filestore_merge_threshold = 480
> >> don't support online change.
> >>
> >> Nikhil R wrote on Thursday, 28 March 2019 at 18:33:
> >> >
> >> > Thanks huang for the reply.
> >> > It is the disk compaction taking more time;
> >> > the disk i/o is completely utilized, up to 100%.
> >> > Looks like both osd_compact_leveldb_on_mount = false &
> >> > leveldb_compact_on_mount = false aren't working as expected on ceph
> >> > v10.2.9. Is there a way to turn off compaction?
> >> >
> >> > Also, the reason why we are restarting osd's is due to splitting, and
> >> > we increased split multiple and merge_threshold.
> >> > Is there a way we could inject it? Are osd restarts the only solution?
> >> >
> >> > Thanks In Advance
> >> >
> >> > in.linkedin.com/in/nikhilravindra
> >> >
> >> > On Thu, Mar 28, 2019 at 3:58 PM huang jun wrote:
> >> >>
> >> >> Did the time really cost on db compact operation?
> >> >> Or you can turn on debug_osd=20 to see what happens;
> >> >> what about the disk util during start?
> >> >>
> >> >> Nikhil R wrote on Thursday, 28 March 2019 at 16:36:
> >> >> >
> >> >> > CEPH osd restarts are taking too long a time
> >> >> > below is my ceph.conf
> >> >> > [osd]
> >> >> > osd_compact_leveldb_on_moun
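If it helps while debugging, the admin socket can confirm which values a
running OSD actually picked up (for example whether leveldb_compact_on_mount
and the split/merge settings took effect). Something like this, run on the
node hosting osd.83 from the log above:

  ceph daemon osd.83 config show | grep -E 'leveldb_compact_on_mount|filestore_split_multiple|filestore_merge_threshold'

  # or query a single option
  ceph daemon osd.83 config get filestore_split_multiple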
Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9
Any help on this would be much appreciated, as our prod has been down for a
day and each OSD restart is taking 4-5 hours.

in.linkedin.com/in/nikhilravindra

On Fri, Mar 29, 2019 at 7:43 PM Nikhil R wrote:
> We have maxed out the files per directory. Ceph is trying to do an online
> split, due to which OSDs are crashing. We increased split_multiple and
> merge_threshold for now and are restarting OSDs. Now on these restarts the
> leveldb compaction is taking a long time. Below are some of the logs.
>
> [...]
>
> Is there a way I can skip this?
Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9
The issue we have is large leveldbs. Do we have any setting to disable
compaction of leveldb on OSD start?

in.linkedin.com/in/nikhilravindra

On Fri, Mar 29, 2019 at 7:44 PM Nikhil R wrote:
> Any help on this would be much appreciated, as our prod has been down for a
> day and each OSD restart is taking 4-5 hours.
>
> On Fri, Mar 29, 2019 at 7:43 PM Nikhil R wrote:
>> We have maxed out the files per directory. Ceph is trying to do an online
>> split, due to which OSDs are crashing. We increased split_multiple and
>> merge_threshold for now and are restarting OSDs. Now on these restarts the
>> leveldb compaction is taking a long time. Below are some of the logs.
>>
>> [...]
>>
>> Is there a way I can skip this?
Re: [ceph-users] Bluestore WAL/DB decisions
On Fri, Mar 29, 2019 at 1:48 AM Christian Balzer wrote:
>
> On Fri, 29 Mar 2019 01:22:06 -0400 Erik McCormick wrote:
>
> > Hello all,
> >
> > Having dug through the documentation and reading mailing list threads
> > until my eyes rolled back in my head, I am left with a conundrum
> > still. Do I separate the DB / WAL or not.
> >
> You clearly didn't find this thread, most significant post here but read
> it all:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033799.html
>
> In short, a 30GB DB (and thus WAL) partition should do the trick for many
> use cases and will still be better than nothing.
>

Thanks for the link. I actually had seen it, but since it contained the
mention of the 4%, and my OSDs are larger than those of the original poster
there, I was still concerned that anything I could throw at it would be
insufficient.

I have a few OSDs that I've created with the DB on the device, and this is
what they ended up with after backfilling:

Smallest:
    "db_total_bytes": 320063143936,
    "db_used_bytes": 1783627776,

Biggest:
    "db_total_bytes": 320063143936,
    "db_used_bytes": 167883309056,

So given that the biggest is ~160GB in size already, I wasn't certain if it
would be better to have some with only ~20% of it split off onto an SSD, or
leave it all together on the slower disk.

I have a new cluster I'm building out with the same hardware, so I guess I'll
see how it goes with a small DB unless anyone comes back and says it's a
terrible idea ;).

-Erik

> Christian
>
> > I had a bunch of nodes running filestore with 8 x 8TB spinning OSDs
> > and 2 x 240 GB SSDs. I had put the OS on the first SSD, and then split
> > the journals on the remaining SSD space.
> >
> > My initial minimal understanding of Bluestore was that one should
> > stick the DB and WAL on an SSD, and if it filled up it would just
> > spill back onto the OSD itself where it otherwise would have been
> > anyway.
> >
> > So now I start digging and see that the minimum recommended size is 4%
> > of OSD size. For me that's ~2.6 TB of SSD. Clearly I do not have that
> > available to me.
> >
> > I've also read that it's not so much the data size that matters but
> > the number of objects and their size. Just looking at my current usage
> > and extrapolating that to my maximum capacity, I get to ~1.44 million
> > objects / OSD.
> >
> > So the question is, do I:
> >
> > 1) Put everything on the OSD and forget the SSDs exist.
> >
> > 2) Put just the WAL on the SSDs
> >
> > 3) Put the DB (and therefore the WAL) on SSD, ignore the size
> > recommendations, and just give each as much space as I can. Maybe 48GB
> > / OSD.
> >
> > 4) Some scenario I haven't considered.
> >
> > Is the penalty for a too small DB on an SSD partition so severe that
> > it's not worth doing?
> >
> > Thanks,
> > Erik
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Rakuten Communications
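For anyone wanting to pull the same numbers from their own OSDs, they
presumably come from the bluefs section of the OSD perf counters and can be
grabbed per OSD roughly like this (run on the host carrying the OSD; osd.0 is
just an example, and counter names can differ slightly between releases):

  ceph daemon osd.0 perf dump | python -m json.tool | grep -E 'db_total_bytes|db_used_bytes|slow_used_bytes'

With a separate DB partition, a non-zero slow_used_bytes would indicate that
the DB has already spilled over onto the slow device.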
[ceph-users] Recommended fs to use with rbd
I would like to use an rbd image from a replicated HDD pool in a libvirt/kvm
VM.

1. What is the best filesystem to use on top of rbd - just standard xfs?

2. Is there any recommended tuning for LVM when it is layered over multiple
rbd images?
[ceph-users] ceph-iscsi: (Config.lock) Timed out (30s) waiting for excl lock on gateway.conf object
Hi,

I upgraded my test Ceph iSCSI gateways to ceph-iscsi-3.0-6.g433bbaa.el7.noarch.
I'm trying to use the new parameter "cluster_client_name", which - to me -
sounds like I don't have to access the ceph cluster as "client.admin" anymore.

I created a "client.iscsi" user and watched what happened. The gateways can
obviously read the config (which I created when I was still client.admin),
but when I try to change anything (like create a new disk in pool "iscsi") I
get the following error:

  (Config.lock) Timed out (30s) waiting for excl lock on gateway.conf object

I suspect this is related to the privileges of "client.iscsi", but I couldn't
find the correct settings yet. The last thing I tried was:

  caps: [mon] allow r, allow command "osd blacklist"
  caps: [osd] allow * pool=rbd, profile rbd pool=iscsi

Can anybody tell me how to solve this? My Ceph version is 12.2.10 on CentOS 7.

thx
Matthias
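For reference, the caps listed above correspond to a user created roughly
like this (this just restates the current caps for easier comparison; it is
not a statement about which caps ceph-iscsi actually needs):

  ceph auth get-or-create client.iscsi \
      mon 'allow r, allow command "osd blacklist"' \
      osd 'allow * pool=rbd, profile rbd pool=iscsi'

After adjusting the caps, "ceph auth caps client.iscsi mon '...' osd '...'"
can be used to update them in place.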
Re: [ceph-users] CephFS and many small files
Hi Jörn,

On Fri, Mar 29, 2019 at 5:20 AM Clausen, Jörn wrote:
>
> Hi!
>
> In my ongoing quest to wrap my head around Ceph, I created a CephFS
> (data and metadata pool with replicated size 3, 128 pgs each).

What version?

> When I
> mount it on my test client, I see a usable space of ~500 GB, which I
> guess is okay for the raw capacity of 1.6 TiB I have in my OSDs.
>
> I run bonnie with
>
> -s 0G -n 20480:1k:1:8192
>
> i.e. I should end up with ~20 million files, each file 1k in size
> maximum. After about 8 million files (about 4.7 GBytes of actual use),
> my cluster runs out of space.

Meaning, you got ENOSPC?

> Is there something like a "block size" in CephFS? I've read
>
> http://docs.ceph.com/docs/master/cephfs/file-layouts/
>
> and thought maybe object_size is something I can tune, but I only get
>
> $ setfattr -n ceph.dir.layout.object_size -v 524288 bonnie
> setfattr: bonnie: Invalid argument

You can only set a layout on an empty directory. The layouts here are not
likely to be the cause.

> Is this even the right approach? Or are "CephFS" and "many small files"
> such opposing concepts that it is simply not worth the effort?

You should not have had issues growing to that number of files. Please post
more information about your cluster, including configuration changes and
`ceph osd df`.

--
Patrick Donnelly
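A small illustration of that point (the directory name is just an example,
and since object_size has to stay a multiple of stripe_unit, it may be
necessary to set stripe_unit as well):

  mkdir bonnie-small                                         # new, empty directory
  setfattr -n ceph.dir.layout.stripe_unit -v 524288 bonnie-small
  setfattr -n ceph.dir.layout.object_size -v 524288 bonnie-small
  getfattr -n ceph.dir.layout bonnie-small                   # show the resulting layout

As noted, though, the layout is unlikely to be the cause of the ENOSPC here.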
[ceph-users] Ceph block storage cluster limitations
Hello,

I wanted to know if there are any maximum limitations on:

- Max number of Ceph data nodes
- Max number of OSDs per data node
- Global max on number of OSDs
- Any limitations on the size of each drive managed by an OSD?
- Any limitation on the number of client nodes?
- Any limitation on the maximum number of RBD volumes that can be created?

Also, any advice on using NVMes for OSD drives?

What is the known maximum cluster size that Ceph RBD has been deployed to?

Thanks,
Shridhar
[ceph-users] Erasure Pools.
I have tried to create erasure pools for CephFS using the examples given at

  https://swamireddy.wordpress.com/2016/01/26/ceph-diff-between-erasure-and-replicated-pool-type/

but this is resulting in some weird behaviour. The only number in common is
the 128 used when creating the metadata pool; is this related?

[ceph@thor ~]$ ceph -s
  cluster:
    id:     b688f541-9ad4-48fc-8060-803cb286fc38
    health: HEALTH_WARN
            Reduced data availability: 128 pgs inactive, 128 pgs incomplete

  services:
    mon: 3 daemons, quorum thor,odin,loki
    mgr: odin(active), standbys: loki, thor
    mds: cephfs-1/1/1 up {0=thor=up:active}, 1 up:standby
    osd: 5 osds: 5 up, 5 in

  data:
    pools:   2 pools, 256 pgs
    objects: 21 objects, 2.19KiB
    usage:   5.08GiB used, 7.73TiB / 7.73TiB avail
    pgs:     50.000% pgs not active
             128 creating+incomplete
             128 active+clean

Pretty sure these were the commands used:

ceph osd pool create storage 1024 erasure ec-42-profile2
ceph osd pool create storage 128 erasure ec-42-profile2
ceph fs new cephfs storage_metadata storage
ceph osd pool create storage_metadata 128
ceph fs new cephfs storage_metadata storage
ceph fs add_data_pool cephfs storage
ceph osd pool set storage allow_ec_overwrites true
ceph osd pool application enable storage cephfs
fs add_data_pool default storage
ceph fs add_data_pool cephfs storage
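Two things stand out there. First, if ec-42-profile2 is a k=4, m=2 profile,
every PG needs six distinct placement targets (hosts by default, or at least
six OSDs if the failure domain is osd), so with only five OSDs in the cluster
the 128 EC PGs can never go active - that alone would explain the
creating+incomplete state. Second, the ordering matters; a more conventional
sequence would look roughly like this (a sketch only - "storage_default" is a
made-up pool name, and depending on the release an erasure-coded pool may not
be usable as the default data pool at all, which is why a small replicated
default data pool is created first):

  ceph osd pool create storage 128 128 erasure ec-42-profile2
  ceph osd pool set storage allow_ec_overwrites true
  ceph osd pool create storage_metadata 128 128
  ceph osd pool create storage_default 64 64
  ceph fs new cephfs storage_metadata storage_default
  ceph fs add_data_pool cephfs storage

Directories can then be pointed at the EC pool with a file layout, e.g. by
setting ceph.dir.layout.pool=storage on them.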
Re: [ceph-users] CephFS and many small files
Are you running on HDDs? The minimum allocation size is 64kb by default
there. You can control that via the parameter bluestore_min_alloc_size during
OSD creation.

64 kb times 8 million files is 512 GB, which is the amount of usable space
you reported before running the test, so that seems to add up. There's also
some metadata overhead etc.

You might want to consider enabling inline data in CephFS to handle small
files in a space-efficient way (note that this feature is officially marked
as experimental, though):

http://docs.ceph.com/docs/master/cephfs/experimental-features/#inline-data

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, Mar 29, 2019 at 1:20 PM Clausen, Jörn wrote:
>
> Hi!
>
> In my ongoing quest to wrap my head around Ceph, I created a CephFS
> (data and metadata pool with replicated size 3, 128 pgs each). When I
> mount it on my test client, I see a usable space of ~500 GB, which I
> guess is okay for the raw capacity of 1.6 TiB I have in my OSDs.
>
> I run bonnie with
>
> -s 0G -n 20480:1k:1:8192
>
> i.e. I should end up with ~20 million files, each file 1k in size
> maximum. After about 8 million files (about 4.7 GBytes of actual use),
> my cluster runs out of space.
>
> Is there something like a "block size" in CephFS? I've read
>
> http://docs.ceph.com/docs/master/cephfs/file-layouts/
>
> and thought maybe object_size is something I can tune, but I only get
>
> $ setfattr -n ceph.dir.layout.object_size -v 524288 bonnie
> setfattr: bonnie: Invalid argument
>
> Is this even the right approach? Or are "CephFS" and "many small files"
> such opposing concepts that it is simply not worth the effort?
>
> --
> Jörn Clausen
> Daten- und Rechenzentrum
> GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel
> Düsternbrookerweg 20
> 24105 Kiel
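If you do want to experiment with a smaller allocation unit, note that the
value is baked in when an OSD is created, so it has to be set in ceph.conf
before the OSDs are (re)deployed. Roughly like this (4 KB is just an example
value; smaller allocation sizes trade space efficiency for more metadata and
potentially lower performance, and the device names below are placeholders):

  # ceph.conf on the OSD hosts, before creating the OSDs
  [osd]
  bluestore_min_alloc_size_hdd = 4096

  # then re-create the OSDs, e.g.
  ceph-volume lvm zap --destroy /dev/sdb
  ceph-volume lvm create --bluestore --data /dev/sdb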
[ceph-users] Samsung 983 NVMe M.2 - experiences?
Hello,

I'm in the process of building a new Ceph cluster, and this time around I was
considering going with NVMe SSD drives. Searching for something in the range
of 1 TB per SSD, I found the "Samsung 983 DCT 960GB NVMe M.2 Enterprise SSD
for Business".

More info:
https://www.samsung.com/us/business/products/computing/ssd/enterprise/983-dct-960gb-mz-1lb960ne/

The idea is to buy 10 units. Does anyone have any thoughts on or experience
with these drives?

Thanks,
Fabian