Re: [ceph-users] Bluestore WAL/DB decisions

2019-03-29 Thread Marc Roos
 
Hi Erik,

For now I have everything on the HDDs, plus a few pools on SSDs only for 
workloads that need more speed. That looked like the simplest way to start, 
and so far I don't seem to need the extra IOPS to change this setup.

However, I am curious what kind of performance increase you will get from 
moving the DB/WAL to SSD with spinners. So if you are able to, please 
publish some test results from the same environment before and after your 
change.
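
Even a simple rados bench run against an affected pool before and after the 
change would already be interesting; a minimal sketch, with "rbd" standing 
in for whichever pool you test against:

rados bench -p rbd 60 write -t 16 --no-cleanup
rados bench -p rbd 60 rand -t 16
rados -p rbd cleanup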

Thanks,
Marc




-Original Message-
From: Erik McCormick [mailto:emccorm...@cirrusseven.com] 
Sent: 29 March 2019 06:22
To: ceph-users
Subject: [ceph-users] Bluestore WAL/DB decisions

Hello all,

Having dug through the documentation and reading mailing list threads 
until my eyes rolled back in my head, I am left with a conundrum still. 
Do I separate the DB/WAL or not?

I had a bunch of nodes running filestore with 8 x 8TB spinning OSDs and 
2 x 240 GB SSDs. I had put the OS on the first SSD, and then split the 
journals on the remaining SSD space.

My initial minimal understanding of Bluestore was that one should stick 
the DB and WAL on an SSD, and if it filled up it would just spill back 
onto the OSD itself where it otherwise would have been anyway.

So now I start digging and see that the minimum recommended size is 4% 
of OSD size. For me that's ~2.6 TB of SSD. Clearly I do not have that 
available to me.

I've also read that it's not so much the data size that matters but the 
number of objects and their size. Just looking at my current usage and 
extrapolating that to my maximum capacity, I get to ~1.44 million 
objects / OSD.

So the question is, do I:

1) Put everything on the OSD and forget the SSDs exist.

2) Put just the WAL on the SSDs

3) Put the DB (and therefore the WAL) on SSD, ignore the size 
recommendations, and just give each as much space as I can. Maybe 48GB / 
OSD.

4) Some scenario I haven't considered.

Is the penalty for a too small DB on an SSD partition so severe that 
it's not worth doing?
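
(For concreteness, option 3 per OSD would look roughly like the following 
with ceph-volume, assuming /dev/sdc is the data HDD and /dev/sdb1 is a 
~48 GB partition carved out of one of the SSDs; device names are only 
illustrative:

ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/sdb1
)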

Thanks,
Erik


Re: [ceph-users] Blocked ops after change from filestore on HDD to bluestore on SDD

2019-03-29 Thread Uwe Sauter
Hi,

On 28.03.19 at 20:03, c...@elchaka.de wrote:
> Hi Uwe,
> 
> On 28 February 2019 at 11:02:09 CET, Uwe Sauter wrote:
>> Am 28.02.19 um 10:42 schrieb Matthew H:
>>> Have you made any changes to your ceph.conf? If so, would you mind
>> copying them into this thread?
>>
>> No, I just deleted an OSD, replaced the HDD with an SSD and created a new OSD
>> (with bluestore). Once the cluster was healthy again, I
>> repeated with the next OSD.
>>
>>
>> [global]
>>  auth client required = cephx
>>  auth cluster required = cephx
>>  auth service required = cephx
>>  cluster network = 169.254.42.0/24
>>  fsid = 753c9bbd-74bd-4fea-8c1e-88da775c5ad4
>>  keyring = /etc/pve/priv/$cluster.$name.keyring
>>  public network = 169.254.42.0/24
>>
>> [mon]
>>  mon allow pool delete = true
>>  mon data avail crit = 5
>>  mon data avail warn = 15
>>
>> [osd]
>>  keyring = /var/lib/ceph/osd/ceph-$id/keyring
>>  osd journal size = 5120
>>  osd pool default min size = 2
>>  osd pool default size = 3
>>  osd max backfills = 6
>>  osd recovery max active = 12
> 
> I guess you should decrease these last two parameters to 1. This should help to 
> avoid too much pressure on your drives...
> 

Unlikely to help as no recovery / backfilling is running when the situation 
appears.
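
(For completeness, should anyone want to try it anyway: both values support 
online change and could be lowered at runtime along these lines:

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
)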

> Hth
> - Mehmet 
> 
>>
>> [mon.px-golf-cluster]
>>  host = px-golf-cluster
>>  mon addr = 169.254.42.54:6789
>>
>> [mon.px-hotel-cluster]
>>  host = px-hotel-cluster
>>  mon addr = 169.254.42.55:6789
>>
>> [mon.px-india-cluster]
>>  host = px-india-cluster
>>  mon addr = 169.254.42.56:6789
>>
>>
>>
>>
>>>
>>>
>> --
>>> *From:* ceph-users  on behalf of
>> Vitaliy Filippov 
>>> *Sent:* Wednesday, February 27, 2019 4:21 PM
>>> *To:* Ceph Users
>>> *Subject:* Re: [ceph-users] Blocked ops after change from filestore
>> on HDD to bluestore on SDD
>>>  
>>> I think this should not lead to blocked ops in any case, even if the 
>>> performance is low...
>>>
>>> -- 
>>> With best regards,
>>>    Vitaliy Filippov


[ceph-users] CephFS and many small files

2019-03-29 Thread Clausen , Jörn

Hi!

In my ongoing quest to wrap my head around Ceph, I created a CephFS 
(data and metadata pool with replicated size 3, 128 pgs each). When I 
mount it on my test client, I see a usable space of ~500 GB, which I 
guess is okay for the raw capacity of 1.6 TiB I have in my OSDs.


I run bonnie with

-s 0G -n 20480:1k:1:8192

i.e. I should end up with ~20 million files, each file 1k in size 
maximum. After about 8 million files (about 4.7 GBytes of actual use), 
my cluster runs out of space.


Is there something like a "block size" in CephFS? I've read

http://docs.ceph.com/docs/master/cephfs/file-layouts/

and thought maybe object_size is something I can tune, but I only get

$ setfattr -n ceph.dir.layout.object_size -v 524288 bonnie
setfattr: bonnie: Invalid argument

Is this even the right approach? Or are "CephFS" and "many small files" 
such opposing concepts that it is simply not worth the effort?


--
Jörn Clausen
Daten- und Rechenzentrum
GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel
Düsternbrookerweg 20
24105 Kiel







Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9

2019-03-29 Thread Nikhil R
We have maxed out the files per directory. Ceph is trying to do an online
split, due to which OSDs are crashing. We increased split_multiple and
merge_threshold for now and are restarting the OSDs. On these restarts the
leveldb compaction is taking a long time. Below are some of the logs.

2019-03-29 06:25:37.082055 7f3c6320a8c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features: FIEMAP
ioctl is disabled via 'filestore fiemap' config option
2019-03-29 06:25:37.082064 7f3c6320a8c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features:
SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2019-03-29 06:25:37.082079 7f3c6320a8c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features: splice
is supported
2019-03-29 06:25:37.096658 7f3c6320a8c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2019-03-29 06:25:37.096703 7f3c6320a8c0  0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_feature: extsize is
disabled by conf
2019-03-29 06:25:37.295577 7f3c6320a8c0  1 leveldb: Recovering log #1151738
2019-03-29 06:25:37.445516 7f3c6320a8c0  1 leveldb: Delete type=0 #1151738
2019-03-29 06:25:37.445574 7f3c6320a8c0  1 leveldb: Delete type=3 #1151737
2019-03-29 07:11:50.619313 7ff6c792b700  1 leveldb: Compacting 1@3 + 12@4
files
2019-03-29 07:11:50.639795 7ff6c792b700  1 leveldb: Generated table
#1029200: 7805 keys, 2141956 bytes
2019-03-29 07:11:50.649315 7ff6c792b700  1 leveldb: Generated table
#1029201: 4464 keys, 1220994 bytes
2019-03-29 07:11:50.660485 7ff6c792b700  1 leveldb: Generated table
#1029202: 7813 keys, 2142882 bytes
2019-03-29 07:11:50.672235 7ff6c792b700  1 leveldb: Generated table
#1029203: 6283 keys, 1712810 bytes
2019-03-29 07:11:50.697949 7ff6c792b700  1 leveldb: Generated table
#1029204: 7805 keys, 2142841 bytes
2019-03-29 07:11:50.714648 7ff6c792b700  1 leveldb: Generated table
#1029205: 5173 keys, 1428905 bytes
2019-03-29 07:11:50.757146 7ff6c792b700  1 leveldb: Generated table
#1029206: 7888 keys, 2143304 bytes
2019-03-29 07:11:50.774357 7ff6c792b700  1 leveldb: Generated table
#1029207: 5168 keys, 1425634 bytes
2019-03-29 07:11:50.830276 7ff6c792b700  1 leveldb: Generated table
#1029208: 7821 keys, 2146114 bytes
2019-03-29 07:11:50.849116 7ff6c792b700  1 leveldb: Generated table
#1029209: 6106 keys, 1680947 bytes
2019-03-29 07:11:50.909866 7ff6c792b700  1 leveldb: Generated table
#1029210: 7799 keys, 2142782 bytes
2019-03-29 07:11:50.921143 7ff6c792b700  1 leveldb: Generated table
#1029211: 5737 keys, 1574963 bytes
2019-03-29 07:11:50.923357 7ff6c792b700  1 leveldb: Generated table
#1029212: 1149 keys, 310202 bytes
2019-03-29 07:11:50.923388 7ff6c792b700  1 leveldb: Compacted 1@3 + 12@4
files => 22214334 bytes
2019-03-29 07:11:50.924224 7ff6c792b700  1 leveldb: compacted to: files[ 0
3 54 715 6304 24079 0 ]
2019-03-29 07:11:50.942586 7ff6c792b700  1 leveldb: Delete type=2 #1029109

Is there a way I can skip this?

in.linkedin.com/in/nikhilravindra



On Fri, Mar 29, 2019 at 11:32 AM huang jun  wrote:

> Nikhil R wrote on Friday, 29 March 2019 at 13:44:
> >
> > if I comment out filestore_split_multiple = 72 and filestore_merge_threshold =
> > 480 in the ceph.conf, won't Ceph take the default values of 2 and 10, and
> > wouldn't we be in for more splits and crashes?
> >
> Yes, the aim was to make clear what causes the long start time:
> leveldb compaction or the filestore split?
> > in.linkedin.com/in/nikhilravindra
> >
> >
> >
> > On Fri, Mar 29, 2019 at 6:55 AM huang jun  wrote:
> >>
> >> It seems like the split settings cause the problem;
> >> what about commenting out those settings and then seeing whether it still
> >> takes that long to restart?
> >> From a quick search in the code, these two
> >> filestore_split_multiple = 72
> >> filestore_merge_threshold = 480
> >> don't support online change.
> >>
> >> Nikhil R wrote on Thursday, 28 March 2019 at 18:33:
> >> >
> >> > Thanks huang for the reply.
> >> > Its is the disk compaction taking more time
> >> > the disk i/o is completely utilized upto 100%
> >> > looks like both osd_compact_leveldb_on_mount = false &
> leveldb_compact_on_mount = false isnt working as expected on ceph v10.2.9
> >> > is there a way to turn off compaction?
> >> >
> >> > Also, the reason why we are restarting osd's is due to splitting and
> we increased split multiple and merge_threshold.
> >> > Is there a way we would inject it? Is osd restarts the only solution?
> >> >
> >> > Thanks In Advance
> >> >
> >> > in.linkedin.com/in/nikhilravindra
> >> >
> >> >
> >> >
> >> > On Thu, Mar 28, 2019 at 3:58 PM huang jun 
> wrote:
> >> >>
> >> >> Did the time really cost on db compact operation?
> >> >> or you can turn on debug_osd=20 to see what happens,
> >> >> what about the disk util during start?
> >> >>
> >> >> Nikhil R wrote on Thursday, 28 March 2019 at 16:36:
> >> >> >
> >> >> > CEPH osd restarts are taking too long a time
> >> >> > below is my ceph.conf
> >> >> > [osd]
> >> >> > osd_compact_leveldb_on_moun

Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9

2019-03-29 Thread Nikhil R
Any help on this would be much appreciated, as our prod has been down for a
day and each OSD restart is taking 4-5 hours.
in.linkedin.com/in/nikhilravindra



On Fri, Mar 29, 2019 at 7:43 PM Nikhil R  wrote:

> We have maxed out the files per dir. CEPH is trying to do an online split
> due to which osd's are crashing. We increased the split_multiple and
> merge_threshold for now and are restarting osd's. Now on these restarts the
> leveldb compaction is taking a long time. Below are some of the logs.
>

Re: [ceph-users] CEPH OSD Restarts taking too long v10.2.9

2019-03-29 Thread Nikhil R
The issue we have is large leveldbs. Do we have any setting to disable
compaction of leveldb on OSD start?
in.linkedin.com/in/nikhilravindra
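
(Whether those options are actually in effect can be checked on a running
OSD via the admin socket, and in principle the compaction could also be run
offline before starting the daemon; a rough sketch only, using osd.83 and
the filestore omap path from the logs above - the exact ceph-kvstore-tool
syntax may differ on 10.2.x:

ceph daemon osd.83 config show | grep compact
ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-83/current/omap compact
)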



On Fri, Mar 29, 2019 at 7:44 PM Nikhil R  wrote:

> Any help on this would be much appreciated as our prod is down since a day
> and each osd restart is taking 4-5 hours.
> in.linkedin.com/in/nikhilravindra

Re: [ceph-users] Bluestore WAL/DB decisions

2019-03-29 Thread Erik McCormick
On Fri, Mar 29, 2019 at 1:48 AM Christian Balzer  wrote:
>
> On Fri, 29 Mar 2019 01:22:06 -0400 Erik McCormick wrote:
>
> > Hello all,
> >
> > Having dug through the documentation and reading mailing list threads
> > until my eyes rolled back in my head, I am left with a conundrum
> > still. Do I separate the DB / WAL or not.
> >
> You clearly didn't find this thread; the most significant post is here, but read
> it all:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033799.html
>
> In short, a 30GB DB (and thus WAL) partition should do the trick for many
> use cases and will still be better than nothing.
>

Thanks for the link. I actually had seen it, but since it contained
the mention of the 4%, and my OSDs are larger than those of the
original poster there, I was still concerned that anything I could
throw at it would be insufficient. I have a few OSDs that I've created
with DB on the device, and this is what it ended up with after
backfilling:

Smallest:
"db_total_bytes": 320063143936,
"db_used_bytes": 1783627776,

Biggest:
"db_total_bytes": 320063143936,
"db_used_bytes": 167883309056,

So given that the biggest is ~160GB in size already, I wasn't certain
whether it would be better to have some with only ~20% of it split off onto
an SSD, or to leave it all together on the slower disk. I have a new
cluster I'm building out with the same hardware, so I guess I'll see
how it goes with a small DB unless anyone comes back and says it's a
terrible idea ;).
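
(For reference, those numbers are the bluefs perf counters; run on the OSD
host, something like the following dumps them, osd.0 being whichever OSD
you want to inspect:

ceph daemon osd.0 perf dump | grep db_
)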

-Erik

> Christian
>
> > I had a bunch of nodes running filestore with 8 x 8TB spinning OSDs
> > and 2 x 240 GB SSDs. I had put the OS on the first SSD, and then split
> > the journals on the remaining SSD space.
> >
> > My initial minimal understanding of Bluestore was that one should
> > stick the DB and WAL on an SSD, and if it filled up it would just
> > spill back onto the OSD itself where it otherwise would have been
> > anyway.
> >
> > So now I start digging and see that the minimum recommended size is 4%
> > of OSD size. For me that's ~2.6 TB of SSD. Clearly I do not have that
> > available to me.
> >
> > I've also read that it's not so much the data size that matters but
> > the number of objects and their size. Just looking at my current usage
> > and extrapolating that to my maximum capacity, I get to ~1.44 million
> > objects / OSD.
> >
> > So the question is, do I:
> >
> > 1) Put everything on the OSD and forget the SSDs exist.
> >
> > 2) Put just the WAL on the SSDs
> >
> > 3) Put the DB (and therefore the WAL) on SSD, ignore the size
> > recommendations, and just give each as much space as I can. Maybe 48GB
> > / OSD.
> >
> > 4) Some scenario I haven't considered.
> >
> > Is the penalty for a too small DB on an SSD partition so severe that
> > it's not worth doing?
> >
> > Thanks,
> > Erik
>
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com   Rakuten Communications


[ceph-users] Recommended fs to use with rbd

2019-03-29 Thread Marc Roos


I would like to use an rbd image from a replicated HDD pool in a libvirt/kvm 
VM.

1. What is the best filesystem to use with rbd, just standard xfs?
2. Is there any recommended tuning for LVM when it is put on top of multiple 
rbd images?
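
(For context, the setup I have in mind is simply attaching the image to the
VM and formatting it inside the guest, roughly - names and sizes are only
placeholders:

rbd create rbd/vm01-disk1 --size 100G
# attach to the VM via libvirt, then inside the guest:
mkfs.xfs /dev/vdb
)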








[ceph-users] ceph-iscsi: (Config.lock) Timed out (30s) waiting for excl lock on gateway.conf object

2019-03-29 Thread Matthias Leopold

Hi,

I upgraded my test Ceph iSCSI gateways to 
ceph-iscsi-3.0-6.g433bbaa.el7.noarch.
I'm trying to use the new parameter "cluster_client_name", which - to me 
- sounds like I don't have to access the ceph cluster as "client.admin" 
anymore. I created a "client.iscsi" user and watched what happened. The 
gateways can obviously read the config (which I created when I was still 
client.admin), but when I try to change anything (like create a new disk 
in pool "iscsi") I get the following error:


(Config.lock) Timed out (30s) waiting for excl lock on gateway.conf object

I suspect this is related to the privileges of "client.iscsi", but I 
couldn't find the correct settings yet. The last thing I tried was:


caps: [mon] allow r, allow command "osd blacklist"
caps: [osd] allow * pool=rbd, profile rbd pool=iscsi
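
(For reference, that was applied with something along these lines:

ceph auth caps client.iscsi mon 'allow r, allow command "osd blacklist"' osd 'allow * pool=rbd, profile rbd pool=iscsi'
)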

Can anybody tell me how to solve this?
My Ceph version is 12.2.10 on CentOS 7.

thx
Matthias


Re: [ceph-users] CephFS and many small files

2019-03-29 Thread Patrick Donnelly
Hi Jörn,

On Fri, Mar 29, 2019 at 5:20 AM Clausen, Jörn  wrote:
>
> Hi!
>
> In my ongoing quest to wrap my head around Ceph, I created a CephFS
> (data and metadata pool with replicated size 3, 128 pgs each).

What version?

> When I
> mount it on my test client, I see a usable space of ~500 GB, which I
> guess is okay for the raw capacity of 1.6 TiB I have in my OSDs.
>
> I run bonnie with
>
> -s 0G -n 20480:1k:1:8192
>
> i.e. I should end up with ~20 million files, each file 1k in size
> maximum. After about 8 million files (about 4.7 GBytes of actual use),
> my cluster runs out of space.

Meaning, you got ENOSPC?

> Is there something like a "block size" in CephFS? I've read
>
> http://docs.ceph.com/docs/master/cephfs/file-layouts/
>
> and thought maybe object_size is something I can tune, but I only get
>
> $ setfattr -n ceph.dir.layout.object_size -v 524288 bonnie
> setfattr: bonnie: Invalid argument

You can only set a layout on an empty directory. The layouts here are
not likely to be the cause.
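
A quick way to verify is to create a fresh (empty) directory and set the
layout there before writing anything into it; a minimal sketch on the
CephFS mount:

mkdir bonnie2
setfattr -n ceph.dir.layout.object_size -v 524288 bonnie2
getfattr -n ceph.dir.layout bonnie2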

> Is this even the right approach? Or are "CephFS" and "many small files"
> such opposing concepts that it is simply not worth the effort?

You should not have had issues growing to that number of files. Please
post more information about your cluster including configuration
changes and `ceph osd df`.

-- 
Patrick Donnelly


[ceph-users] Ceph block storage cluster limitations

2019-03-29 Thread Void Star Nill
Hello,

I wanted to know if there are any max limitations on

- Max number of Ceph data nodes
- Max number of OSDs per data node
- Global max on number of OSDs
- Any limitations on the size of each drive managed by OSD?
- Any limitation on number of client nodes?
- Any limitation on maximum number of RBD volumes that can be created?

Also, any advice on using NVMes for OSD drives?

What is the known maximum cluster size that Ceph RBD has been deployed to?

Thanks,
Shridhar


[ceph-users] Erasure Pools.

2019-03-29 Thread Andrew J. Hutton
I have tried to create erasure pools for CephFS using the examples given 
at 
https://swamireddy.wordpress.com/2016/01/26/ceph-diff-between-erasure-and-replicated-pool-type/ 
but this is resulting in some weird behaviour. The only number in common 
is the one used when creating the metadata store; is this related?


[ceph@thor ~]$ ceph -s
  cluster:
    id: b688f541-9ad4-48fc-8060-803cb286fc38
    health: HEALTH_WARN
    Reduced data availability: 128 pgs inactive, 128 pgs incomplete

  services:
    mon: 3 daemons, quorum thor,odin,loki
    mgr: odin(active), standbys: loki, thor
    mds: cephfs-1/1/1 up  {0=thor=up:active}, 1 up:standby
    osd: 5 osds: 5 up, 5 in

  data:
    pools:   2 pools, 256 pgs
    objects: 21 objects, 2.19KiB
    usage:   5.08GiB used, 7.73TiB / 7.73TiB avail
    pgs:     50.000% pgs not active
             128 creating+incomplete
             128 active+clean

Pretty sure these were the commands used.

ceph osd pool create storage 1024 erasure ec-42-profile2
ceph osd pool create storage 128 erasure ec-42-profile2
ceph fs new cephfs storage_metadata storage
ceph osd pool create storage_metadata 128
ceph fs new cephfs storage_metadata storage
ceph fs add_data_pool cephfs storage
ceph osd pool set storage allow_ec_overwrites true
ceph osd pool application enable storage cephfs
fs add_data_pool default storage
ceph fs add_data_pool cephfs storage
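
From the docs my understanding is that the sequence should be roughly the
following (a sketch only, assuming ec-42-profile2 really is a k=4 m=2
profile and the metadata pool stays replicated):

ceph osd erasure-code-profile set ec-42-profile2 k=4 m=2 crush-failure-domain=host
ceph osd pool create storage 128 128 erasure ec-42-profile2
ceph osd pool set storage allow_ec_overwrites true
ceph osd pool application enable storage cephfs
ceph fs add_data_pool cephfs storage

Could the 128 creating+incomplete PGs simply be because a k=4 m=2 profile
needs at least 6 failure domains (OSDs or hosts, depending on the profile),
and I only have 5 OSDs?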


Re: [ceph-users] CephFS and many small files

2019-03-29 Thread Paul Emmerich
Are you running on HDDs? The minimum allocation size is 64kb by
default here. You can control that via the parameter
bluestore_min_alloc_size during OSD creation.
64 kb times 8 million files is 512 GB which is the amount of usable
space you reported before running the test, so that seems to add up.
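
If small files are the main use case, the allocation size can be lowered at
OSD creation time, e.g. via ceph.conf before deployment; the 4 KB value is
purely an illustration, not a general recommendation:

[osd]
bluestore_min_alloc_size_hdd = 4096

(bluestore_min_alloc_size, if set non-zero, overrides the hdd/ssd-specific
variants; the value only takes effect for newly created OSDs.)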

There's also some metadata overhead etc. You might want to consider
enabling inline data in CephFS to handle small files in a more
space-efficient way (note that this feature is officially marked as
experimental, though).
http://docs.ceph.com/docs/master/cephfs/experimental-features/#inline-data

Paul



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, Mar 29, 2019 at 1:20 PM Clausen, Jörn  wrote:
>
> Hi!
>
> In my ongoing quest to wrap my head around Ceph, I created a CephFS
> (data and metadata pool with replicated size 3, 128 pgs each). When I
> mount it on my test client, I see a usable space of ~500 GB, which I
> guess is okay for the raw capacity of 1.6 TiB I have in my OSDs.
>
> I run bonnie with
>
> -s 0G -n 20480:1k:1:8192
>
> i.e. I should end up with ~20 million files, each file 1k in size
> maximum. After about 8 million files (about 4.7 GBytes of actual use),
> my cluster runs out of space.
>
> Is there something like a "block size" in CephFS? I've read
>
> http://docs.ceph.com/docs/master/cephfs/file-layouts/
>
> and thought maybe object_size is something I can tune, but I only get
>
> $ setfattr -n ceph.dir.layout.object_size -v 524288 bonnie
> setfattr: bonnie: Invalid argument
>
> Is this even the right approach? Or are "CephFS" and "many small files"
> such opposing concepts that it is simply not worth the effort?
>
> --
> Jörn Clausen
> Daten- und Rechenzentrum
> GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel
> Düsternbrookerweg 20
> 24105 Kiel
>
>
>


[ceph-users] Samsung 983 NVMe M.2 - experiences?

2019-03-29 Thread Fabian Figueredo
Hello,
I'm in the process of building a new ceph cluster, this time around i
was considering going with nvme ssd drives.
In searching for something in the line of 1TB per ssd drive, i found
"Samsung 983 DCT 960GB NVMe M.2 Enterprise SSD for Business".

More info: 
https://www.samsung.com/us/business/products/computing/ssd/enterprise/983-dct-960gb-mz-1lb960ne/

The idea is buy 10 units.

Anyone have any thoughts/experiences with these drives?

Thanks,
Fabian