If you are on the current release of Ceph Hammer (0.94.10) or Jewel (10.2.7), you already have it. I don't remember which release it first appeared in, but it is definitely in the current releases.
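A quick way to double-check on a given node is to look at the installed version and grep the tool's help output for the op; a rough sketch only, assuming the op list is printed by --help on your build:

    $ ceph --version                                                  # expect >= 0.94.10 (Hammer) or >= 10.2.7 (Jewel)
    $ ceph-objectstore-tool --help 2>&1 | grep apply-layout-settings  # a match means the offline split op is available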
On Thu, May 11, 2017, 12:24 AM Anton Dmitriev <t...@enumnet.ru> wrote:

> "A recent enough version of the ceph-objectstore-tool" - that sounds very
> interesting. Will it be released in one of the next Jewel minor releases?
>
> On 10.05.2017 19:03, David Turner wrote:
>
> PG subfolder splitting is the primary reason people are going to deploy
> Luminous and Bluestore much faster than any other major release of Ceph.
> Bluestore removes the concept of subfolders in PGs.
>
> I have had clusters that reached what seemed to be a hard-coded maximum of
> 12,800 objects in a subfolder. It would take an osd_heartbeat_grace of 240
> or 300 to let them finish splitting their subfolders without being marked
> down. Recently I came across a cluster that had a setting of 240 objects
> per subfolder before splitting, so it was splitting all the time, and
> several of the OSDs took longer than 30 seconds to finish splitting into
> subfolders. That led to more problems as we started adding backfilling on
> top of everything, and we lost a significant amount of throughput on the
> cluster.
>
> I have yet to manage a cluster with a recent enough version of the
> ceph-objectstore-tool (hopefully I'll have one this month) that includes
> the ability to take an OSD offline, split its subfolders, then bring it
> back online. If you set up a way to monitor how big your subfolders are
> getting, you can leave the Ceph settings as high as you want and then go
> in and perform maintenance on your cluster one failure domain at a time,
> splitting all of the PG subfolders on the OSDs. This approach would keep
> the problem from ever happening in the wild.
>
> On Wed, May 10, 2017 at 5:37 AM Piotr Nowosielski
> <piotr.nowosiel...@allegrogroup.com> wrote:
>
>> It is difficult for me to say for certain why some PGs have not been
>> migrated. crushmap settings? Weight of the OSDs?
>>
>> One thing is certain - you will not find any information about the split
>> process in the logs ...
>>
>> pn
>>
>> -----Original Message-----
>> From: Anton Dmitriev [mailto:t...@enumnet.ru]
>> Sent: Wednesday, May 10, 2017 10:14 AM
>> To: Piotr Nowosielski <piotr.nowosiel...@allegrogroup.com>;
>> ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] All OSD fails after few requests to RGW
>>
>> When I created the cluster I made a mistake in the configuration and set
>> the split parameter to 32 and merge to 40, so 32 * 40 * 16 = 20480 files
>> per folder. After that I changed split to 8 and increased pg_num and
>> pgp_num from 2048 to 4096 for the pool where the problem occurs. While it
>> was backfilling I observed that placement groups were backfilling from
>> one set of 3 OSDs to another set of 3 OSDs (replicated size = 3), so I
>> concluded that PGs are completely recreated when pg_num and pgp_num are
>> increased for a pool, and that after this process the number of files per
>> directory should be OK. But when the backfilling finished I found many
>> directories in this pool with ~20,000 files. Why did increasing pg_num
>> not help? Or will some files be deleted with some delay after this
>> process?
>>
>> I couldn't find any information about the directory split process in the
>> logs, even with osd and filestore debug set to 20. What pattern, and in
>> which log, do I need to grep for to find it?
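As an aside on the monitoring idea above: subfolder sizes can be watched with nothing more than find. A minimal sketch, assuming the usual filestore layout under /var/lib/ceph/osd/ceph-$OSD/current and the split point formula used in this thread (filestore_split_multiple * filestore_merge_threshold * 16):

    #!/bin/sh
    # report filestore PG subfolders at or over the split point on one OSD
    OSD=33                                  # example OSD id from this thread
    LIMIT=$((8 * 40 * 16))                  # filestore_split_multiple * filestore_merge_threshold * 16
    find /var/lib/ceph/osd/ceph-$OSD/current -type d | while read d; do
        n=$(find "$d" -maxdepth 1 -type f | wc -l)
        [ "$n" -ge "$LIMIT" ] && echo "$n files in $d"
    done

Run it per OSD (or loop over the OSD ids on a host) and alert before directories approach the limit.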
>>
>> On 10.05.2017 10:36, Piotr Nowosielski wrote:
>>> You can:
>>> - change these parameters and use ceph-objectstore-tool
>>> - add an OSD host - rebalancing the cluster will reduce the number of
>>>   files in the directories
>>> - wait until the "split" operations are over ;-)
>>>
>>> In our case we could afford to wait until the "split" operation was over
>>> (we have 2 clusters in slightly different configurations storing the
>>> same data).
>>>
>>> Hint:
>>> When creating a new pool, use the "expected_num_objects" parameter:
>>> https://www.suse.com/documentation/ses-4/book_storage_admin/data/ceph_pools_operate.html
>>>
>>> Piotr Nowosielski
>>> Senior Systems Engineer
>>> Zespół Infrastruktury 5
>>> Grupa Allegro sp. z o.o.
>>> Tel: +48 512 08 55 92
>>>
>>> -----Original Message-----
>>> From: Anton Dmitriev [mailto:t...@enumnet.ru]
>>> Sent: Wednesday, May 10, 2017 9:19 AM
>>> To: Piotr Nowosielski <piotr.nowosiel...@allegrogroup.com>;
>>> ceph-users@lists.ceph.com
>>> Subject: Re: [ceph-users] All OSD fails after few requests to RGW
>>>
>>> How did you solve it? Did you set new split/merge thresholds and apply
>>> them manually with
>>>
>>>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${osd_num} \
>>>     --journal-path /var/lib/ceph/osd/ceph-${osd_num}/journal \
>>>     --log-file=/var/log/ceph/objectstore_tool.${osd_num}.log \
>>>     --op apply-layout-settings --pool default.rgw.buckets.data
>>>
>>> on each OSD?
>>>
>>> How can I see in the logs that a split occurred?
>>>
>>> On 10.05.2017 10:13, Piotr Nowosielski wrote:
>>>> Hey,
>>>> We had similar problems. Look for information on "filestore merge and
>>>> split".
>>>>
>>>> A short explanation:
>>>> After reaching a certain number of files in a directory (it depends on
>>>> the 'filestore merge threshold' and 'filestore split multiple'
>>>> parameters), the OSD rebuilds the structure of that directory.
>>>> As files arrive, the OSD creates new subdirectories and moves some of
>>>> the files into them.
>>>> As files are removed, the OSD reduces the number of subdirectories.
>>>>
>>>> --
>>>> Piotr Nowosielski
>>>> Senior Systems Engineer
>>>> Zespół Infrastruktury 5
>>>> Grupa Allegro sp. z o.o.
>>>> Tel: +48 512 08 55 92
>>>>
>>>> -----Original Message-----
>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>>>> Of Anton Dmitriev
>>>> Sent: Wednesday, May 10, 2017 8:14 AM
>>>> To: ceph-users@lists.ceph.com
>>>> Subject: Re: [ceph-users] All OSD fails after few requests to RGW
>>>>
>>>> Hi!
>>>>
>>>> I increased pg_num and pgp_num for the pool default.rgw.buckets.data
>>>> from 2048 to 4096, and it seems the situation became a bit better: the
>>>> cluster now dies after 20-30 PUTs, not after 1. Could someone please
>>>> give me some recommendations on how to rescue the cluster?
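For reference, the two knobs mentioned in the messages above look roughly like this on a Jewel-era cluster. This is only a sketch: the new pool name and expected object count are placeholders, and replicated_ruleset is simply the common default CRUSH rule name; substitute your own values.

    # pre-split PG directories at pool creation time via expected_num_objects
    ceph osd pool create mynewpool 4096 4096 replicated replicated_ruleset 500000000

    # what was done above for the existing pool: raise pg_num, then pgp_num
    ceph osd pool set default.rgw.buckets.data pg_num 4096
    ceph osd pool set default.rgw.buckets.data pgp_num 4096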
>>>>
>>>> On 27.04.2017 09:59, Anton Dmitriev wrote:
>>>>> The cluster had been running well for a long time, but last week OSDs
>>>>> started to fail.
>>>>> We use the cluster as image storage for OpenNebula with a small load,
>>>>> and as object storage with a high load.
>>>>> Sometimes the disks of some OSDs are 100% utilized, with iostat
>>>>> showing an avgqu-sz over 1000 while reading or writing only a few
>>>>> kilobytes per second; the OSDs on those disks become unresponsive and
>>>>> the cluster marks them down. We lowered the load on the object storage
>>>>> and the situation improved.
>>>>>
>>>>> Yesterday the situation became worse:
>>>>> with the RGWs disabled and no requests going to the object storage the
>>>>> cluster performs well, but if we enable the RGWs and make a few PUTs
>>>>> or GETs, all non-SSD OSDs on all storage nodes end up in the same
>>>>> situation described above.
>>>>> iotop shows that xfsaild/<disk> is burning the disks.
>>>>>
>>>>> trace-cmd record -e xfs\* for 10 seconds shows about 10 million
>>>>> objects; as I understand it, that means ~360,000 objects to push per
>>>>> OSD in 10 seconds.
>>>>> $ wc -l t.t
>>>>> 10256873 t.t
>>>>>
>>>>> Fragmentation on one of these disks is about 3%.
>>>>>
>>>>> More information about the cluster:
>>>>> https://yadi.sk/d/Y63mXQhl3HPvwt
>>>>>
>>>>> Debug logs for osd.33 while the problem occurs:
>>>>> https://yadi.sk/d/kiqsMF9L3HPvte
>>>>>
>>>>> debug_osd = 20/20
>>>>> debug_filestore = 20/20
>>>>> debug_tp = 20/20
>>>>>
>>>>> Ubuntu 14.04
>>>>> $ uname -a
>>>>> Linux storage01 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29
>>>>> 20:22:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>>>
>>>>> Ceph 10.2.7
>>>>>
>>>>> 7 storage nodes: Supermicro, 28 OSDs on 4 TB 7200 rpm disks (JBOD),
>>>>> journals on a RAID10 of 4 Intel 3510 800 GB SSDs, plus 2 SSD OSDs on
>>>>> Intel 3710 400 GB for the RGW meta and index pools.
>>>>> One of these nodes differs only in the number of OSDs: it has 26 OSDs
>>>>> on 4 TB instead of 28.
>>>>>
>>>>> The storage nodes connect to each other over bonded 2x10 Gbit; clients
>>>>> connect to the storage nodes over bonded 2x1 Gbit.
>>>>>
>>>>> 5 nodes have 2 x E5-2650v2 CPUs and 256 GB RAM, 2 nodes have 2 x
>>>>> E5-2690v3 CPUs and 512 GB RAM.
>>>>>
>>>>> 7 mons
>>>>> 3 rgw
>>>>>
>>>>> Please help me rescue the cluster.
>>>>>
>>>> --
>>>> Dmitriev Anton
>>>
>>> --
>>> Dmitriev Anton
>>
>> --
>> Dmitriev Anton
>
> --
> Dmitriev Anton
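For anyone retracing the diagnosis in the quoted report, the symptoms described (avgqu-sz in the thousands, xfsaild/<disk> dominating I/O) can be watched live with standard tools; a minimal sketch, assuming sysstat and iotop are installed:

    $ iostat -x 1       # per-device avgqu-sz and %util, refreshed every second
    $ iotop -o -d 5     # only threads actually doing I/O; look for xfsaild/<disk> near the top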
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com