I honestly haven't investigated the command line structure it would need, but that looks like roughly what I'd expect.
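
For what it's worth, here is a rough sketch of how I would expect the offline split to go. I haven't run this myself, so treat it as an outline and check the ceph-objectstore-tool --help output on your version first; the tool flags below are the ones from your own command. The OSD does need to be stopped before the tool can open its object store, and flushing the journal first is a sensible precaution:

    # keep the cluster from rebalancing while the OSD is down
    ceph osd set noout

    # stop the OSD and flush its journal (Upstart syntax for Ubuntu 14.04;
    # on systemd hosts it would be: systemctl stop ceph-osd@${osd_num})
    stop ceph-osd id=${osd_num}
    ceph-osd -i ${osd_num} --flush-journal

    # split the subfolders offline for the one pool that needs it
    ceph-objectstore-tool \
        --data-path /var/lib/ceph/osd/ceph-${osd_num} \
        --journal-path /var/lib/ceph/osd/ceph-${osd_num}/journal \
        --log-file=/var/log/ceph/objectstore_tool.${osd_num}.log \
        --op apply-layout-settings \
        --pool default.rgw.buckets.data \
        --debug

    # bring the OSD back and clear the flag once it is up again
    start ceph-osd id=${osd_num}
    ceph osd unset noout

Done one failure domain at a time, with the cluster allowed to settle in between, that should avoid the heartbeat problems mentioned below.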
On Thu, May 11, 2017, 7:58 AM Anton Dmitriev <t...@enumnet.ru> wrote:

> I'm on Jewel 10.2.7.
> Do you mean this:
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${osd_num} --journal-path /var/lib/ceph/osd/ceph-${osd_num}/journal --log-file=/var/log/ceph/objectstore_tool.${osd_num}.log --op apply-layout-settings --pool default.rgw.buckets.data --debug
>
> ? And before running it, do I need to stop the OSD and flush its journal?
>
> On 11.05.2017 14:52, David Turner wrote:
>
> > If you are on the current release of Ceph Hammer 0.94.10 or Jewel 10.2.7, you have it already. I don't remember which release it came out in, but it's definitely in the current releases.
> >
> > On Thu, May 11, 2017, 12:24 AM Anton Dmitriev <t...@enumnet.ru> wrote:
>>
>> "Recent enough version of the ceph-objectstore-tool" - sounds very interesting. Will it be released in one of the next Jewel minor releases?
>>
>> On 10.05.2017 19:03, David Turner wrote:
>>
>> PG subfolder splitting is the primary reason people are going to be deploying Luminous and Bluestore much faster than any other major release of Ceph. Bluestore removes the concept of subfolders in PGs.
>>
>> I have had clusters that reached what seemed to be a hardcoded maximum of 12,800 objects in a subfolder. It would take an osd_heartbeat_grace of 240 or 300 to let them finish splitting their subfolders without being marked down. Recently I came across a cluster that had a setting of 240 objects per subfolder before splitting, so it was splitting all the time, and several of the OSDs took longer than 30 seconds to finish splitting into subfolders. That led to more problems as backfilling got added on top of everything, and we lost a significant amount of throughput on the cluster.
>>
>> I have yet to manage a cluster with a recent enough version of the ceph-objectstore-tool (hopefully I'll have one this month) that includes the ability to take an OSD offline, split the subfolders, then bring it back online. If you set up a way to monitor how big your subfolders are getting, you can leave the Ceph settings as high as you want, and then go in and perform maintenance on your cluster one failure domain at a time, splitting all of the PG subfolders on the OSDs. This approach would keep splits from ever happening in the wild.
>>
>> On Wed, May 10, 2017 at 5:37 AM Piotr Nowosielski <piotr.nowosiel...@allegrogroup.com> wrote:
>>
>>> It is difficult for me to say exactly why some PGs have not been migrated. Crushmap settings? OSD weights?
>>>
>>> One thing is certain - you will not find any information about the split process in the logs ...
>>>
>>> pn
>>>
>>> -----Original Message-----
>>> From: Anton Dmitriev [mailto:t...@enumnet.ru]
>>> Sent: Wednesday, May 10, 2017 10:14 AM
>>> To: Piotr Nowosielski <piotr.nowosiel...@allegrogroup.com>; ceph-users@lists.ceph.com
>>> Subject: Re: [ceph-users] All OSD fails after few requests to RGW
>>>
>>> When I created the cluster I made a mistake in the configuration and set the split parameter to 32 and merge to 40, so 32*40*16 = 20480 files per folder. After that I changed split to 8 and increased pg_num and pgp_num from 2048 to 4096 for the pool where the problem occurs.
>>> While it was backfilling I observed that placement groups were backfilling from one set of 3 OSDs to another set of 3 OSDs (replicated size = 3), so I concluded that PGs are completely recreated when pg_num and pgp_num are increased for a pool, and that after this process the number of files per directory should be OK. But when backfilling finished I found many directories in this pool with ~20 000 files. Why did increasing pg_num not help? Or maybe some files will be deleted with some delay after this process?
>>>
>>> I couldn't find any information about the directory split process in the logs, even with osd and filestore debug at 20. What pattern, and in which log, do I need to grep for to find it?
>>>
>>> On 10.05.2017 10:36, Piotr Nowosielski wrote:
>>> > You can:
>>> > - change these parameters and use ceph-objectstore-tool
>>> > - add an OSD host - rebalancing the cluster will reduce the number of files in the directories
>>> > - wait until the "split" operations are over ;-)
>>> >
>>> > In our case we could afford to wait until the "split" operation was over (we have 2 clusters in slightly different configurations storing the same data).
>>> >
>>> > Hint: when creating a new pool, use the parameter "expected_num_objects"
>>> > https://www.suse.com/documentation/ses-4/book_storage_admin/data/ceph_pools_operate.html
>>> >
>>> > Piotr Nowosielski
>>> > Senior Systems Engineer
>>> > Zespół Infrastruktury 5
>>> > Grupa Allegro sp. z o.o.
>>> > Tel: +48 512 08 55 92
>>> >
>>> > -----Original Message-----
>>> > From: Anton Dmitriev [mailto:t...@enumnet.ru]
>>> > Sent: Wednesday, May 10, 2017 9:19 AM
>>> > To: Piotr Nowosielski <piotr.nowosiel...@allegrogroup.com>; ceph-users@lists.ceph.com
>>> > Subject: Re: [ceph-users] All OSD fails after few requests to RGW
>>> >
>>> > How did you solve it? Did you set new split/merge thresholds and then apply them manually on each OSD with:
>>> >
>>> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${osd_num} --journal-path /var/lib/ceph/osd/ceph-${osd_num}/journal --log-file=/var/log/ceph/objectstore_tool.${osd_num}.log --op apply-layout-settings --pool default.rgw.buckets.data
>>> >
>>> > How can I see in the logs that a split occurs?
>>> >
>>> > On 10.05.2017 10:13, Piotr Nowosielski wrote:
>>> >> Hey,
>>> >> We had similar problems. Look for information on "filestore merge and split".
>>> >>
>>> >> Some explanation: after reaching a certain number of files in a directory (it depends on the 'filestore merge threshold' and 'filestore split multiple' parameters), the OSD rebuilds the structure of that directory. If files keep arriving, the OSD creates new subdirectories and moves some of the files into them. If files are removed, the OSD reduces the number of subdirectories again.
>>> >>
>>> >> --
>>> >> Piotr Nowosielski
>>> >> Senior Systems Engineer
>>> >> Zespół Infrastruktury 5
>>> >> Grupa Allegro sp. z o.o.
>>> >> Tel: +48 512 08 55 92
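
To tie the numbers in this thread together: a filestore subdirectory splits once it holds more than roughly filestore_split_multiple * abs(filestore_merge_threshold) * 16 files, which is where the 32 * 40 * 16 = 20480 figure above comes from; with split lowered to 8, the same pool splits at 8 * 40 * 16 = 5120 files. A minimal ceph.conf sketch of the two knobs (the values shown are Anton's, not a recommendation):

    [osd]
    filestore merge threshold = 40
    filestore split multiple = 8
    # subdirectories split once they exceed ~ 8 * 40 * 16 = 5120 files,
    # and are merged back once the file count falls well below the merge threshold

You can confirm what a running OSD is actually using via its admin socket, e.g.:

    ceph daemon osd.33 config get filestore_split_multiple
    ceph daemon osd.33 config get filestore_merge_threshold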
>>> >> -----Original Message-----
>>> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Anton Dmitriev
>>> >> Sent: Wednesday, May 10, 2017 8:14 AM
>>> >> To: ceph-users@lists.ceph.com
>>> >> Subject: Re: [ceph-users] All OSD fails after few requests to RGW
>>> >>
>>> >> Hi!
>>> >>
>>> >> I increased pg_num and pgp_num for pool default.rgw.buckets.data from 2048 to 4096, and it seems the situation became a bit better: the cluster now dies after 20-30 PUTs instead of after 1. Could someone please give me some recommendations on how to rescue the cluster?
>>> >>
>>> >> On 27.04.2017 09:59, Anton Dmitriev wrote:
>>> >>> The cluster was running well for a long time, but last week OSDs started to fail.
>>> >>> We use the cluster as image storage for OpenNebula with a small load and as object storage with a high load.
>>> >>> Sometimes the disks of some OSDs are utilized at 100%, iostat shows avgqu-sz over 1000 while only a few kilobytes per second are being read or written, the OSDs on these disks become unresponsive, and the cluster marks them down.
>>> >>> We lowered the load on the object storage and the situation got better.
>>> >>>
>>> >>> Yesterday the situation got worse: if the RGWs are disabled and there are no requests to the object storage, the cluster performs well, but if we enable the RGWs and make a few PUTs or GETs, all non-SSD OSDs on all storage nodes end up in the same situation described above.
>>> >>> iotop shows that xfsaild/<disk> is burning the disks.
>>> >>>
>>> >>> trace-cmd record -e xfs\* for 10 seconds shows 10 million objects; as I understand it, that means ~360 000 objects to push per OSD in 10 seconds:
>>> >>> $ wc -l t.t
>>> >>> 10256873 t.t
>>> >>>
>>> >>> Fragmentation on one of these disks is about 3%.
>>> >>>
>>> >>> More information about the cluster:
>>> >>> https://yadi.sk/d/Y63mXQhl3HPvwt
>>> >>>
>>> >>> Also debug logs for osd.33 while the problem occurs:
>>> >>> https://yadi.sk/d/kiqsMF9L3HPvte
>>> >>>
>>> >>> debug_osd = 20/20
>>> >>> debug_filestore = 20/20
>>> >>> debug_tp = 20/20
>>> >>>
>>> >>> Ubuntu 14.04
>>> >>> $ uname -a
>>> >>> Linux storage01 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>> >>>
>>> >>> Ceph 10.2.7
>>> >>>
>>> >>> 7 storage nodes: Supermicro, 28 OSDs on 4 TB 7200 rpm disks (JBOD) + journals on a RAID10 of 4 Intel 3510 800 GB SSDs + 2 SSD OSDs (Intel 3710 400 GB) for RGW meta and index.
>>> >>> One of these nodes differs only in the number of OSDs: it has 26 OSDs on 4 TB instead of 28.
>>> >>>
>>> >>> Storage nodes are connected to each other by bonded 2x10 Gbit; clients connect to the storage nodes by bonded 2x1 Gbit.
>>> >>>
>>> >>> 5 nodes have 2x E5-2650v2 CPUs and 256 GB RAM; 2 nodes have 2x E5-2690v3 CPUs and 512 GB RAM.
>>> >>>
>>> >>> 7 mons
>>> >>> 3 rgw
>>> >>>
>>> >>> Please help me rescue the cluster.
>>> >>
>>> >> --
>>> >> Dmitriev Anton
>>> >
>>> > --
>>> > Dmitriev Anton
>>>
>>> --
>>> Dmitriev Anton
>>
>> --
>> Dmitriev Anton
>
> --
> Dmitriev Anton
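
PS - on the point above about monitoring how big your subfolders are getting: a crude sketch of the kind of check I have in mind, run directly on an OSD host (the path assumes the default filestore layout under /var/lib/ceph/osd, and osd 0 is just an example - adjust for your hosts):

    # list the ten fullest PG subdirectories on one filestore OSD
    find /var/lib/ceph/osd/ceph-0/current -mindepth 1 -type d \
        -exec sh -c 'printf "%s %s\n" "$(find "$1" -maxdepth 1 -type f | wc -l)" "$1"' _ {} \; \
        | sort -rn | head

If the top counts creep toward your split threshold, that is the cue to schedule the offline split, one failure domain at a time, rather than letting the OSDs hit it during client I/O.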
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com