[ceph-users] Re: osdmaps not trimmed until ceph-mon's

2019-12-09 Thread Romit Misra
Hi Bryan and Dan, I had some similar observations and wanted some data points from you as well, if possible. 1. When you say down OSD, is it down and in, or down and out? 2. I see the OSD accumulating map ranges when continuous recovery is going on, due to OSD flaps (that is, some portion of the tot

[ceph-users] Re: : RGW listing millions of objects takes too much time

2019-12-09 Thread Romit Misra
Hi Robert and Arash, a couple of pointers and questions that might help. 1. Can you point to the code you are using for listing the buckets? 2. Which release is the cluster running? 3. How many shards have been configured in the bucket index for the mentioned bucket? 4. Ha

[ceph-users] RESEND: Re: PG Balancer Upmap mode not working

2019-12-09 Thread David Zafman
Please file a tracker with the symptom and examples.  Please attach your OSDMap (ceph osd getmap > osdmap.bin). Note that https://github.com/ceph/ceph/pull/31956 has the Nautilus version of improved upmap code.  It also changes osdmaptool to match the mgr behavior, so that one can observe th

[ceph-users] Prometheus endpoint hanging with 13.2.7 release?

2019-12-09 Thread Paul Choi
Hello, Anybody seeing the Prometheus endpoint hanging with the new 13.2.7 release? With 13.2.6 the endpoint would respond with a payload of 15MB in less than 10 seconds. Now, if you restart ceph-mgr, the Prometheus endpoint responds quickly for the first run, then successive runs get slower and s

[ceph-users] Re: High swap usage on one replication node

2019-12-09 Thread Anthony D'Atri
I’ve had one or two situations where swap might have helped a memory consumption problem, but others in which it would have *worsened* cluster performance. Sometimes it’s better for the *cluster* for an OSD to die / restart / get OOMkilled than for it to limp along sluggishly. In the past RAM

[ceph-users] Re: ceph mgr daemon multiple ip addresses

2019-12-09 Thread Martin Verges
There should be no issue and we have a lot of systems with multiple IPs. -- Martin Verges Managing director Mobile: +49 174 9335695 E-Mail: martin.ver...@croit.io Chat: https://t.me/MartinVerges croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin Verges - VAT-ID: DE310638492 Com. register: A

[ceph-users] Re: osdmaps not trimmed until ceph-mon's restarted (if cluster has a down osd)

2019-12-09 Thread Bryan Stillwell
On Nov 18, 2019, at 8:12 AM, Dan van der Ster wrote: > > On Fri, Nov 15, 2019 at 4:45 PM Joao Eduardo Luis wrote: >> >> On 19/11/14 11:04AM, Gregory Farnum wrote: >>> On Thu, Nov 14, 2019 at 8:14 AM Dan van der Ster >>> wrote: Hi Joao, I might have found the reason why s

[ceph-users] ceph mgr daemon multiple ip addresses

2019-12-09 Thread Frank R
Hi all, Does anyone know what possible issues can arise if the ceph mgr daemon is running on a mon node that has 2 IPs in the public net range (1 is a loopback address)? As I understand it, the mgr will bind to all IPs. FYI - I am not sure why the loopback is there; I am trying to find out. thx

[ceph-users] Re: RGW listing millions of objects takes too much time

2019-12-09 Thread Robert LeBlanc
On Mon, Dec 9, 2019 at 7:47 AM Arash Shams wrote: > Dear All, > > I have almost 30 million objects and I want to list them and index them > somewhere else. > I'm using boto3 with a continuation marker, but it takes almost 9 hours. > > Can I run it in multiple threads to make it faster? What solution d
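
One common way to approach the multi-threading question is to split the key space by prefix and list each slice in its own worker. Below is a hedged boto3 sketch of that idea; the endpoint, bucket name, and prefix set are assumptions for illustration, not details from the thread, and it only works if the keys are actually spread across the chosen prefixes.

```python
"""Hedged sketch: parallel bucket listing by key prefix with boto3.
Assumes object keys are spread across the chosen prefixes (hex digits here);
ENDPOINT and BUCKET are hypothetical placeholders, not values from the thread."""
from concurrent.futures import ThreadPoolExecutor

import boto3

ENDPOINT = "http://rgw.example.com:8080"  # hypothetical RGW endpoint
BUCKET = "mybucket"                       # hypothetical bucket name
PREFIXES = list("0123456789abcdef")       # assumes keys start with a hex character


def list_prefix(prefix):
    # One client per worker keeps the workers independent.
    s3 = boto3.client("s3", endpoint_url=ENDPOINT)
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=len(PREFIXES)) as pool:
        total = sum(len(chunk) for chunk in pool.map(list_prefix, PREFIXES))
    print(total, "objects listed")
```

If the keys do not share a predictable prefix structure, splitting the listing this way is not directly possible and the slices would have to be carved out differently (for example by key ranges).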

[ceph-users] RGW listing millions of objects takes too much time

2019-12-09 Thread Arash Shams
Dear All, I have almost 30 million objects and I want to list them and index them somewhere else. I'm using boto3 with a continuation marker, but it takes almost 9 hours. Can I run it in multiple threads to make it faster? What solution do you suggest to speed up this process? Thanks
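
For reference, a minimal sketch of the single-threaded continuation-token loop described above; the endpoint and bucket name are placeholders, not values from the post.

```python
"""Minimal sketch of listing a bucket with boto3 and a continuation token,
roughly the approach described above. Endpoint and bucket are placeholders."""
import boto3

s3 = boto3.client("s3", endpoint_url="http://rgw.example.com:8080")  # hypothetical endpoint
bucket = "mybucket"                                                   # hypothetical bucket

token = None
count = 0
while True:
    kwargs = {"Bucket": bucket, "MaxKeys": 1000}
    if token:
        kwargs["ContinuationToken"] = token
    resp = s3.list_objects_v2(**kwargs)
    for obj in resp.get("Contents", []):
        count += 1  # index obj["Key"], obj["Size"], etc. somewhere else here
    if not resp.get("IsTruncated"):
        break
    token = resp["NextContinuationToken"]

print(count, "objects listed")
```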

[ceph-users] Re: nautilus radosgw fails with pre jewel buckets - index objects not at right place

2019-12-09 Thread Ingo Reimann
Hi Jacek, thanks! I wanted to follow exactly that plan with the metadata! For moving the index objects to the proper pool - did you lock the buckets somehow? I wanted to avoid moving the indices in the first step, when the placement targets allow leaving everything as it is. We have a lot

[ceph-users] Re: OSD state: transitioning to Stray

2019-12-09 Thread Thomas Schneider
According to ceph -s the cluster is in recovery, backfill, etc. data: pools: 7 pools, 19656 pgs objects: 65.02M objects, 248 TiB usage: 761 TiB used, 580 TiB / 1.3 PiB avail pgs: 16.173% pgs unknown 0.493% pgs not active 890328/195069177 objects

[ceph-users] Re: OSD state: transitioning to Stray

2019-12-09 Thread Paul Emmerich
An OSD that is down does not recover or backfill. Faster recovery or backfill will not resolve down OSDs. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Mon, Dec 9

[ceph-users] Re: OSD state: transitioning to Stray

2019-12-09 Thread Thomas Schneider
Hi, I think I can speed up the recovery / backfill. What is the recommended setting for osd_max_backfills and osd_recovery_max_active? THX On 09.12.2019 at 13:36, Paul Emmerich wrote: > This message is expected. > > But your current situation is a great example of why having a separate > cluster

[ceph-users] Re: Size and capacity calculations questions

2019-12-09 Thread Jochen Schulz
Hi! >>> Thank you! >>> The output of both commands is below. >>> I still don't understand why there are 21T of used data (because 5.5T*3 = >>> 16.5T != 21T) and why there seems to be only 4.5T MAX AVAIL, but the >>> osd output tells us we have 25T of free space. >> >> As I know, MAX AVAIL is calculated wi

[ceph-users] Re: OSD state: transitioning to Stray

2019-12-09 Thread Paul Emmerich
This message is expected. But your current situation is a great example of why having a separate cluster network is a bad idea in most situations. The first thing I'd do in this scenario is get rid of the cluster network and see if that helps. Paul -- Paul Emmerich Looking for help with your Ce

[ceph-users] Re: nautilus radosgw fails with pre jewel buckets - index objects not at right place

2019-12-09 Thread Jacek Suchenia
Ingo, yes, we had to use a multicore machine to do this efficiently ;-) The update procedure is very similar to the commands described here: https://docs.ceph.com/docs/master/radosgw/layout/ so *radosgw-admin metadata get bucket.instance::* then fix the JSON, then *radosgw-admin metadata set bucket.instance::*
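
A minimal, hedged sketch of scripting that get / fix JSON / write-back cycle, assuming the bucket.instance metadata keys are already known. fix_instance() is a placeholder for whatever JSON change the migration actually needs (the post does not show the exact field), and the write-back subcommand and its input convention should be verified against your radosgw-admin release.

```python
"""Hedged sketch of the metadata get -> fix JSON -> write-back cycle described above.
Assumptions: the bucket.instance keys are known in advance, fix_instance() stands in
for the actual JSON correction (not shown in the post), and the write-back uses
`metadata put` reading JSON from stdin; verify both on your radosgw-admin version."""
import json
import subprocess


def fix_instance(meta):
    # Placeholder: apply whatever correction the migration requires,
    # e.g. the placement/index-pool fields discussed in this thread.
    return meta


def fix_bucket_instance(key):
    # key looks like "bucket.instance:<bucket>:<instance-id>"
    raw = subprocess.run(
        ["radosgw-admin", "metadata", "get", key],
        check=True, capture_output=True, text=True,
    ).stdout
    meta = fix_instance(json.loads(raw))
    # Feed the edited JSON back; some versions take --infile=<file> instead of stdin.
    subprocess.run(
        ["radosgw-admin", "metadata", "put", key],
        input=json.dumps(meta), check=True, text=True,
    )


if __name__ == "__main__":
    fix_bucket_instance("bucket.instance:mybucket:instance-id")  # hypothetical key
```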

[ceph-users] OSD state: transitioning to Stray

2019-12-09 Thread Thomas Schneider
Hi, I had a failure on 2 of 7 OSD nodes. This caused a server reboot, and unfortunately the cluster network failed to come up. This resulted in many OSDs being down. I decided to stop all services (OSD, MGR, MON) and start them sequentially. Now I have multiple OSDs marked as down although t

[ceph-users] Re: Multi-site RadosGW with multiple placement targets

2019-12-09 Thread Tobias Urdin
Hello, Thanks for the response! So, to support the use case where I have the option to place data on one side only, and other data replicated on both sides, I would need one zonegroup per side and then another zonegroup that spans both sites? Best regards On 12/6/19 6:30 PM, Casey Bodley wrot