Hi Bryan and Dan,
Had some similar observations, and wanted some data points from you as well
if possible.
1. When you say down OSD, is it down and in, or down and out?
2. I see the OSD accumulating map ranges when continuous recovery is going
on due to OSD flaps (that is, some portion of the tot
Hi Robert and Arash,
A couple of pointers and asks that might help.
1. Can you point to the code you are using for listing the buckets?
2. Which release is the cluster running on?
3. What is the number of shards configured in the bucket index for the
bucket in question?
4. Ha
Please file a tracker with the symptom and examples, and attach your
OSDMap (ceph osd getmap > osdmap.bin).
Note that https://github.com/ceph/ceph/pull/31956 has the Nautilus
version of the improved upmap code. It also changes osdmaptool to match the
mgr behavior, so that one can observe th
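For reference, the offline workflow looks roughly like this (the pool name
is a placeholder and --upmap-deviation 1 is just an example value):

  ceph osd getmap -o osdmap.bin
  # ask osdmaptool for the upmaps the mgr balancer would compute:
  osdmaptool osdmap.bin --upmap upmaps.sh --upmap-pool mypool --upmap-deviation 1
  # review the generated "ceph osd pg-upmap-items ..." commands before running:
  cat upmaps.sh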
Hello,
Anybody seeing the Prometheus endpoint hanging with the new 13.2.7 release?
With 13.2.6 the endpoint would respond with a payload of 15MB in less than
10 seconds.
Now, if you restart ceph-mgr, the Prometheus endpoint responds quickly for
the first run, then successive runs get slower and s
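For anyone trying to reproduce: a crude check is to time a few successive
scrapes (mgr-host is a placeholder; 9283 is the prometheus module's default
port):

  for i in 1 2 3 4 5; do
    time curl -s -o /dev/null http://mgr-host:9283/metrics
  done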
I’ve had one or two situations where swap might have helped a memory
consumption problem, but others in which it would have *worsened* cluster
performance. Sometimes it’s better for the *cluster* for an OSD to die /
restart / get OOMkilled than for it to limp along sluggishly.
In the past RAM
There should be no issue; we have a lot of systems with multiple IPs.
--
Martin Verges
Managing director
Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: A
On Nov 18, 2019, at 8:12 AM, Dan van der Ster wrote:
>
> On Fri, Nov 15, 2019 at 4:45 PM Joao Eduardo Luis wrote:
>>
>> On 19/11/14 11:04AM, Gregory Farnum wrote:
>>> On Thu, Nov 14, 2019 at 8:14 AM Dan van der Ster
>>> wrote:
Hi Joao,
I might have found the reason why s
Hi all,
Does anyone know what possible issues can arise if the ceph-mgr daemon is
running on a mon node that has 2 IPs in the public net range (1 is a
loopback address)?
As I understand it, the mgr will bind to all IPs.
FYI - I am not sure why the loopback is there; I am trying to find out.
thx
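A quick diagnostic, in case it helps, is to check which addresses the mgr
actually bound to on that node:

  ss -tlnp | grep ceph-mgr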
On Mon, Dec 9, 2019 at 7:47 AM Arash Shams wrote:
> Dear All,
>
> I have almost 30 million objects and I want to list them and index them
> somewhere else.
> I'm using boto3 with a continuation marker, but it takes almost 9 hours.
>
> Can I run it in multiple threads to make it faster? What solution d
Dear All,
I have almost 30 million objects and I want to list them and index them
somewhere else.
I'm using boto3 with a continuation marker, but it takes almost 9 hours.
Can I run it in multiple threads to make it faster? What solution do you
suggest to speed up this process?
Thanks
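One common approach: shard the keyspace by prefix and list the shards in
parallel - with boto3 that means one thread per Prefix. A rough sketch of
the same idea with the AWS CLI (the endpoint, bucket name, and the
assumption that your keys start with hex digits are all placeholders):

  for p in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
    aws s3api list-objects-v2 --endpoint-url http://rgw.example.com \
      --bucket mybucket --prefix "$p" \
      --query 'Contents[].Key' --output text > "keys-$p.txt" &
  done
  wait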
Hi Jacek,
thanks! I wanted to follow exactly that plan with the metadata!
For moving the index objects to the proper pool - did you lock the buckets
somehow? I wanted to avoid moving the indices in the first step, when the
placement targets allow leaving everything as it is.
We have a lot
According to ceph -s the cluster is in recovery, backfill, etc.
data:
    pools:   7 pools, 19656 pgs
    objects: 65.02M objects, 248 TiB
    usage:   761 TiB used, 580 TiB / 1.3 PiB avail
    pgs:     16.173% pgs unknown
             0.493% pgs not active
             890328/195069177 objects
An OSD that is down does not recover or backfill; faster recovery or
backfill will not resolve down OSDs.
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Mon, Dec 9
Hi,
I think I can speed up the recovery / backfill.
What are the recommended settings for
osd_max_backfills
osd_recovery_max_active
?
THX
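In case it helps: both can be raised at runtime (the values below are only
illustrative, not recommendations - higher means faster recovery but more
impact on client I/O):

  # runtime, all OSDs at once:
  ceph tell 'osd.*' injectargs '--osd_max_backfills 2 --osd_recovery_max_active 4'
  # persistent, on Mimic and later:
  ceph config set osd osd_max_backfills 2
  ceph config set osd osd_recovery_max_active 4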
Am 09.12.2019 um 13:36 schrieb Paul Emmerich:
> This message is expected.
>
> But your current situation is a great example of why having a separate
> cluster
Hi!
>>> Thank you!
>>> The output of both commands is below.
>>> I still don't understand why there is 21T used data (because 5.5T*3 =
>>> 16.5T != 21T) and why there seems to be only 4.5T MAX AVAIL, but the
>>> OSD output tells us we have 25T free space.
>>
>> As I know MAX AVAIL is calculated wi
This message is expected.
But your current situation is a great example of why having a separate
cluster network is a bad idea in most situations.
The first thing I'd do in this scenario is get rid of the cluster network
and see if that helps.
Paul
Ingo,
Yes, we had to use a multicore machine to do this efficiently ;-)
The update procedure is very similar to the commands described here:
https://docs.ceph.com/docs/master/radosgw/layout/
so *radosgw-admin metadata get bucket.instance::*
then fix the JSON
then *radosgw-admin metadata put bucket.instance::*
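Spelled out with a hypothetical bucket name and instance id:

  radosgw-admin metadata get bucket.instance:mybucket:<instance-id> > bi.json
  # fix the placement/pool fields in bi.json, then write it back:
  radosgw-admin metadata put bucket.instance:mybucket:<instance-id> < bi.json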
Hi,
I had a failure on 2 of 7 OSD nodes.
This caused a server reboot, and unfortunately the cluster network failed
to come up.
This resulted in many OSDs being down.
I decided to stop all services (OSD, MGR, MON) and to start them
sequentially.
Now I have multiple OSDs marked as down although t
Hello,
Thanks for the response!
So to support the use case where I have the option to place some data on
one side only and other data replicated on both sides,
I would need one zonegroup per side and then another zonegroup
that spans both sites?
Best regards
On 12/6/19 6:30 PM, Casey Bodley wrot