Hi,
The ceph.log from when you upgraded should give some clues.
Are you using upmap balancing? Maybe this is just further
refinement of the balancing.
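(A quick, generic way to check whether the upmap balancer is in play -- both commands are standard and harmless to run:)
# ceph balancer status           # shows whether the balancer is active and which mode it uses
# ceph osd dump | grep upmap     # lists any pg_upmap_items entries present in the osdmap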
-- dan
On Thu, Mar 5, 2020 at 8:58 AM Rainer Krienke wrote:
>
> Hello,
>
> at the moment my ceph is still working but in a degraded state a
Hi,
I have stopped all 3 MON services sequentially.
After starting the 3 MON services again, the slow ops were gone.
However, just after 1 min. of MON service uptime, the slow ops are back
again, and the blocked time is increasing constantly.
root@ld3955:/home/ceph-scripts
# ceph -w
cluster:
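(Two generic ways to see which operations are stuck; mon.ld3955 is only assumed here from the hostname above, and the second command needs the mon's admin socket on that host:)
# ceph health detail          # lists the daemons currently reporting SLOW_OPS
# ceph daemon mon.ld3955 ops  # dump the operations in flight on that monitor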
On Thu, 5 Mar 2020 at 08:13, Stefan Priebe - Profihost AG <
s.pri...@profihost.ag>:
> >> Hrm. We have checksums on the actual OSD data, so it ought to be
> >> possible to add these to the export/import/diff bits so it can be
> >> verified faster.
> >> (Well, barring bugs.)
> >>
> > I mainly
I found some information in ceph.log that might help to find out what
happened. node2 was the one I rebooted:
2020-03-05 07:24:29.844953 osd.45 (osd.45) 483 : cluster [DBG] 36.323
scrub starts
2020-03-05 07:24:33.552221 osd.45 (osd.45) 484 : cluster [DBG] 36.323
scrub ok
2020-03-05 07:24:38.94840
Did you have `144 total, 144 up, 144 in` also before the upgrade?
If an osd was out, then you upgraded/restarted and it went back in, it
would trigger data movement.
(I usually set noin before an upgrade).
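(For example, wrapped around the upgrade:)
# ceph osd set noin      # before upgrading/restarting the OSDs
# ... upgrade and restart ...
# ceph osd unset noin    # once all OSDs are up again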
-- dan
On Thu, Mar 5, 2020 at 9:46 AM Rainer Krienke wrote:
>
> I found some information i
I also had some inadvertent recovery going on, although I think it
started after I had restarted all MON, MGR, and MDS nodes and before I
started restarting OSDs.
On 05/03/2020 09:49, Dan van der Ster wrote:
Did you have `144 total, 144 up, 144 in` also before the upgrade?
If an osd was out,
Hello,
Before I ran the update to 14.2.8, I checked that the state was healthy
with all OSDs up and in. I still have the command history I typed
visible in my KDE terminal buffer, and there I see that after the update
but before the reboot I ran a ceph -s and there were 144 OSDs up and in
the state
Hello,
I have a small ceph cluster running with 3 MON/MGR and 3 OSD hosts.
There are also 3 virtual hosts in the crushmap to have a separate SSD
pool. Currently two pools are running, one of them exclusive to the SSD
device class.
My problem now is, that any new pool I try to create won't b
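(For context, a device-class based SSD pool is usually tied to a class-restricted CRUSH rule along these lines; the rule name, pool name and PG counts below are only placeholders:)
# ceph osd crush rule create-replicated ssd-rule default host ssd
# ceph osd pool create ssd-pool 64 64 replicated ssd-rule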
Hi,
There was movement already before you rebooted the node at 07:24:41.598004.
That tells me that a ceph-mon process restarted and either trimmed
some upmaps or did something similar.
You can do this to see exactly what changed:
# ceph osd getmap -o 31853 31853 # this is a guess -- pi
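(The command above is cut off; the usual pattern is to fetch the osdmap epochs from before and after the event, dump them with osdmaptool and diff the dumps, roughly:)
# ceph osd getmap -o 31853 31853
# ceph osd getmap -o 31856 31856
# osdmaptool --print 31853 > 31853.txt
# osdmaptool --print 31856 > 31856.txt
# diff 31853.txt 31856.txt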
The difference was not a big one and consists of a change in pgp_num for
a pool named pxa-ec from 1024 to 999. All OSDs were up in the last map
(31856) :
# diff 31853.txt 31856.txt
1c1
< epoch 31853
---
> epoch 31856
4c4
< modified 2020-03-04 14:41:52.079327
---
> modified 2020-03-05 07:24:39.938
Hi all,
There's something broken in our env when we try to add new mons to
existing clusters, confirmed on two clusters running mimic and
nautilus. It's basically this issue
https://tracker.ceph.com/issues/42830
In case something is wrong with our puppet manifests, I'm trying to
do it manually
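(The manual procedure is roughly the standard add-a-mon dance from the docs; ids and temp paths below are placeholders:)
# ceph mon getmap -o /tmp/monmap
# ceph auth get mon. -o /tmp/mon.keyring
# ceph-mon -i <new-mon-id> --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
# systemctl start ceph-mon@<new-mon-id>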
Ahh, that's it! You have `autoscale_mode on` for the pool, and in
14.2.8 there was a fix to the calculation of how many PGs are needed in an
erasure-coded pool:
https://github.com/ceph/ceph/commit/0253205ef36acc6759a3a9687c5eb1b27aa901bf
So at the moment your PGs are merging.
If you want to stop that ch
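(The sentence is cut off above; presumably this means turning the autoscaler to warn or off for that pool, for example:)
# ceph osd pool set pxa-ec pg_autoscale_mode warn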
OK, this seems to make sense.
At the moment the cluster is still busy handling misplaced objects, but
when it's done, I will set autoscale to "warn"
and also set the no...-Flags and then try to upgrade the next monitor
and see if this goes more smoothly.
Thank you very much for your help. I learned a
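(The flag names are truncated above; typically this means the noout flag, and sometimes norebalance as well, set before the restarts and cleared afterwards:)
# ceph osd set noout
# ceph osd set norebalance
# ... restart/upgrade the daemons ...
# ceph osd unset norebalance
# ceph osd unset noout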
On Wednesday, 4 March 2020 at 18:14:31 CET, Chad William Seys wrote:
> > Maybe I've marked the object as "lost" and removed the failed
> > OSD.
> >
> > The cluster now is healthy, but I'd like to understand if it's likely
> > to bother me again in the future.
>
> Yeah, I don't know.
>
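(For reference, the commands usually involved in that kind of cleanup; the pgid and OSD id are placeholders:)
# ceph pg <pgid> mark_unfound_lost revert    # or 'delete' if no earlier version of the object exists
# ceph osd purge <osd-id> --yes-i-really-mean-it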
On Thu, 5 Mar 2020, Dan van der Ster wrote:
> Hi all,
>
> There's something broken in our env when we try to add new mons to
> existing clusters, confirmed on two clusters running mimic and
> nautilus. It's basically this issue
> https://tracker.ceph.com/issues/42830
>
> In case something is wron
Hi,
I'm (still) testing upgrading from Luminous to Nautilus and ran into the
following situation:
The lab setup I'm testing in has three OSD hosts.
If one of those hosts dies, the store.db in /var/lib/ceph/mon/ on all my
MON nodes starts to rapidly grow in size until either the OSD host comes
ba
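(As far as I understand, the mons hold on to old osdmaps while PGs are not active+clean, so some store growth during an outage is expected. Two generic things worth checking; paths and ids are placeholders:)
# du -sh /var/lib/ceph/mon/*/store.db    # watch how fast the mon store grows
# ceph tell mon.<id> compact             # trigger a manual compaction of the mon store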
On 3/5/20 3:22 PM, Sage Weil wrote:
> On Thu, 5 Mar 2020, Dan van der Ster wrote:
>> Hi all,
>>
>> There's something broken in our env when we try to add new mons to
>> existing clusters, confirmed on two clusters running mimic and
>> nautilus. It's basically this issue
>> https://tracker.ceph.c
On 3/3/20 2:33 PM, Scheurer François wrote:
(resending to the new mailing list)
Dear Casey, Dear All,
We tested the migration from Luminous to Nautilus and noticed two
regressions breaking the RGW integration in OpenStack:
1) the following config parameter is not working on Nautilu
Dear Casey
Many thanks that's great to get your help!
Cheers
Francois
From: Casey Bodley
Sent: Thursday, March 5, 2020 3:57 PM
To: Scheurer François; ceph-users@ceph.io
Cc: Engelmann Florian; Rafael Weingärtner
Subject: Re: Fw: Incompatibilities (impl
Hi Sage,
On Thu, Mar 5, 2020 at 3:22 PM Sage Weil wrote:
>
> On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > Hi all,
> >
> > There's something broken in our env when we try to add new mons to
> > existing clusters, confirmed on two clusters running mimic and
> > nautilus. It's basically this issu
On Thu, Mar 5, 2020 at 3:31 PM Wido den Hollander wrote:
>
>
>
> On 3/5/20 3:22 PM, Sage Weil wrote:
> > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> >> Hi all,
> >>
> >> There's something broken in our env when we try to add new mons to
> >> existing clusters, confirmed on two clusters running m
On Thu, 5 Mar 2020, Dan van der Ster wrote:
> Hi Sage,
>
> On Thu, Mar 5, 2020 at 3:22 PM Sage Weil wrote:
> >
> > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > Hi all,
> > >
> > > There's something broken in our env when we try to add new mons to
> > > existing clusters, confirmed on two clu
On 2020-03-05 04:22, Anthony D'Atri wrote:
>>> The message HEALTH_ERR, in red, on the front of the dashboard, is an
>>> interesting way to start the day. ;)
>>
>> If possible, I'd suggest changing this into a HEALTH_WARN state -
>> heaven is not falling down just because the telemetry module can'
Hello,
I am running Luminous and I would like to back up my cluster from
Site-A to Site-B (one way),
so I decided to mirror it to an off-site ceph cluster.
I read: https://docs.ceph.com/docs/luminous/rbd/rbd-mirroring/
But I liked https://github.com/MiracleMa/Blog/issues/2 a little better.
Bu
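(For reference, a one-way setup on Luminous boils down to roughly the following; pool, image, user and cluster names are placeholders, and the rbd-mirror daemon runs only on the backup site:)
# on both clusters, enable mirroring on the pool (pool mode mirrors all
# images that have the journaling feature enabled):
# rbd mirror pool enable <pool> pool
# rbd feature enable <pool>/<image> journaling    # requires exclusive-lock on the image
# on Site-B only, add Site-A as a peer and run the mirror daemon:
# rbd --cluster site-b mirror pool peer add <pool> client.<user>@site-a
# systemctl enable --now ceph-rbd-mirror@<instance>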
No, I don't have cache tiering enabled. I also found it strange that the PG
was marked unfound: the cluster was perfectly healthy before the kernel
panic, and a single OSD failure shouldn't create much hassle.
Yes, it is a bug unless using a singly replicated pool!
C.
__
On Thu, Mar 5, 2020 at 4:42 PM Sage Weil wrote:
>
> On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > Hi Sage,
> >
> > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil wrote:
> > >
> > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > Hi all,
> > > >
> > > > There's something broken in our env when we
On Thu, 5 Mar 2020, Dan van der Ster wrote:
> On Thu, Mar 5, 2020 at 4:42 PM Sage Weil wrote:
> >
> > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > Hi Sage,
> > >
> > > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil wrote:
> > > >
> > > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > > Hi all
On Thu, Mar 5, 2020 at 8:05 PM Sage Weil wrote:
>
> On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > On Thu, Mar 5, 2020 at 4:42 PM Sage Weil wrote:
> > >
> > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > Hi Sage,
> > > >
> > > > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil wrote:
> > > > >
>
On Thu, Mar 5, 2020 at 8:07 PM Dan van der Ster wrote:
>
> On Thu, Mar 5, 2020 at 8:05 PM Sage Weil wrote:
> >
> > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > On Thu, Mar 5, 2020 at 4:42 PM Sage Weil wrote:
> > > >
> > > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > > Hi Sage,
> >
On Thu, 5 Mar 2020, Dan van der Ster wrote:
> On Thu, Mar 5, 2020 at 8:07 PM Dan van der Ster wrote:
> >
> > On Thu, Mar 5, 2020 at 8:05 PM Sage Weil wrote:
> > >
> > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > On Thu, Mar 5, 2020 at 4:42 PM Sage Weil wrote:
> > > > >
> > > > > On Thu,
On Thu, Mar 5, 2020 at 8:19 PM Sage Weil wrote:
>
> On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > On Thu, Mar 5, 2020 at 8:07 PM Dan van der Ster wrote:
> > >
> > > On Thu, Mar 5, 2020 at 8:05 PM Sage Weil wrote:
> > > >
> > > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > > On Thu, Mar
Hi,
Does anyone know whether the following hard disk has decent performance in
a ceph cluster:
Micron 5210 ION 1.92TB, SATA (MTFDDAK1T9QDE-2AV1ZABYY)
The specs state that the disk has power loss protection; however, I'd
still like to make sure that all goes well with this disk.
Best Regards,
That depends on how you define “decent”, and on your use case.
Be careful that these are QLC drives. QLC is pretty new and longevity would
seem to vary quite a bit based on op mix. These might be fine for read-mostly
workloads, but high-turnover databases might burn them up fast, especially as
Simone;
What is your failure domain?
If you don't know your failure domain, can you provide the CRUSH ruleset for the
pool that experienced the "object unfound" error?
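(If it helps, the rule for a given pool can be pulled out like this; the pool name is a placeholder:)
# ceph osd pool get <pool> crush_rule
# ceph osd crush rule dump <rule-name>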
Thank you,
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
dhils...@performair.com
>
>>> Sage, do you think I can work around this by setting
>>> mon_sync_max_payload_size ridiculously small, like 1024 or something
>>> like that?
>>
>> Yeah... IIRC that is how the original user worked around the problem. I
>> think they use 64 or 128 KB.
>
> Nice... 64kB still triggered elections
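(For the record, the workaround amounts to lowering that option on the existing mons; the value below is only an example -- as noted above, 64 kB still triggered elections on this cluster:)
# ceph config set mon mon_sync_max_payload_size 65536
or, in ceph.conf on the mon hosts:
[mon]
    mon_sync_max_payload_size = 65536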
I have just ordered two of them to try (the 3.47GB IONs).
If you want, next week I could perhaps run some commands on them...?
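(A sync-write fio run is the usual first check for journal/DB suitability; /dev/sdX is a placeholder, and the test overwrites the device:)
# fio --name=synctest --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting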
MJ
On 3/5/20 9:38 PM, Hermann Himmelbauer wrote:
Hi,
Does someone know if the following harddisk has a decent performance in
a ceph cluster:
Micron 5210 ION 1.92TB, S