er health for later fixing.
Best regards,
=====
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Frank Schilder
Sent: 18 September 2020 15:38:51
To: Michael Thomas; ceph-users@ceph.io
Subject: [ceph-users] Re: multiple OSD crash, unfound objects
Dear Micha
uble-shooting guide? I suspect that the
> removal has left something in an inconsistent state that requires manual
> clean up for recovery to proceed.
>
> Best regards,
> =====
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> _
deleted snapshots in one of the copies. I used
> ceph-objectstore-tool to remove the "wrong" part. Did you check your OSD
> logs? Do the OSDs go down with an obscure stack trace (and maybe they are
> restarted by systemd ...)
>
> rgds,
>
> j.
>
>
>
> On
t the incomplete PG resolved with the above, but it will
move some issues out of the way before proceeding.
Best regards,
=========
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Michael Thomas
Sent: 14 October 2020 20:52:10
To: Andreas
trative, like peering attempts.
Best regards,
=====
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Frank Schilder
Sent: 16 October 2020 15:09:20
To: Michael Thomas; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: multiple OSD cras
lly see why the
missing OSDs are not assigned to the two PGs 1.0 and 7.39d.
Best regards,
=====
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Frank Schilder
Sent: 16 October 2020 15:41:29
To: Michael Thomas; ceph-users@ceph.io
Subject: [ceph-users] Re: multiple O
On 10/20/20 1:18 PM, Frank Schilder wrote:
Dear Michael,
Can you create a test pool with pg_num=pgp_num=1 and see if the PG gets an OSD
mapping?
I meant here with crush rule replicated_host_nvme. Sorry, forgot.
Seems to have worked fine:
https://pastebin.com/PFgDE4J1
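The test described above can be sketched as follows; the pool name "testmap" is a placeholder, and "replicated_host_nvme" is the crush rule named in the thread:

```shell
# Create a one-PG test pool pinned to the rule under suspicion:
ceph osd pool create testmap 1 1 replicated replicated_host_nvme
# Check whether the single PG actually receives an OSD mapping:
ceph pg ls-by-pool testmap
# Clean up afterwards (requires mon_allow_pool_delete=true):
ceph osd pool delete testmap testmap --yes-i-really-really-mean-it
```

If the test PG maps fine while the problem PGs do not, the rule itself is probably not at fault.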
Yes, the OSD was st
w
defunct) has been blacklisted. I'll check back later to see if the slow
OPS get cleared from 'ceph status'.
Regards,
--Mike
________
From: Michael Thomas
Sent: 20 October 2020 23:48:36
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re:
I find time today to look at the incomplete PG.
Best regards,
=====
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________
From: Michael Thomas
Sent: 21 October 2020 22:58:47
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re
rds,
=====
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Frank Schilder
Sent: 22 October 2020 09:32:07
To: Michael Thomas; ceph-users@ceph.io
Subject: [ceph-users] Re: multiple OSD crash, unfound objects
Sounds good. Did you re-create the pool again?
I'm setting up a radosgw for my ceph Octopus cluster. As soon as I
started the radosgw service, I noticed that it created a handful of new
pools. These pools were assigned the 'replicated_data' crush rule
automatically.
I have a mixed hdd/ssd/nvme cluster, and this 'replicated_data' crush
ru
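A minimal sketch of re-crushing such pools in place, assuming a suitable rule (here "replicated_ssd", a placeholder name) already exists; switching the rule triggers a remap/backfill of the pool's PGs but keeps the data available throughout:

```shell
# Point the radosgw pools at the intended rule (pool names are the
# radosgw defaults; adjust to what 'ceph osd pool ls' shows):
ceph osd pool set .rgw.root crush_rule replicated_ssd
ceph osd pool set default.rgw.control crush_rule replicated_ssd
# Watch the resulting data movement:
ceph -s
```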
dhils...@performair.com
www.PerformAir.com
-----Original Message-----
From: Michael Thomas [mailto:w...@caltech.edu]
Sent: Tuesday, November 10, 2020 1:32 PM
To: ceph-users@ceph.io
Subject: [ceph-users] safest way to re-crush a pool
I'm setting up a radosgw for my ceph Octopus cluster. As soon as
On 10/23/20 3:07 AM, Frank Schilder wrote:
Hi Michael.
I still don't see any traffic to the pool, though I'm also unsure how much
traffic is to be expected.
Probably not much. If ceph df shows that the pool contains some objects, I
guess that's sorted.
That osdmaptool crashes indicates tha
one and the broken PG(s) might get deleted cleanly. Then you still
have a surplus pool, but at least all PGs are clean.
I hope one of these will work. Please post your experience here.
Best regards,
=====
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
I am gathering prometheus metrics from my (unhealthy) Octopus (15.2.4)
cluster and notice a discrepancy (or misunderstanding) with the ceph
dashboard.
In the dashboard, and with ceph -s, it reports 807 million objects:
pgs: 169747/807333195 objects degraded (0.021%)
On 12/3/20 6:47 PM, Satoru Takeuchi wrote:
Hi,
Could you tell me whether it's ok to remove the device_health_metrics pool
after disabling the device monitoring feature?
I don't use the device monitoring feature because I capture hardware
information another way.
However, after disabling this feature, de
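A hedged sketch of the removal, assuming the monitoring feature is turned off first; note that an active mgr may recreate the pool, so disabling the feature before deleting is the important step:

```shell
# Disable the device health monitoring feature:
ceph device monitoring off
# Pool deletion must be explicitly enabled on the monitors first:
ceph config set mon mon_allow_pool_delete true
ceph osd pool delete device_health_metrics device_health_metrics \
    --yes-i-really-really-mean-it
# Re-disable pool deletion as a safety net:
ceph config set mon mon_allow_pool_delete false
```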
rank Schilder wrote:
Dear Michael,
yes, your plan will work if the temporary space requirement can be addressed.
Good luck!
Best regards,
=====
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Michael Thomas
Sent: 22 November 2
I have a cephfs secondary (non-root) data pool with unfound and degraded
objects that I have not been able to recover[1]. I created an
additional data pool and used 'setfattr -n ceph.dir.layout.pool' and a
very long rsync to move the files off of the degraded pool and onto the
new pool. This
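The migration described above can be sketched like this; the mount point, directory, and pool name are placeholders. Directory layouts only affect newly created files, which is why the long rsync rewrite is needed:

```shell
# The new pool must first be attached to the filesystem:
ceph fs add_data_pool cephfs cephfs_data_new
# Create the target directory and point its layout at the new pool:
mkdir /mnt/cephfs/data.new
setfattr -n ceph.dir.layout.pool -v cephfs_data_new /mnt/cephfs/data.new
# Rewrite the existing files into the new layout, then swap:
rsync -a /mnt/cephfs/data/ /mnt/cephfs/data.new/
mv /mnt/cephfs/data /mnt/cephfs/data.old
mv /mnt/cephfs/data.new /mnt/cephfs/data
```

This temporarily doubles the space used by the directory, which is the space requirement mentioned elsewhere in the thread.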
1 shard per object and ordinary
recovery could fix it.
Best regards,
=====
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
____
From: Michael Thomas
Sent: 21 December 2020 23:12:09
To: ceph-users@ceph.io
Subject: [ceph-users] Removing secon
gh to find out where such an object count comes
from. However, ceph df is known to be imperfect. Maybe it's just an accounting
bug there. I think there were a couple of cases where people deleted all
objects in a pool and ceph df would still report non-zero usage.
Best regards,
=====
F
Hi Joshua,
I have had a similar issue three different times on one of my cephfs
pools (15.2.10). The first time this happened I had lost some OSDs. In
all cases I ended up with degraded PGs with unfound objects that could
not be recovered.
Here's how I recovered from the situation. Note th
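The recovery steps for unfound objects generally start with inspecting what is actually lost; a sketch, with the PG id 7.39d taken from earlier in the thread as a placeholder:

```shell
# List the unfound objects in a degraded PG:
ceph pg 7.39d list_unfound
# Query the PG to see which OSDs were probed and why objects are unfound:
ceph pg 7.39d query
# As a last resort, give up on the unfound objects: 'revert' rolls back
# to a prior version where one exists, 'delete' forgets them entirely:
ceph pg 7.39d mark_unfound_lost revert
```

mark_unfound_lost is destructive; only use it once you are sure the missing shards cannot be brought back by restarting or re-adding OSDs.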
this up under the assumption that the data is lost?
~Joshua
Joshua West
President
403-456-0072
CAYK.ca
On Thu, Apr 8, 2021 at 6:15 PM Michael Thomas wrote:
Hi Joshua,
I have had a similar issue three different times on one of my cephfs
pools (15.2.10). The first time this happened I had lost
I recently had a similar issue when reducing the number of PGs on a
pool. A few OSDs became backfillful even though there was enough space;
the OSDs were just not balanced well.
To fix, I reweighted the most-full OSDs:
ceph osd reweight-by-utilization 120
After it finished (~1 hour), I had f
Is there a way to log or track which cephfs files are being accessed?
This would help us in planning where to place certain datasets based on
popularity, eg on a EC HDD pool or a replicated SSD pool.
I know I can run inotify on the ceph clients, but I was hoping that the
MDS would have a way t
I recently started getting inconsistent PGs in my Octopus (15.2.14) ceph
cluster. I was able to determine that they are all coming from the same
OSD: osd.143. This host recently suffered from an unplanned power loss,
so I'm not surprised that there may be some corruption. This PG is part
of
On 10/3/21 12:08, 胡 玮文 wrote:
On 4 October 2021, at 00:53, Michael Thomas wrote:
I recently started getting inconsistent PGs in my Octopus (15.2.14) ceph
cluster. I was able to determine that they are all coming from the same OSD:
osd.143. This host recently suffered from an unplanned power loss, so
On 10/4/21 11:57 AM, Dave Hall wrote:
> I also had a delay on the start of the repair scrub when I was dealing with
> this issue. I ultimately increased the number of simultaneous scrubs, but
> I think you could also temporarily disable scrubs and then re-issue the 'pg
> repair'. (But I'm not one
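The inspection-and-repair cycle discussed above can be sketched as follows; the pool name and PG id are placeholders:

```shell
# Find the inconsistent PGs in a pool:
rados list-inconsistent-pg mypool
# Inspect which objects/shards disagree and on which OSD:
rados list-inconsistent-obj 7.39d --format=json-pretty
# Ask for a repair; it is scheduled as a scrub job, so it may queue
# behind other scrubs unless scrub limits are raised or scrubs paused:
ceph pg repair 7.39d
```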
In 15.2.7, how can I remove an invalid crush class? I'm surprised that
I was able to create it in the first place:
[root@ceph1 bin]# ceph osd crush class ls
[
"ssd",
"JBOD.hdd",
"nvme",
"hdd"
]
[root@ceph1 bin]# ceph osd crush class ls-osd JBOD.hdd
Invalid command: invalid cha
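Since the 'ceph osd crush class' subcommands appear to reject the '.' in "JBOD.hdd", one way around it is to edit the CRUSH map offline; a sketch:

```shell
# Export and decompile the current CRUSH map:
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# Manually edit crushmap.txt to remove or rename the "JBOD.hdd" class
# entries, then recompile and inject the result:
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin
```

Injecting a modified CRUSH map can trigger data movement, so compare the decompiled before/after maps carefully first.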
...sorta. I have an ovirt-4.4.2 system installed a couple of years ago
and set up managed block storage using ceph Octopus[1]. This has been
working well since it was originally set up.
In late November we had some network issues on one of our ovirt hosts,
as well as a separate network issue tha
On 1/7/22 16:49, Marc wrote:
Where else can I look to find out why the managed block storage isn't
accessible anymore?
ceph -s? I guess it is not showing any errors, and there is probably nothing
wrong with ceph. You can do an rbdmap and see if you can just map an image.
Then try mapping an im
Try this:
ceph osd crush reweight osd.XX 0
--Mike
On 5/28/22 15:02, Nico Schottelius wrote:
Good evening dear fellow Ceph'ers,
when removing OSDs from a cluster, we sometimes use
ceph osd reweight osd.XX 0
and wait until the OSD's content has been redistributed. However, when
then fin
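The distinction behind the suggested fix, sketched with a placeholder OSD id: 'ceph osd reweight' sets a temporary 0..1 override that can reappear when an OSD is marked in again, while 'ceph osd crush reweight' changes the CRUSH weight itself, which is the right tool when permanently draining an OSD:

```shell
# Set the CRUSH weight to 0 so data drains off the OSD for good:
ceph osd crush reweight osd.12 0
# After backfill completes, remove the OSD entirely:
ceph osd purge 12 --yes-i-really-mean-it
```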
On my relatively new Octopus cluster, I have one PG that has been
perpetually stuck in the 'unknown' state. It appears to belong to the
device_health_metrics pool, which was created automatically by the mgr
daemon(?).
The OSDs that the PG maps to are all online and serving other PGs. But
wh
On 8/11/20 2:52 AM, Wido den Hollander wrote:
On 11/08/2020 00:40, Michael Thomas wrote:
On my relatively new Octopus cluster, I have one PG that has been
perpetually stuck in the 'unknown' state. It appears to belong to the
device_health_metrics pool, which was created automatica
On 8/11/20 8:35 AM, Michael Thomas wrote:
On 8/11/20 2:52 AM, Wido den Hollander wrote:
On 11/08/2020 00:40, Michael Thomas wrote:
On my relatively new Octopus cluster, I have one PG that has been
perpetually stuck in the 'unknown' state. It appears to belong to
the device_heal
Over the weekend I had multiple OSD servers in my Octopus cluster
(15.2.4) crash and reboot at nearly the same time. The OSDs are part of
an erasure coded pool. At the time the cluster had been busy with a
long-running (~week) remapping of a large number of PGs after I
incrementally added mor
There is another method; I never got a reply to my question in the
tracker.
Best regards,
=====
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Michael Thomas
Sent: 16 September 2020 01:27:19
To: ceph-users@ceph.io
Subject: [ceph-user
weekend so that hopefully
the deep scrubs can catch up and possibly locate any missing objects.
--Mike
Best regards,
=====
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Michael Thomas
Sent: 17 September 2020 22:27:47
To: Frank Schilder;