ceph version: 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable)
I set up a Ceph cluster and I'm uploading objects through rgw at a rate
of 60 objects/s. I added some lifecycle rules to the buckets so that my disks
won't fill up.
However, after I set "debug_rgw" to 5 and run
> /var/log/ceph/ceph.log:2020-02-27 16:18:00.328869 osd.40 (osd.40) 1585 :
> cluster [WRN] Large omap object found. Object:
> 2:654134d2:::mds0_openfiles.0:head PG: 2.4b2c82a6 (2.26) Key count:
> 1048559 Size (bytes): 46407183
> /var/log/ceph/ceph.log-20200227.gz:2020-02-26 19:56:24.972431 osd
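For reference, raising that debug level at runtime looks roughly like this
(a sketch; the daemon name "gateway1" is only a placeholder):

    # Via the central config database, for one named rgw daemon:
    ceph config set client.rgw.gateway1 debug_rgw 5
    # Or through the daemon's admin socket on the rgw host:
    ceph daemon /var/run/ceph/ceph-client.rgw.gateway1.asok config set debug_rgw 5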
Christian;
What is your failure domain? If your failure domain is set to OSD / drive, and
2 OSDs share a DB / WAL device, and that DB / WAL device dies, then portions of
the data could drop to read-only (or be lost...).
Ceph is really set up to own the storage hardware directly. It doesn't
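A quick way to check is to look at the CRUSH rule's chooseleaf step (a
sketch; the rule name below is hypothetical):

    # The failure domain shows up as the "type" in the chooseleaf step:
    ceph osd crush rule dump
    # Creating a replicated rule with host as the failure domain:
    ceph osd crush rule create-replicated rep_host default host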
Hi everyone,
we currently have 6 OSDs with 8TB HDDs split across 3 hosts.
The main usage is KVM images.
To improve speed we planned on putting the block.db and WAL onto NVMe SSDs.
The plan was to put 2x 1TB in each host.
One option I thought of was to RAID 1 them for better redundancy; I don't
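For context, attaching an external block.db at OSD-creation time looks
roughly like this (a sketch; device paths are hypothetical, and the WAL
co-locates on the block.db device when no separate --block.wal is given):

    # Data on the HDD, RocksDB metadata on an NVMe partition:
    ceph-volume lvm create --data /dev/sdb --block.db /dev/nvme0n1p1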
It seems that one of the down PGs was able to recover just fine, but the other
went "incomplete" after export-and-removing the affected PG from
the down OSD.
I've still got the exported data from the PG, although re-importing it into the
OSD causes the crashes again.
What's the be
FTR, the root cause is now understood:
https://tracker.ceph.com/issues/39525#note-21
-- dan
On Thu, Feb 20, 2020 at 9:24 PM Dan van der Ster wrote:
>
> On Thu, Feb 20, 2020 at 9:20 PM Wido den Hollander wrote:
> >
> > > On 20 Feb. 2020, at 19:54, Dan van der Ster wrote the
> > > following
Hi all,
Could someone make luminous (or nautilus) available for buster (not the
container version)?
What are the reasons for not having these versions available from
eu.ceph.com? What would be the motivation needed to add the packages?
As far as I can see, the curl/libcurl4 version is the only thing needed to
Thanks Sage, I can try that. Admittedly, I'm not sure how to tell if these two
PGs can recover without this particular OSD.
Note that there still seems to be an underlying related issue, with hit set
archives popping up as unfound objects on my cluster, as in Paul's ticket. In
total I had about 1
If the PG in question can recover without that OSD, I would use
ceph-objectstore-tool to export and remove it, and then move on.
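The export-and-remove step, with the OSD stopped, looks roughly like this
(a sketch; the data path and pgid are hypothetical):

    # Export the PG to a file, then remove it from the OSD:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-40 \
        --pgid 1.2f --op export --file /root/pg1.2f.export
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-40 \
        --pgid 1.2f --op remove --force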
I hit a similar issue on my system (due to a bug in an early octopus
build) and it was super tedious to fix up manually (needed patched
code and manual modificat
Alternatively, it might be handy to have the passive mgrs issue an HTTP
redirect to the active mgr. Then a single DNS name pointing to all mgrs
would always work, even when the active mgr fails over.
Going a step further with some HA strategies, the cluster could have a
separate, floating IP/
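Until something like that exists, the active mgr and its service endpoints
can at least be discovered from the cluster itself (a sketch; the JSON line
is illustrative output):

    # Which mgr is currently active:
    ceph mgr stat
    # Where the active mgr's modules are serving:
    ceph mgr services
    # e.g. {"dashboard": "https://mgr1:8443/", "prometheus": "http://mgr1:9283/"}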
Also: make a backup using the PG export feature of objectstore-tool
before doing anything else.
Sometimes it's enough to export and delete the PG from the broken OSD
and import it into a different OSD using objectstore-tool.
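The import half of that, again with the target OSD stopped, would be along
these lines (a sketch; hypothetical paths and pgid):

    # Import the previously exported PG into a healthy OSD:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --op import --file /root/pg1.2f.export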
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
The crash happens in PG::activate, so it's unrelated to I/O etc.
My first approach here would be to read the code and try to understand
why it crashes/what the exact condition is that is violated here.
It looks like something that can probably be fixed by fiddling around
with ceph-objectstore-tool (but
Thanks Paul.
I was able to mark many of the unfound ones as lost, but I'm still stuck with
one unfound object and an OSD assert at this point.
I've tried setting many of the OSD options to pause all cluster I/O,
backfilling, rebalancing, the tiering agent, etc., to try to avoid hitting the
assert, but alas thi
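For reference, the flags and the mark-lost command being described are
roughly the following (a sketch; the pgid is hypothetical):

    # Pause client I/O and data movement:
    ceph osd set pause
    ceph osd set nobackfill
    ceph osd set norebalance
    ceph osd set norecover
    # Give up on unfound objects in a PG (destructive):
    ceph pg 1.2f mark_unfound_lost delete
    # Undo a flag afterwards:
    ceph osd unset pause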
I've also encountered this issue, but luckily without the crashing
OSDs, so marking as lost resolved it for us.
See https://tracker.ceph.com/issues/44286
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
w
Hi Mehmet,
In our case, ceph pg repair fixed the issues (read_error). I think the
read_error was just temporary, due to low available RAM.
You might want to check your actual issue with ceph pg query.
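That would be something along these lines (the pgid is a placeholder):

    # Peering and recovery state for the PG:
    ceph pg 1.2f query
    # List what scrub actually found:
    rados list-inconsistent-obj 1.2f --format=json-pretty
    # Then ask the primary to repair:
    ceph pg repair 1.2f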
Kind regards,
Caspar Smit
Systemengineer
SuperNAS
Dorsvlegelstraat 13
1445 PA Purmerend
t:
Hi all,
A related question: would it be possible to let a passive mgr do the data
collection?
We run 14.2.6 on a medium-size 2.5PB cluster with over 900M objects (rbd and
mainly S3). At the moment, we face an issue with the prometheus exporter when
it is under high load (e.g. while we insert a ne
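For what it's worth, the exporter's latency is easy to measure, and the
module's own collection interval is tunable (a sketch; the host is
hypothetical):

    # Time one scrape of the active mgr's exporter (9283 is the module default):
    time curl -s http://mgr1:9283/metrics > /dev/null
    # Relax the module's internal collection interval if it can't keep up:
    ceph config set mgr mgr/prometheus/scrape_interval 30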