[ceph-users] Re: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

2021-12-21 Thread Christian Rohmann
Hello Eugen, On 20/12/2021 22:02, Eugen Block wrote: you wrote that this cluster was initially installed with Octopus, so no upgrade Ceph-wise? Are all RGW daemons on the exact same Ceph (minor) version? I remember one of our customers reporting inconsistent objects on a regular basis althoug
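For reference, scrub inconsistencies of this kind are usually inspected with the standard tooling; a minimal sketch, where the pool name and the PG ID 7.1a are only placeholders:

    rados list-inconsistent-pg default.rgw.meta
    rados list-inconsistent-obj 7.1a --format=json-pretty
    # only after reviewing the reported omap digests
    ceph pg repair 7.1a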

[ceph-users] Re: airgap install

2021-12-21 Thread Marc
I also have an 'airgapped install', but with rpms, simply cloning the necessary repositories. Why go through all these efforts trying to get this to work via containers? > > Kai, thank you for your answer. It looks like the "ceph config set > > mgr..." commands are the key part, to specify my
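A minimal sketch of such an rpm repo mirror, assuming yum-utils/createrepo and an internal web server (repo ID and paths are placeholders):

    # on a host with internet access
    reposync --repoid=ceph -p /srv/mirror/
    createrepo /srv/mirror/ceph/
    # then serve /srv/mirror internally and point the airgapped hosts
    # at it via /etc/yum.repos.d/ceph.repo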

[ceph-users] Re: airgap install

2021-12-21 Thread Kai Stian Olstad
On 21.12.2021 09:41, Marc wrote: I also have an 'airgapped install', but with rpms, simply cloning the necessary repositories. Why go through all these efforts trying to get this to work via containers? For me, being completely new to Ceph, I started with the documentation[1] and the recommended
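The "ceph config set mgr..." commands mentioned above refer to pointing cephadm at an internal container registry; a hedged sketch (registry name and image paths are placeholders, option names per the cephadm docs for your release):

    ceph config set mgr mgr/cephadm/container_image_base registry.local:5000/ceph/ceph
    ceph config set mgr mgr/cephadm/container_image_prometheus registry.local:5000/prometheus/prometheus
    ceph config set mgr mgr/cephadm/container_image_grafana registry.local:5000/ceph/ceph-grafana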

[ceph-users] Docker 1.13.1 on CentOS 7 too old for Ceph Pacific

2021-12-21 Thread Robert Sander
Hi, trying to "ceph orch upgrade" an Octopus cluster to Pacific we ran into an issue that was solved by updating Docker to the latest from the Docker repository. The error is: /bin/docker: stderr /usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:235:

[ceph-users] Re: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

2021-12-21 Thread Christian Rohmann
Thanks for your response Stefan, On 21/12/2021 10:07, Stefan Schueffler wrote: Even without adding a lot of rgw objects (only a few PUTs per minute), we have thousands and thousands of rgw bucket.sync log entries in the rgw log pool (this seems to be a separate problem), and as such we accumul
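To get a rough idea of what is accumulating in the RGW log pool, it can be listed directly; the pool name below is the default and may differ per zone:

    rados df | grep rgw.log
    rados -p default.rgw.log ls | head -n 20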

[ceph-users] 3 OSDs can not be started after a server reboot - rocksdb Corruption

2021-12-21 Thread Sebastian Mazza
Hi all, after a reboot of a cluster 3 OSDs can not be started. The OSDs exit with the following error message: 2021-12-21T01:01:02.209+0100 7fd368cebf00 4 rocksdb: [db_impl/db_impl.cc:396] Shutdown: canceling all background work 2021-12-21T01:01:02.209+0100 7fd368cebf00 4 rock
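Corruption like this is usually confirmed with an offline fsck against the OSD's data directory; OSD ID 7 below is just an example:

    systemctl stop ceph-osd@7
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-7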

[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2021-12-21 Thread ceph
Hi, This > fsck failed: (5) Input/output error Sounds like a hardware issue. Did you have a look at "dmesg"? HTH, Mehmet On 21 December 2021 17:47:35 CET, Sebastian Mazza wrote: >Hi all, > >after a reboot of a cluster 3 OSDs can not be started. The OSDs exit with the >following error messa
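A quick way to scan the kernel log for disk-related errors, as suggested above (the grep patterns are only examples):

    dmesg -T | grep -iE 'ata|i/o error|medium error'
    journalctl -k | grep -iE 'blk_update_request|uncorrect'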

[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2021-12-21 Thread Igor Fedotov
Hi Sebastian, first of all I'm not sure this issue has the same root cause as Francois' one. Highly likely it's just another BlueFS/RocksDB data corruption, which is indicated in the same way. In this respect I would rather mention this one, reported just yesterday: https://lists.ceph.io/hyperk

[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2021-12-21 Thread Sebastian Mazza
Hi Mehmet, thank you for your suggestion. I did check the kernel log now, but I didn't see anything interesting. However, I copied the parts that seem to be related to the SATA disks of the failed OSDs. Maybe you see more than I do. [1.815801] ata7: SATA link down (SStatus 0 SControl 300)
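Besides the kernel log, the drives' own SMART error logs are worth a look; the device name below is a placeholder:

    smartctl -a /dev/sda
    smartctl -l error /dev/sda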

[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2021-12-21 Thread Sebastian Mazza
Hi Igor, I now fixed my wrong OSD debug config to: [osd.7] debug bluefs = 20 debug bdev = 20 and you can download the debug log from: https://we.tl/t-3e4do1PQGj Thanks, Sebastian > On 21.12.2021, at 19:44, Igor Fedotov wrote: > > Hi Sebastian, > > first of all I'm not su
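The same debug levels can also be raised at runtime instead of via the ceph.conf [osd.7] section shown above:

    ceph config set osd.7 debug_bluefs 20
    ceph config set osd.7 debug_bdev 20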

[ceph-users] Re: Large latency for single thread

2021-12-21 Thread norman.kern
Marc, Thanks for your reply. The wiki page is very helpful to me. I have analyzed the I/O flow and intend to optimize the librbd client. I also found that RBD supports a persistent cache (https://docs.ceph.com/en/pacific/rbd/rbd-persistent-write-back-cache/), and I will give it a try. P.S. Anyone
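A client-side ceph.conf sketch for the persistent write-back cache, per the Pacific docs linked above; the cache path and size are placeholders:

    [client]
    rbd_plugins = pwl_cache
    rbd_persistent_cache_mode = ssd
    rbd_persistent_cache_path = /mnt/nvme/rbd-pwl
    rbd_persistent_cache_size = 1G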

[ceph-users] Re: Large latency for single thread

2021-12-21 Thread norman.kern
Mark, Thanks for your reply. I ran the test on the local host with no replica PG set. Crimson may help me a lot and I will do more tests. I will also try the RBD persistent cache feature, since the client is latency-sensitive. P.S. Can Crimson be used in production now or not? On 12/
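A single-thread latency test can be run straight against an image with fio's rbd engine; the pool, image and client names below are placeholders:

    fio --name=lat --ioengine=rbd --clientname=admin --pool=rbd --rbdname=test-img \
        --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 --direct=1 \
        --runtime=60 --time_based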

[ceph-users] Re: Large latency for single thread

2021-12-21 Thread Mark Nelson
Hi Norman, Persistent client side cache potentially may help in this case if you are ok with the trade-offs.  It's been a while since I've seen any benchmarks with it so you may need to do some testing yourself. Crimson is not ready for production use at this time so I would focus on the e

[ceph-users] Re: RBD bug #50787

2021-12-21 Thread Konstantin Shalygin
Hi, This is a librbd issue, not a cluster one. You can upgrade your Nautilus clients to the latest Pacific to get this fix. k Sent from my iPhone > On 21 Dec 2021, at 18:28, J-P Methot wrote: > > Hi, > > This is regarding this bug: https://tracker.ceph.com/issues/50787 . From what > I understand,
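To see which release/feature bits the connected clients actually report, and which librbd version a client host has installed (rpm-based example):

    ceph features
    rpm -q librbd1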