[ceph-users] Re: airgap install

2021-12-20 Thread Kai Stian Olstad
On 17.12.2021 11:06, Zoran Bošnjak wrote: Kai, thank you for your answer. It looks like the "ceph config set mgr..." commands are the key part to specify my local registry. However, I haven't got that far with the installation. I have tried various options, but I have problems already with the b
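For context, a minimal sketch of the kind of configuration being referred to, assuming a local mirror at registry.local:5000 (registry name and image tags are illustrative, not taken from the thread):

# Point the cluster at a Ceph image mirrored in the local registry
ceph config set global container_image registry.local:5000/ceph/ceph:v16.2.7

# The cephadm monitoring-stack images are configured separately on the mgr
ceph config set mgr mgr/cephadm/container_image_prometheus registry.local:5000/prometheus/prometheus:v2.18.1
ceph config set mgr mgr/cephadm/container_image_grafana registry.local:5000/ceph/ceph-grafana:6.7.4
ceph config set mgr mgr/cephadm/container_image_alertmanager registry.local:5000/prometheus/alertmanager:v0.20.0
ceph config set mgr mgr/cephadm/container_image_node_exporter registry.local:5000/prometheus/node-exporter:v0.18.1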

[ceph-users] Re: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

2021-12-20 Thread Eugen Block
Hi, you wrote that this cluster was initially installed with Octopus, so no Ceph upgrade was involved? Are all RGW daemons on the exact same Ceph (minor) version? I remember one of our customers reporting inconsistent objects on a regular basis although no hardware issues were detectable. They r

[ceph-users] Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

2021-12-20 Thread Christian Rohmann
Hello Ceph-Users, for about 3 weeks now I have been seeing batches of scrub errors on a 4-node Octopus cluster: # ceph health detail HEALTH_ERR 7 scrub errors; Possible data damage: 6 pgs inconsistent [ERR] OSD_SCRUB_ERRORS: 7 scrub errors [ERR] PG_DAMAGED: Possible data damage: 6 pgs inconsistent pg
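One common way to dig into such inconsistencies (pool name and PG id below are illustrative, not taken from the report):

# Which PGs of the RGW metadata pool are inconsistent?
rados list-inconsistent-pg default.rgw.meta

# Which objects and shards disagree, and on what (e.g. the omap_digest)?
rados list-inconsistent-obj 7.2 --format=json-pretty

# Once the cause is understood, a repair can be triggered per PG
ceph pg repair 7.2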

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Andrej Filipcic
On 12/20/21 13:48, Igor Fedotov wrote: Andrej, do you remember those OSDs that were crashing on other days and failed to start - did they finally expose (similar?) BlueFS/RocksDB issues, or was that something completely different? It was different, at least for the crashes below. Though to me it

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Igor Fedotov
Andrej, do you remember those OSDs that were crashing on other days and failed to start - did they finally expose (similar?) BlueFS/RocksDB issues, or was that something completely different? And generally - do you think your cluster is susceptible to OSDs being corrupted on restart/failure? I.e. the

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Andrej Filipcic
On 12/20/21 13:14, Igor Fedotov wrote: On 12/20/2021 2:58 PM, Andrej Filipcic wrote: On 12/20/21 12:47, Igor Fedotov wrote: Thanks for the info. Just in case - is write caching disabled for the disk in question? What's the output for "hdparm -W " ? no, it is enabled. Shall I disable that
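For reference, a short sketch of checking and disabling the volatile write cache on a drive (device name is illustrative):

# Query the current write-cache setting
hdparm -W /dev/sdX

# Disable the volatile write cache on that drive
hdparm -W 0 /dev/sdX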

[ceph-users] Re: 50% IOPS performance drop after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

2021-12-20 Thread Marc
> > Thanks a lot! This is reasonable data, do you plan to upgrade to Octopus > anytime soon? Would be very interested in the same tests after the > migration > H, not really, the idea is to discover what is going on with your situation, so if I am having this also after upgrading, I know ho
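As an illustrative way to compare small-block IOPS before and after such an upgrade (pool name and parameters are assumptions, not from the thread):

# 60 s of 4 KiB writes against a test pool with 16 concurrent ops, keeping the objects
rados bench -p testpool 60 write -b 4096 -t 16 --no-cleanup

# Random-read pass over the objects written above
rados bench -p testpool 60 rand -t 16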

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Igor Fedotov
On 12/20/2021 2:58 PM, Andrej Filipcic wrote: On 12/20/21 12:47, Igor Fedotov wrote: Thanks for the info. Just in case - is write caching disabled for the disk in question? What's the output for "hdparm -W " ? no, it is enabled. Shall I disable that on all OSDs? I can't tell you for sure

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Andrej Filipcic
.ijs.si/~andrej/ceph-osd.611.log-20211220-short.gz http://www-f9.ijs.si/~andrej/ceph-osd.611-debug20.log sorry, bad copy/paste Andrej This is exactly the same link as the original one - it doesn't have verbose bluefs logging... Igor Fedotov Ceph Lead Developer Looking for help wi

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Igor Fedotov
.ijs.si/~andrej/ceph-osd.611.log-20211220-short.gz http://www-f9.ijs.si/~andrej/ceph-osd.611-debug20.log sorry, bad copy/paste Andrej This is exactly the same link as the original one - it doesn't have verbose bluefs logging... Igor Fedotov Ceph Lead Developer Looking for help with your C

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Andrej Filipcic
On 12/20/21 10:47, Igor Fedotov wrote: On 12/20/2021 12:26 PM, Andrej Filipcic wrote: On 12/20/21 10:09, Igor Fedotov wrote: Hi Andrej, 3) Please set debug-bluefs to 20, retry the OSD start and share the log. http://www-f9.ijs.si/~andrej/ceph-osd.611.log-20211220-short.gz http://www-f9
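A minimal sketch of raising that debug level for the affected OSD (osd.611 is inferred from the log file name; the exact method used was not shown in the thread):

# Raise bluefs logging via the central config before retrying the start...
ceph config set osd.611 debug_bluefs 20/20

# ...or pass it directly when starting the daemon by hand in the foreground
ceph-osd -f -i 611 --debug_bluefs 20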

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Igor Fedotov
On 12/20/2021 12:26 PM, Andrej Filipcic wrote: On 12/20/21 10:09, Igor Fedotov wrote: Hi Andrej, 3) Please set debug-bluefs to 20, retry the OSD start and share the log. http://www-f9.ijs.si/~andrej/ceph-osd.611.log-20211220-short.gz This is exactly the same link as the original one - it

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Andrej Filipcic
here http://www-f9.ijs.si/~andrej/ceph-osd.611.log-20211220-short.gz 3) Please set debug-bluefs to 20, retry the OSD start and share the log. http://www-f9.ijs.si/~andrej/ceph-osd.611.log-20211220-short.gz 4) Please share the content of the broken CURRENT file [root@lcst0032 db]# hexdump CU
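For illustration, one way to get at RocksDB's CURRENT file on a BlueStore OSD is to export BlueFS to a plain directory first (paths assume a default OSD layout and are not from the thread):

# Export the BlueFS contents (db/, db.wal/, ...) of the broken OSD for offline inspection
ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-611 --out-dir /tmp/osd.611-bluefs

# A healthy CURRENT normally holds a single line naming the active manifest, e.g. MANIFEST-012345
hexdump -C /tmp/osd.611-bluefs/db/CURRENT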

[ceph-users] Re: Luminous: export and migrate rocksdb to dedicated lvm/unit

2021-12-20 Thread Igor Fedotov
Have you tried the --command option instead of the fixed positional syntax: ceph-bluestore-tool --path /dev/osd1/ --devs-source dev/osd1/block --dev-target dev/osd1/block.db --command bluefs-bdev-migrate If so, was it showing the same error? Thanks, Igor On 12/19/2021 11:46 AM, Flavio Piccioni
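A hedged sketch of that --command form, using a more typical OSD data-directory path (the path and OSD id are illustrative, not from the thread):

# Migrate BlueFS data from the main block device to a new dedicated DB device
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-1 \
    --devs-source /var/lib/ceph/osd/ceph-1/block \
    --dev-target /var/lib/ceph/osd/ceph-1/block.db \
    --command bluefs-bdev-migrate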

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Igor Fedotov
Hi Andrej, first of all I'd like to mention that this issue is not new to 16.2.7. There is a ticket: https://tracker.ceph.com/issues/47330 which mentions a similar case for Mimic. And the ticket is erroneously tagged as resolved - but the proposed fix just introduces bluefs file imp

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Andrej Filipcic
Hi, attachment stripped. Here is the log: http://www-f9.ijs.si/~andrej/ceph-osd.611.log-20211220-short.gz Andrej On 12/20/21 09:17, Andrej Filipcic wrote: Hi, When upgrading to 16.2.7 from 16.2.6, 8 out of ~1600 OSDs failed to start. The first 16.2.7 startup crashes here: 2021-12-19T09

[ceph-users] 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Andrej Filipcic
Hi, When upgrading to 16.2.7 from 16.2.6, 8 out of ~1600 OSDs failed to start. The first 16.2.7 startup crashes here: 2021-12-19T09:52:34.128+0100 7ff7104c0080  1 bluefs mount 2021-12-19T09:52:34.129+0100 7ff7104c0080  1 bluefs _init_alloc shared, id 1, capacity 0xe8d7fc0, block size 0x1
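As an aside, a hedged sketch of how such an OSD's store can be checked offline while the daemon is down (path assumes a default layout; this is not the procedure followed in the thread):

# Offline consistency check of the BlueStore/BlueFS metadata of a failing OSD
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-611

# A deeper pass that also reads object data is available via the --deep option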