[ceph-users] RGW Beast frontend and ipv6 options

2019-04-26 Thread Abhishek Lekshmanan
Currently RGW's Beast frontend supports ipv6 via the endpoint configurable; the port option will bind to ipv4 _only_. http://docs.ceph.com/docs/master/radosgw/frontends/#options Since many Linux systems may default the net.ipv6.bindv6only sysctl to true, it usually means that specifying
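
For reference, a minimal ceph.conf sketch of what the endpoint option looks like (section name, addresses and port are placeholders; the bracketed IPv6 syntax and repeating endpoint= are as I read the frontend docs linked above):

  [client.rgw.gateway1]
  # endpoint takes address[:port]; IPv6 addresses are written in brackets.
  # Listing both families explicitly avoids depending on bindv6only behaviour.
  rgw frontends = beast endpoint=[::1]:8080 endpoint=0.0.0.0:8080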

[ceph-users] Nautilus - The Manager Daemon spams its logfile with level 0 messages

2019-04-26 Thread Markus Baier
I updated the test cluster from Luminous to Nautilus and now the ceph manager daemon starts to spam its logfile with log level 0 messages. There is a new entry every two seconds: 2019-04-26 12:27:18.889 7f8af1afe700 0 log_channel(cluster) log [DBG] : pgmap v15: 128 pgs: 128 active+clean; 2.1 K

[ceph-users] Luminous 12.2.8, active+undersized+degraded+inconsistent

2019-04-26 Thread Slava Astashonok
Hello, I am running a Ceph cluster on Luminous 12.2.8 with 36 OSDs. Today a deep-scrub found an error on PG 25.60 and later one of the OSDs failed. Now PG 25.60 is stuck in the active+undersized+degraded+inconsistent state. I can't repair it with ceph pg repair 25.60 – the repair process does not start at all. What I
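
The usual first diagnostics for this kind of report, as a hedged sketch (PG id 25.60 taken from the message above; output details vary by release):

  # See which objects/shards the deep-scrub flagged as inconsistent
  rados list-inconsistent-obj 25.60 --format=json-pretty
  # Ask the primary to repair and watch whether it actually gets scheduled
  ceph pg repair 25.60
  ceph -w | grep 25.60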

Re: [ceph-users] clock skew

2019-04-26 Thread mj
Hi all, Thanks for all the replies! @Huang: ceph time-sync-status is exactly what I was looking for, thanks! @Janne: I will check out/implement the peer config per your suggestion. However, what confuses us is that chrony thinks the clocks match, and only ceph feels it doesn't. So we are not sure i
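
To compare the two views side by side, a minimal sketch (run on each mon host; the chronyc commands are stock chrony, ceph time-sync-status is the mon command mentioned above):

  # What the monitors think about each other's clocks
  ceph time-sync-status
  # What chrony thinks locally
  chronyc tracking
  chronyc sources -v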

[ceph-users] Mimic/13.2.5 bluestore OSDs crashing during startup in OSDMap::decode

2019-04-26 Thread Erik Lindahl
Hi list, In conjunction with taking a new storage server online we observed that a whole bunch of the SSD OSDs we use for metadata went offline, and crash every time they try to restart with an abort signal in OSDMap::decode - brief log below. We have seen this at least once in the past, and I su

[ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem

2019-04-26 Thread Jan Pekař - Imatic
Hi, yesterday my cluster reported slow requests for minutes and after restarting the OSDs (those reporting slow requests) it got stuck with peering PGs. The whole cluster was not responding and IO stopped. I also noticed that the problem was with cephx - all OSDs were reporting the same (even the same number of sec
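
A hedged sketch of the first checks when cephx verify_authorizer errors show up (osd.0 and the default data path are placeholders): confirm the daemon's local keyring still matches the key the monitors hold, and rule out clock skew, since cephx tickets are time-limited.

  # Key as the cluster knows it
  ceph auth get osd.0
  # Key the daemon actually presents
  cat /var/lib/ceph/osd/ceph-0/keyring
  # cephx is sensitive to clock drift between daemons
  ceph time-sync-status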

[ceph-users] Mimic/13.2.5 bluestore OSDs crashing during startup in OSDMap::decode

2019-04-26 Thread Erik Lindahl
Hi list, In conjunction with taking a new storage server online we observed that a whole bunch of the SSD OSDs we use for metadata went offline, and crash every time they try to restart with an abort signal in OSDMap::decode - brief log below: 2019-04-26 17:56:08.123 7f4f2956ae00 4 rocksdb: [/
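
One way to dig into a crash in OSDMap::decode is to pull the map the OSD chokes on out of its store and compare it with the monitors' copy; a hedged sketch with ceph-objectstore-tool (the OSD id, paths, and the assumption that the newest map is the bad one are placeholders; the OSD must be stopped first):

  systemctl stop ceph-osd@12
  # Extract the osdmap the daemon would decode at startup
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op get-osdmap --file /tmp/osdmap.osd12
  # Fetch the monitors' current map and compare
  ceph osd getmap -o /tmp/osdmap.mon
  osdmaptool --print /tmp/osdmap.osd12 | head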

[ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error

2019-04-26 Thread Elise Burke
Hi, I upgraded to Nautilus a week or two ago and things had been mostly fine. I was interested in trying the device health stats feature and enabled it. In doing so it created a pool, device_health_metrics, which contained zero bytes. Unfortunately this pool developed a PG that could not be repai

Re: [ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem

2019-04-26 Thread Gregory Farnum
On Fri, Apr 26, 2019 at 10:55 AM Jan Pekař - Imatic wrote: > > Hi, > > yesterday my cluster reported slow request for minutes and after restarting > OSDs (reporting slow requests) it stuck with peering PGs. Whole > cluster was not responding and IO stopped. > > I also notice, that problem was wit

Re: [ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error

2019-04-26 Thread Gregory Farnum
You'll probably want to generate a log with "debug osd = 20" and "debug bluestore = 20", then share that or upload it with ceph-post-file, to get more useful info about which PGs are breaking (is it actually the ones that were supposed to be deleted?). If there's a particular set of PGs you need to re
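
What that looks like in practice, as a hedged sketch (OSD id 2 and the log path are placeholders; the overrides could equally go into ceph.conf under [osd] on that host):

  # Raise verbosity for the crashing OSD, then restart it to capture the crash
  ceph config set osd.2 debug_osd 20
  ceph config set osd.2 debug_bluestore 20
  systemctl restart ceph-osd@2
  # Upload the resulting log for the developers
  ceph-post-file /var/log/ceph/ceph-osd.2.log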

Re: [ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem

2019-04-26 Thread Brian Topping
> On Apr 26, 2019, at 1:50 PM, Gregory Farnum wrote: > > Hmm yeah, it's probably not using UTC. (Despite it being good > practice, it's actually not an easy default to adhere to.) cephx > requires synchronized clocks and probably the same timezone (though I > can't swear to that.) Apps don’t “se
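
A quick way to convince yourself of that on any node: epoch timestamps don't change with the configured zone, so only real clock offset matters (standard systemd/coreutils commands).

  timedatectl                 # shows local time, UTC and the configured zone
  date +%s; date -u +%s       # same number either way - epoch seconds are zone-independent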

Re: [ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error

2019-04-26 Thread Elise Burke
Thanks for the suggestions. I've uploaded the surprisingly large (1.5G!) log file: ceph-post-file: 2d8d22f4-580b-4b57-a13a-f49dade34ba7 Looks like these are the relevant lines: -52> 2019-04-26 19:23:05.190 7fb2657dc700 20 osd.2 op_wq(2) _process empty q, waiting -51> 2019-04-26 19:23:05.190

Re: [ceph-users] clock skew

2019-04-26 Thread Anthony D'Atri
> @Janne: i will checkout/implement the peer config per your suggestion. > However what confuses us is that chrony thinks the clocks match, and > only ceph feels it doesn't. So we are not sure if the peer config will > actually help in this situation. But time will tell. Ar ar. Chrony thinks t
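
If the peer approach gets implemented, a hedged chrony.conf sketch for the mon hosts (hostnames and the upstream pool are placeholders; each mon peers with the others so they converge on one another even if the upstream source drifts):

  # /etc/chrony.conf on mon1 (assumed names mon1/mon2/mon3)
  server 0.pool.ntp.org iburst
  peer mon2
  peer mon3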

Re: [ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error

2019-04-26 Thread Elise Burke
Using ceph-objectstore-tool's info op on PG 25.0 (which indeed was the one I remember having the error) shows this: struct_v 10 { "pgid": "25.0", "last_update": "7592'106", "last_complete": "7592'106", "log_tail": "0'0", "last_user_version": 106, "last_backfill": "MIN", "last_ba
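
For reference, the kind of invocation that produces that struct, as a hedged sketch (OSD id and data path are placeholders; the OSD has to be stopped while the tool holds the store):

  systemctl stop ceph-osd@2
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --pgid 25.0 --op info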

[ceph-users] How to enable TRIM on dmcrypt bluestore ssd devices

2019-04-26 Thread Kári Bertilsson
Hello I am using "ceph-deploy osd create --dmcrypt --bluestore" to create the OSDs. I know there is some security concern when enabling TRIM/discard on encrypted devices, but I would rather get the performance increase. Wondering how to enable TRIM in this scenario?
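
Two layers have to allow discards here; a hedged sketch, with option names worth double-checking against your release:

  # BlueStore side (ceph.conf): let the OSD issue discards to its block device
  [osd]
  bdev_enable_discard = true

  # dm-crypt side: the mapping must be opened with discards allowed, e.g.
  # cryptsetup open --allow-discards, or the "discard" option in /etc/crypttab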

Re: [ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error

2019-04-26 Thread Elise Burke
Thanks for the pointer to ceph-objectstore-tool; it turns out that exporting and then removing the PG from all three disks was enough to make it boot! I've exported the three copies of the bad PG, let me know if you'd like me to upload them anywhere for inspection. All data has been recovered (since I
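
For the archive, a hedged sketch of that export-then-remove sequence (OSD id and paths are placeholders; export before remove, since remove is destructive):

  systemctl stop ceph-osd@2
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --pgid 25.0 --op export --file /root/pg25.0.osd2.export
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --pgid 25.0 --op remove --force
  systemctl start ceph-osd@2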