Re: [ceph-users] Hangs with qemu/libvirt/rbd when one host disappears

2017-12-06 Thread Alwin Antreich
Hello Marcus, On Tue, Dec 05, 2017 at 07:09:35PM +0100, Marcus Priesch wrote: > Dear Ceph Users, > > First of all, big thanks to all the devs and people who made all this > possible, ceph is amazing! > > OK, so let me get to the point where I need your help: > > I have a cluster of 6 hosts, mixe
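
A hedged aside for anyone debugging similar hangs: when one host disappears, a common first step is to confirm which OSDs the cluster considers down and whether client I/O is blocked behind them. A minimal sketch, assuming a Luminous-era CLI (osd.3 is a placeholder):

  # Which OSDs are down, and what is the cluster complaining about?
  ceph health detail
  ceph osd tree | grep -w down
  # Dump requests stuck on a suspect OSD (admin socket must be reachable)
  ceph daemon osd.3 dump_blocked_ops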

[ceph-users] rbd-nbd timeout and crash

2017-12-06 Thread Jan Pekař - Imatic
Hi, I ran into an overloaded cluster (deep-scrub running) for a few seconds; the rbd-nbd client timed out and the device became unavailable.
block nbd0: Connection timed out
block nbd0: shutting down sockets
block nbd0: Connection timed out
print_req_error: I/O error, dev nbd0, sector 2131833856
print_req
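
For anyone hitting the same thing: once the kernel marks nbd0 dead, the device generally has to be unmapped and remapped. A rough sketch — the device path and image name are placeholders, and the --timeout option depends on your rbd-nbd version:

  # Detach the dead device, then re-attach with a longer I/O timeout
  rbd-nbd unmap /dev/nbd0
  rbd-nbd map --timeout 120 rbd/myimage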

[ceph-users] I cannot make the OSD work, the journal always breaks 100% of the time

2017-12-06 Thread Gonzalo Aguilar Delgado
Hi, Another OSD fell down, and it's pretty scary how easy it is to break the cluster. This time it's something related to the journal. /usr/bin/ceph-osd -f --cluster ceph --id 6 --setuser ceph --setgroup ceph starting osd.6 at :/0 osd_data /var/lib/ceph/osd/ceph-6 /var/lib/ceph/osd/ceph-6/journ
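
Not from the original mail, but a hedged sketch of how one might capture more detail on a journal failure like this, reusing the command line from the post with verbose journal/filestore logging (debug levels are illustrative):

  # Re-run the failing OSD in the foreground with journal debugging
  ceph-osd -f --cluster ceph --id 6 --setuser ceph --setgroup ceph \
      --debug-journal 20 --debug-filestore 20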

Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Caspar Smit
2017-12-05 18:39 GMT+01:00 Richard Hesketh : > On 05/12/17 17:10, Graham Allan wrote: > > On 12/05/2017 07:20 AM, Wido den Hollander wrote: > >> Hi, > >> > >> I haven't tried this before, but I expect it to work; I wanted to > >> check before proceeding. > >> > >> I have a Ceph cluster which is

Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Wido den Hollander
> On 5 December 2017 at 18:39, Richard Hesketh wrote: > > > On 05/12/17 17:10, Graham Allan wrote: > > On 12/05/2017 07:20 AM, Wido den Hollander wrote: > >> Hi, > >> > >> I haven't tried this before, but I expect it to work; I wanted to > >> check before proceeding. > >> > >> I have a C

Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Wido den Hollander
> On 6 December 2017 at 10:17, Caspar Smit wrote: > > > 2017-12-05 18:39 GMT+01:00 Richard Hesketh : > > > On 05/12/17 17:10, Graham Allan wrote: > > > On 12/05/2017 07:20 AM, Wido den Hollander wrote: > > >> Hi, > > >> > > >> I haven't tried this before, but I expect it to work; I wanted

Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Yehuda Sadeh-Weinraub
Are you using rgw? There are certain compatibility issues that you might hit if you run mixed versions. Yehuda On Tue, Dec 5, 2017 at 3:20 PM, Wido den Hollander wrote: > Hi, > > I haven't tried this before, but I expect it to work; I wanted to check > before proceeding. > > I have a Ceph cl

Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Richard Hesketh
On 06/12/17 09:17, Caspar Smit wrote: > > 2017-12-05 18:39 GMT+01:00 Richard Hesketh >: > > On 05/12/17 17:10, Graham Allan wrote: > > On 12/05/2017 07:20 AM, Wido den Hollander wrote: > >> Hi, > >> > >> I haven't tried this before but I e

Re: [ceph-users] I cannot make the OSD work, the journal always breaks 100% of the time

2017-12-06 Thread Ronny Aasen
On 6 Dec 2017 10:01, Gonzalo Aguilar Delgado wrote: Hi, Another OSD fell down, and it's pretty scary how easy it is to break the cluster. This time it's something related to the journal. /usr/bin/ceph-osd -f --cluster ceph --id 6 --setuser ceph --setgroup ceph starting osd.6 at :/0 osd_data

Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Wido den Hollander
> On 6 December 2017 at 10:25, Yehuda Sadeh-Weinraub wrote: > > > Are you using rgw? There are certain compatibility issues that you > might hit if you run mixed versions. > Yes, it is. So would it hurt if the OSDs are running Luminous but the RGW is still Jewel? Multisite isn't used; it's jus
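
A quick hedged sketch for checking which daemons run which version during a mixed upgrade (the "ceph versions" subcommand needs Luminous mons; on older mons you can ask the daemons directly):

  # On a Luminous mon: summary of versions per daemon type
  ceph versions
  # Works against older releases too: ask every OSD for its version
  ceph tell osd.* version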

Re: [ceph-users] ceph.conf tuning ... please comment

2017-12-06 Thread Piotr Dałek
On 17-12-06 07:01 AM, Stefan Kooman wrote:
  [osd]
  # http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/
  osd crush update on start = false
  osd heartbeat interval = 1       # default 6
  osd mon heartbeat interval = 10  # default 30
  osd mon report interval min = 1

Re: [ceph-users] ceph.conf tuning ... please comment

2017-12-06 Thread Van Leeuwen, Robert
Hi, let's start with a disclaimer: not an expert on any of these ceph tuning settings :) However, in general with cluster intervals/timings, you are trading quick failover detection for: 1) Processing power: you might starve yourself of resources when expanding the cluster. If you multiply all
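
Not in the original reply, but a hedged sketch of how to inspect the current values on a running OSD and trial new ones at runtime before committing them to ceph.conf (option names are taken from the thread; osd.0 is a placeholder):

  # Inspect current values on a running OSD
  ceph daemon osd.0 config show | grep -E 'osd_heartbeat_interval|osd_mon_report'
  # Inject a value for testing; not persistent across restarts
  ceph tell osd.* injectargs '--osd_heartbeat_interval 6'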

Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Yehuda Sadeh-Weinraub
It's hard to say; we don't really test your specific scenario, so use it at your own risk. There was a change in cls_refcount that we had issues with in the upgrade suite, but looking at it I'm not sure it'll actually be a problem for you (you'll still hit the original problem, though). Other probl

Re: [ceph-users] rbd-nbd timeout and crash

2017-12-06 Thread Jason Dillaman
On Wed, Dec 6, 2017 at 3:46 AM, Jan Pekař - Imatic wrote: > Hi, > I ran into an overloaded cluster (deep-scrub running) for a few seconds; the rbd-nbd > client timed out and the device became unavailable. > > block nbd0: Connection timed out > block nbd0: shutting down sockets > block nbd0: Connection timed

Re: [ceph-users] I cannot make the OSD work, the journal always breaks 100% of the time

2017-12-06 Thread David Turner
Why are you flushing the journal after you zero it instead of before? That does nothing. You want to flush the journal, while it still holds objects that might not be on the OSD, before you zero it. On Wed, Dec 6, 2017, 6:02 AM Ronny Aasen wrote: > On 6 Dec 2017 10:01, Gonzalo Aguilar Delgado wrote: >
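
To make the ordering concrete, a hedged sketch of the usual FileStore sequence (osd.6 as in the thread; run only with the OSD stopped, and only while the journal is still readable):

  # 1. Flush outstanding journal entries into the object store
  ceph-osd -i 6 --flush-journal
  # 2. Only now is it safe to wipe and recreate the journal
  ceph-osd -i 6 --mkjournal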

Re: [ceph-users] Any way to get around selinux-policy-base dependency

2017-12-06 Thread Ken Dreyer
Hi Bryan, Why not upgrade to RHEL 7.4? We don't really build Ceph to run on older RHEL releases. - Ken On Mon, Dec 4, 2017 at 11:26 AM, Bryan Banister wrote: > Hi all, > > > > I would like to upgrade to the latest Luminous release but found that it > requires the absolute latest selinux-policy-

Re: [ceph-users] Any way to get around selinux-policy-base dependency

2017-12-06 Thread Bryan Banister
Thanks Ken, that's understandable, -Bryan -Original Message- From: Ken Dreyer [mailto:kdre...@redhat.com] Sent: Wednesday, December 06, 2017 12:03 PM To: Bryan Banister Cc: Ceph Users ; Rafael Suarez Subject: Re: [ceph-users] Any way to get around selinux-policy-base dependency Note: E
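
A hedged aside for anyone in the same spot: you can check exactly which selinux-policy-base version the Ceph packages demand before attempting an upgrade. A sketch for RHEL/CentOS (package name assumed from the thread's context):

  # Show the RPM dependencies of the ceph-selinux package
  rpm -qR ceph-selinux | grep selinux-policy
  # Or resolve it through yum without installing anything
  yum deplist ceph-selinux | grep selinux-policy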

[ceph-users] Sudden omap growth on some OSDs

2017-12-06 Thread george.vasilakakos
Hi ceph-users, We have a Ceph cluster (running Kraken) that is exhibiting some odd behaviour. A couple of weeks ago, the LevelDBs on some of our OSDs started growing large (now around 20G in size). The one thing they have in common is that the 11 disks with inflating LevelDBs are all in the set for one PG
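
Not from the original post, but a hedged sketch of how one might measure, and then try to compact, the LevelDB omap on an affected OSD (paths assume a Kraken-era FileStore layout; osd.12 is a placeholder, and compact support varies by release):

  # Measure the on-disk LevelDB size for each OSD on this host
  du -sh /var/lib/ceph/osd/ceph-*/current/omap
  # Ask one OSD to compact its LevelDB
  ceph tell osd.12 compact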

Re: [ceph-users] rbd-nbd timeout and crash

2017-12-06 Thread Jan Pekař - Imatic
Hi, On 6.12.2017 15:24, Jason Dillaman wrote: On Wed, Dec 6, 2017 at 3:46 AM, Jan Pekař - Imatic wrote: Hi, I ran into an overloaded cluster (deep-scrub running) for a few seconds; the rbd-nbd client timed out and the device became unavailable. block nbd0: Connection timed out block nbd0: shutting down

Re: [ceph-users] Sudden omap growth on some OSDs

2017-12-06 Thread David Turner
I have no proof, nothing other than a hunch, but OSDs don't trim omaps unless all PGs are healthy. If this PG is actually not healthy, but the cluster doesn't realize it, while these 11 involved OSDs do realize that the PG is unhealthy, you would see this exact problem. The OSDs think a PG is
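
A hedged way to test that hunch: query the suspect PG and compare it with what a member OSD reports (pg 1.2f3 and osd.12 are placeholders):

  # Cluster-level view of the PG's state and peering info
  ceph pg 1.2f3 query
  # Compare with the view from a member OSD's admin socket
  ceph daemon osd.12 status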

Re: [ceph-users] Sudden omap growth on some OSDs

2017-12-06 Thread Gregory Farnum
On Wed, Dec 6, 2017 at 2:35 PM David Turner wrote: > I have no proof, nothing other than a hunch, but OSDs don't trim omaps > unless all PGs are healthy. If this PG is actually not healthy, but the > cluster doesn't realize it, while these 11 involved OSDs do realize that the > PG is unhealthy

Re: [ceph-users] rbd-nbd timeout and crash

2017-12-06 Thread David Turner
Do you have the FS mounted with trim (discard) support? What are your mount options? On Wed, Dec 6, 2017 at 5:30 PM Jan Pekař - Imatic wrote: > Hi, > > On 6.12.2017 15:24, Jason Dillaman wrote: > > On Wed, Dec 6, 2017 at 3:46 AM, Jan Pekař - Imatic > wrote: > >> Hi, > >> I ran into an overloaded cluste
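
For reference, a hedged sketch of how to answer that on the client side, assuming /dev/nbd0 from earlier in the thread:

  # Show the filesystem and mount options behind the nbd device
  findmnt /dev/nbd0
  # A non-zero value suggests the device advertises discard support
  cat /sys/block/nbd0/queue/discard_granularity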

Re: [ceph-users] Any way to get around selinux-policy-base dependency

2017-12-06 Thread Brad Hubbard
On Thu, Dec 7, 2017 at 4:23 AM, Bryan Banister wrote: > Thanks Ken, that's understandable, > -Bryan > > -Original Message- > From: Ken Dreyer [mailto:kdre...@redhat.com] > Sent: Wednesday, December 06, 2017 12:03 PM > To: Bryan Banister > Cc: Ceph Users ; Rafael Suarez > > Subject: Re:

[ceph-users] HEALTH_ERR : PG_DEGRADED_FULL

2017-12-06 Thread Karun Josy
Hello, I am seeing a health error in our production cluster:
  health: HEALTH_ERR
          1105420/11038158 objects misplaced (10.015%)
          Degraded data redundancy: 2046/11038158 objects degraded (0.019%), 102 pgs unclean, 2 pgs degraded
          Degraded data redundancy (low space):
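
Not part of the original mail, but a hedged sketch of the usual next diagnostic steps for a degraded, low-space state (standard Luminous CLI):

  # List exactly which PGs and OSDs are implicated
  ceph health detail
  # Find the OSDs that are running out of space
  ceph osd df tree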

[ceph-users] ceph luminous + multi mds: slow request, behind on trimming, failed to authpin local pins

2017-12-06 Thread Burkhard Linke
Hi, we have upgraded our cluster to luminous 12.2.2 and wanted to use a second MDS for HA purposes. The upgrade itself went well, and setting up the second MDS from the former standby-replay configuration worked, too. But under load both MDSs got stuck and needed to be restarted. It starts with slow r
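
Not from the original report, but a hedged sketch of commands often used to inspect a stuck MDS and, if necessary, fall back to a single active MDS (the filesystem name cephfs and daemon name mds.a are placeholders):

  # List slow/in-flight operations on the stuck MDS
  ceph daemon mds.a dump_ops_in_flight
  # Fall back to one active MDS while debugging
  ceph fs set cephfs max_mds 1
  # On Luminous the extra rank must also be deactivated explicitly
  ceph mds deactivate 1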

Re: [ceph-users] PG::peek_map_epoch assertion fail

2017-12-06 Thread Gonzalo Aguilar Delgado
Hi, Since my email server went down because of the error, I have to reply this way. I added more logs:
  int r = store->omap_get_values(coll, pgmeta_oid, keys, &values);
  if (r == 0) {
    assert(values.size() == 2);
--
0> 2017-12-03 13:39:29.497091 7f467ba0b8c0 -1 osd/PG.cc: I
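
The assert fires because the pgmeta object is expected to carry exactly two omap keys. A hedged sketch of inspecting that object offline with ceph-objectstore-tool (paths and pgid are placeholders; run only with the OSD stopped):

  # List objects in the suspect PG; the pgmeta object has an empty oid
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 \
      --journal-path /var/lib/ceph/osd/ceph-6/journal --pgid 1.2f3 --op list
  # Then list the omap keys of the object chosen from the output
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 \
      --journal-path /var/lib/ceph/osd/ceph-6/journal \
      '<json-object-from-list-output>' list-omap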