Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-12 Thread cephmailinglist
On 03/11/2017 09:36 PM, Christian Theune wrote: Hello, I have had reports that Qemu (librbd connections) will require updates/restarts before upgrading. What was your experience on that side? Did you upgrade the clients? Did you start using any of the new RBD features, like fast diff? We ha
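The fast-diff feature mentioned here is new in Jewel and depends on exclusive-lock and object-map, which is one reason older (Hammer-era) librbd clients such as a not-yet-restarted Qemu cannot open images once it is enabled. A sketch of enabling it on an existing image (pool and image names are placeholders):

    # fast-diff requires exclusive-lock and object-map on the image
    rbd feature enable rbd/myimage exclusive-lock
    rbd feature enable rbd/myimage object-map fast-diff
    # rebuild the object map so fast-diff results are valid for pre-existing data
    rbd object-map rebuild rbd/myimage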

Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-12 Thread cephmailinglist
On 03/11/2017 09:49 PM, Udo Lembke wrote: Hi Udo, Perhaps would a "find /var/lib/ceph/ ! -uid 64045 -exec chown ceph:ceph" do a better job?! We did exactly that (and also tried other combinations) and that is a workaround for the 'argument too long' problem, but then it would call an exec
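For reference, a variant that sidesteps both failure modes — the shell's argument-length limit and one exec per file — is find's '+' terminator, which batches paths the way xargs does (a sketch, assuming 64045 is the packaged 'ceph' uid as in the quoted command):

    # chown everything under /var/lib/ceph/ not already owned by the ceph user,
    # batching many paths into each chown invocation
    find /var/lib/ceph/ ! -uid 64045 -exec chown ceph:ceph {} +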

Re: [ceph-users] pgs stuck inactive

2017-03-12 Thread Laszlo Budai
Hello, I have already done the export with ceph_objectstore_tool. I just have to decide which OSDs to keep. Can you tell me why the directory structure in the OSDs is different for the same PG when checking on different OSDs? For instance, in OSDs 2 and 63 there are NO subdirectories in the 3.36
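For completeness, the export being referred to is done with ceph-objectstore-tool against a stopped OSD; a sketch with illustrative paths, using the PG id from this thread:

    # stop the OSD first, then export PG 3.36 to a file
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
        --journal-path /var/lib/ceph/osd/ceph-2/journal \
        --pgid 3.36 --op export --file /root/pg3.36-osd2.export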

Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-12 Thread Brad Hubbard
On Sun, Mar 12, 2017 at 6:36 AM, Christian Theune wrote: > Hi, > > thanks for that report! Glad to hear a mostly happy report. I’m still on the > fence … ;) > > I have had reports that Qemu (librbd connections) will require > updates/restarts before upgrading. What was your experience on that side

Re: [ceph-users] pgs stuck inactive

2017-03-12 Thread Brad Hubbard
On Sun, Mar 12, 2017 at 7:51 PM, Laszlo Budai wrote: > Hello, > > I have already done the export with ceph_objectstore_tool. I just have to > decide which OSDs to keep. > Can you tell me why the directory structure in the OSDs is different for the > same PG when checking on different OSDs? > For i

Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-12 Thread Matyas Koszik
On Sat, 11 Mar 2017, Udo Lembke wrote: > On 11.03.2017 12:21, cephmailingl...@mosibi.nl wrote: > > ... > > > > > > e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph > > ... the 'find' in step e found so many files that xargs (the shell) > > could not handle it (too many a

Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-12 Thread Florian Haas
On Sat, Mar 11, 2017 at 12:21 PM, wrote: > The upgrade of our biggest cluster, nr 4, did not go without > problems. Since we were expecting a lot of "failed to encode map > e with expected crc" messages, we disabled clog to monitors > with 'ceph tell osd.* injectargs -- --clog_to_monitors=false'
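The workaround referenced above, spelled out; injectargs only changes the running daemons, so a matching ceph.conf entry is needed if the setting should survive restarts:

    # stop OSDs from sending 'failed to encode map' noise to the monitors
    ceph tell osd.* injectargs -- --clog_to_monitors=false

    # equivalent persistent setting:
    # [osd]
    # clog to monitors = false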

Re: [ceph-users] osd_disk_thread_ioprio_priority help

2017-03-12 Thread Florian Haas
On Sat, Mar 11, 2017 at 4:24 PM, Laszlo Budai wrote: >>> Can someone explain the meaning of osd_disk_thread_ioprio_priority. I'm >>> [...] >>> >>> Now I am confused :( >>> >>> Can somebody bring some light here? >> >> >> Only to confuse you some more. If you are running Jewel or above then >>

Re: [ceph-users] osd_disk_thread_ioprio_priority help

2017-03-12 Thread Laszlo Budai
Hi Florian, thank you for your answer. We have already set the IO scheduler to cfq in order to be able to lower the priority of the scrub operations. My problem is that I've found different values set for the same parameter, and in each case they were doing it in order to achieve the same thin
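This likely explains the differing values seen in various guides: osd_disk_thread_ioprio_class and osd_disk_thread_ioprio_priority only take effect under the cfq scheduler, and the two common recipes differ in class. A hedged sketch (device name illustrative):

    # these options are no-ops unless the data disk uses cfq
    echo cfq > /sys/block/sda/queue/scheduler
    # recipe 1: stay in the best-effort class at its lowest priority
    ceph tell osd.* injectargs -- --osd_disk_thread_ioprio_class be --osd_disk_thread_ioprio_priority 7
    # recipe 2: the idle class, which runs only when the disk is otherwise idle
    # (the numeric priority is ignored for this class)
    ceph tell osd.* injectargs -- --osd_disk_thread_ioprio_class idle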

[ceph-users] speed decrease with size

2017-03-12 Thread Ben Erridge
I am testing attached volume storage on our OpenStack cluster which uses Ceph for block storage. Our Ceph nodes have large SSDs for their journals, 50+ GB for each OSD. I'm thinking some parameter is a little off because with relatively small writes I am seeing drastically reduced write speeds. we
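One way to tell whether the slowdown is in Ceph itself rather than the OpenStack attach path is to benchmark the pool directly at different write sizes (pool name and sizes are illustrative):

    # 30-second write tests at 4 KB and 4 MB objects, 16 concurrent ops
    rados bench -p rbd 30 write -b 4096 -t 16 --no-cleanup
    rados bench -p rbd 30 write -b 4194304 -t 16 --no-cleanup
    # remove the benchmark objects afterwards
    rados -p rbd cleanup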

Re: [ceph-users] speed decrease with size

2017-03-12 Thread Christian Balzer
Hello, On Sun, 12 Mar 2017 19:37:16 -0400 Ben Erridge wrote: > I am testing attached volume storage on our OpenStack cluster which uses > Ceph for block storage. > Our Ceph nodes have large SSDs for their journals, 50+ GB for each OSD. I'm > thinking some parameter is a little off because with re

Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-12 Thread Christian Balzer
Hello, On Sun, 12 Mar 2017 19:52:12 +1000 Brad Hubbard wrote: > On Sun, Mar 12, 2017 at 6:36 AM, Christian Theune > wrote: > > Hi, > > > > thanks for that report! Glad to hear a mostly happy report. I’m still on the > > fence … ;) > > > > I have had reports that Qemu (librbd connections) will

Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-12 Thread Christian Balzer
Hello, On Sun, 12 Mar 2017 19:54:10 +0100 Florian Haas wrote: > On Sat, Mar 11, 2017 at 12:21 PM, wrote: > > The upgrade of our biggest cluster, nr 4, did not go without > > problems. Since we were expecting a lot of "failed to encode map > > e with expected crc" messages, we disabled clog to

Re: [ceph-users] Latest Jewel New OSD Creation

2017-03-12 Thread Ashley Merrick
After rolling back to 10.2.5 the issue has gone away; it seems there has been a change in 10.2.6 that breaks this. ,Ashley From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ashley Merrick Sent: Saturday, 11 March 2017 11:32 AM To: ceph-us...@ceph.com Subject: [ceph-users] Latest

[ceph-users] noout, nodown and blocked requests

2017-03-12 Thread Shain Miley
Hello, One of the nodes in our 14 node cluster is offline and before I totally commit to fully removing the node from the cluster (there is a chance I can get the node back in working order in the next few days) I would like to run the cluster with that single node out for a few days. Currently

Re: [ceph-users] noout, nodown and blocked requests

2017-03-12 Thread Alexandre DERUMIER
Hi, >>Currently I have the. noout and nodown flags set while doing the maintenance >>work. you only need noout to avoid rebalancing see documentation: http://docs.ceph.com/docs/kraken/rados/troubleshooting/troubleshooting-osd/ "STOPPING W/OUT REBALANCING". Your clients are hanging because of