Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-12 Thread Jason Dillaman
When the exclusive-lock feature is used, any and all Ceph users used for RBD purposes should be double-checked to ensure that they have permission to blacklist clients. This would affect both librbd and krbd, but only after a non-clean shutdown where the image is left in a locked state by a dead cl
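For a pre-Luminous cluster, a minimal sketch of caps that include this blacklist permission (the client.libvirt user and the pool "one" are taken from later in this thread; substitute your own names):

    # grant the RBD user permission to blacklist the dead holder of an exclusive lock
    ceph auth caps client.libvirt \
        mon 'allow r, allow command "osd blacklist"' \
        osd 'allow class-read object_prefix rbd_children, allow rwx pool=one'

On Luminous and later, the simplified 'profile rbd' caps discussed below already include this permission.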

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-12 Thread Blair Bethwaite
You're the OP, so for that, thanks! Our upgrade plan (for Thursday this week) was modified today to include prep work to double-check the caps. On 12 September 2017 at 21:26, Nico Schottelius wrote: > > Well, we basically needed to fix it, that's why we did it :-) > > > Blair Bethwaite writes: > >>

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-12 Thread Nico Schottelius
Well, we basically needed to fix it, that's why we did it :-) Blair Bethwaite writes: > Great to see this issue sorted. > > I have to say I am quite surprised anyone would implement the > export/import workaround mentioned here without *first* racing to this > ML or IRC and crying out for help. T

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Ashley Merrick
[ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change] Since you have already upgraded to Luminous, the fastest and probably easiest way to fix this is to run "ceph auth caps client.libvirt mon 'profile rbd' osd 'profile rbd pool=one'" [1]. Luminous pro

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Blair Bethwaite
Great to see this issue sorted. I have to say I am quite surprised anyone would implement the export/import workaround mentioned here without *first* racing to this ML or IRC and crying out for help. This is a valuable resource, made more so by people sharing issues. Cheers, On 12 September 2017

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Nico Schottelius
For OpenNebula this would be http://docs.opennebula.org/5.4/deployment/open_cloud_storage_setup/ceph_ds.html (added OpenNebula in CC) Jason Dillaman writes: > Yes -- the upgrade documentation definitely needs to be updated to add > a pre-monitor upgrade step to verify your caps before proceed

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Jason Dillaman
Yes -- the upgrade documentation definitely needs to be updated to add a pre-monitor upgrade step to verify your caps before proceeding -- I will take care of that under this ticket [1]. I believe the OpenStack documentation has been updated [2], but let me know if you find other places. [1] http:

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Nico Schottelius
That indeed worked! Thanks a lot! The remaining question from my side: did we do anything wrong in the upgrade process and, if not, should it be documented somewhere how to set up the permissions correctly on upgrade? Or should the documentation on the side of the cloud infrastructure software be

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Jason Dillaman
Since you have already upgraded to Luminous, the fastest and probably easiest way to fix this is to run "ceph auth caps client.libvirt mon 'profile rbd' osd 'profile rbd pool=one'" [1]. Luminous provides simplified RBD caps via named profiles which ensure all the correct permissions are enabled. [
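A hedged sketch of the full fix-and-verify sequence, using the user and pool names from this thread:

    # replace the hand-written caps with the Luminous RBD profiles
    ceph auth caps client.libvirt mon 'profile rbd' osd 'profile rbd pool=one'

    # confirm the new caps took effect
    ceph auth get client.libvirt

The mon 'profile rbd' cap includes the "osd blacklist" permission that the old caps lacked, so librbd can again break a stale exclusive lock left behind by a dead client.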

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Nico Schottelius
Hey Jason, here it is: [22:42:12] server4:~# ceph auth get client.libvirt exported keyring for client.libvirt [client.libvirt] key = ... caps mgr = "allow r" caps mon = "allow r" caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=one" [22:52:

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Nico Schottelius
The only error message I see is from dmesg when trying to access the XFS filesystem (see attached image). Let me know if you need any more logs - luckily I can spin up this VM in a broken state as often as you want :-) Jason Dillaman writes: > ... also, do you have any logs from the OS ass

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Jason Dillaman
I see the following which is most likely the issue: 2017-09-11 22:26:38.945776 7efd677fe700 -1 librbd::managed_lock::BreakRequest: 0x7efd58020e70 handle_blacklist: failed to blacklist lock owner: (13) Permission denied 2017-09-11 22:26:38.945795 7efd677fe700 10 librbd::managed_lock::BreakRequest:
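If a VM remains unbootable because of such a stale lock before the caps are fixed, the lock can also be inspected and removed by hand with an admin keyring; a hedged sketch (the image name is hypothetical):

    # show who currently holds the lock on the image
    rbd lock list one/vm-disk-0

    # remove the stale lock, using the lock id and locker printed above
    rbd lock remove one/vm-disk-0 <lock-id> <locker>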

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Nico Schottelius
Thanks a lot for the great ceph.conf pointer, Mykola! I found something interesting: 2017-09-11 22:26:23.418796 7efd7d479700 10 client.1039597.objecter ms_dispatch 0x55b55ab8f950 osd_op_reply(4 rbd_header.df7343d1b58ba [call] v0'0 uv0 ondisk = -8 ((8) Exec format error)) v8 2017-09-11 22:26:2

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Jason Dillaman
... also, do you have any logs from the OS associated w/ this log file? I am specifically looking for anything to indicate which sector was considered corrupt. On Mon, Sep 11, 2017 at 4:41 PM, Jason Dillaman wrote: > Thanks -- I'll take a look to see if anything else stands out. That > "Exec format e

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Jason Dillaman
Thanks -- I'll take a look to see if anything else stands out. That "Exec format error" isn't actually an issue -- but now that I know about it, we can prevent it from happening in the future [1] [1] http://tracker.ceph.com/issues/21360 On Mon, Sep 11, 2017 at 4:32 PM, Nico Schottelius wrote: >

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Sarunas Burdulis
On 2017-09-11 09:31, Nico Schottelius wrote: > > Sarunas, > > may I ask when this happened? I was following http://docs.ceph.com/docs/master/release-notes/#upgrade-from-jewel-or-kraken I can't tell which particular step triggered the issue with the VMs. > And did you move OSDs or mons a

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Nico Schottelius
Hey Mykola, thanks for the hint, I will test this in a few hours when I'm back on a regular Internet connection! Best, Nico Mykola Golub writes: > On Sun, Sep 10, 2017 at 03:56:21PM +0200, Nico Schottelius wrote: >> >> Just tried and there is not much more log in ceph -w (see below) neither

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Jason Dillaman
Definitely would love to see some debug-level logs (debug rbd = 20 and debug objecter = 20) for any VM that experiences this issue. The only thing I can think of is something to do with sparse object handling since (1) krbd doesn't perform sparse reads and (2) re-importing the file would eliminate
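One way those settings might be added on the hypervisor, as a sketch (the log path is an assumption and must be writable by the qemu user):

    [client]
        debug rbd = 20
        debug objecter = 20
        log file = /var/log/ceph/qemu-guest-$pid.log

librbd reads the [client] section when QEMU opens the image, so restarting the affected VM should then produce the detailed log.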

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Nico Schottelius
Sarunas, may I ask when this happened? And did you move OSDs or mons after that export/import procedure? I really wonder what the reason for this behaviour is, and whether we are likely to experience it again. Best, Nico Sarunas Burdulis writes: > On 2017-09-10 08:23, Nico Schottelius wrot

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Sarunas Burdulis
On 2017-09-10 08:23, Nico Schottelius wrote: > > Good morning, > > yesterday we had an unpleasant surprise that I would like to discuss: > > Many (not all!) of our VMs were suddenly > dying (qemu process exiting) and when trying to restart them, inside the > qemu process we saw i/o errors on the

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Mykola Golub
On Sun, Sep 10, 2017 at 03:56:21PM +0200, Nico Schottelius wrote: > > Just tried and there is not much more log in ceph -w (see below) neither > from the qemu process. > > [15:52:43] server4:~$ /usr/bin/qemu-system-x86_64 -name one-17031 -S > -machine pc-i440fx-2.1,accel=kvm,usb=off -m 8192 -rea

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Nico Schottelius
egards, > Lionel > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Nico Schottelius >> Sent: dimanche 10 septembre 2017 14:23 >> To: ceph-users >> Cc: kamila.souck...@ungleich.ch >>

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Beard Lionel (BOSTON-STORAGE)
ph-users > Cc: kamila.souck...@ungleich.ch > Subject: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd > change] > > > Good morning, > > yesterday we had an unpleasant surprise that I would like to discuss: > > Many (not all!) of our VMs were suddenly dying (qemu

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-10 Thread Nico Schottelius
Just tried and there is not much more log in ceph -w (see below) neither from the qemu process. [15:52:43] server4:~$ /usr/bin/qemu-system-x86_64 -name one-17031 -S -machine pc-i440fx-2.1,accel=kvm,usb=off -m 8192 -realtime mlock=off -smp 6,sockets=6,cores=1,threads=1 -uuid 79845fca-9b26-4072-bc

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-10 Thread Jason Dillaman
Sorry -- meant VM. Yes, librbd uses ceph.conf for configuration settings. On Sun, Sep 10, 2017 at 9:22 AM, Nico Schottelius wrote: > > Hello Jason, > > I think there is a slight misunderstanding: > There is only one *VM*, not one OSD left that we did not start. > > Or does librbd also read ceph.c
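For context, a sketch of how a librbd-backed disk is typically passed to QEMU (the pool/image and user id are assumptions); the conf= option is what points librbd at the ceph.conf carrying those debug settings:

    qemu-system-x86_64 ... \
        -drive file=rbd:one/vm-disk-0:id=libvirt:conf=/etc/ceph/ceph.conf,format=raw,if=virtio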

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-10 Thread Nico Schottelius
Hello Jason, I think there is a slight misunderstanding: There is only one *VM*, not one OSD left that we did not start. Or does librbd also read ceph.conf and will that cause qemu to output debug messages? Best, Nico Jason Dillaman writes: > I presume QEMU is using librbd instead of a mapp

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-10 Thread Jason Dillaman
I presume QEMU is using librbd instead of a mapped krbd block device, correct? If that is the case, can you add "debug-rbd=20" and "debug objecter=20" to your ceph.conf and boot up your last remaining broken OSD? On Sun, Sep 10, 2017 at 8:23 AM, Nico Schottelius wrote: > > Good morning, > > yeste

[ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-10 Thread Nico Schottelius
Good morning, yesterday we had an unpleasant surprise that I would like to discuss: Many (not all!) of our VMs were suddenly dying (qemu process exiting) and when trying to restart them, inside the qemu process we saw i/o errors on the disks and the OS was not able to start (i.e. stopped in init