When the exclusive-lock feature is used, any and all Ceph users used
for RBD purposes should be double-checked to ensure that they have
permission to blacklist clients. This would affect both librbd and
krbd, but only after a non-clean shutdown where the image is left in a
locked state by a dead client.
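(Editor's illustration, not part of the original mail: for a user like
client.libvirt from this thread, caps with blacklist permission could
be set either the pre-Luminous way, spelling the permission out, or
the Luminous way via the rbd profiles.)

  # pre-Luminous style: grant the blacklist command explicitly on the mon
  ceph auth caps client.libvirt \
      mon 'allow r, allow command "osd blacklist"' \
      osd 'allow class-read object_prefix rbd_children, allow rwx pool=one'

  # Luminous and later: the rbd profiles already include this permission
  ceph auth caps client.libvirt \
      mon 'profile rbd' \
      osd 'profile rbd pool=one'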
You're the OP, so thanks for that! Our upgrade plan (for Thursday
this week) was modified today to include prep work to double-check the
caps.
On 12 September 2017 at 21:26, Nico Schottelius
wrote:
>
> Well, we basically needed to fix it, that's why we did it :-)
>
>
> Blair Bethwaite writes:
>
>>
Well, we basically needed to fix it, that's why we did it :-)
Blair Bethwaite writes:
> Great to see this issue sorted.
>
> I have to say I am quite surprised anyone would implement the
> export/import workaround mentioned here without *first* racing to this
> ML or IRC and crying out for help. T
[ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]
Since you have already upgraded to Luminous, the fastest and probably
easiest way to fix this is to run "ceph auth caps client.libvirt mon
'profile rbd' osd 'profile rbd pool=one'" [1]. Luminous pro
Great to see this issue sorted.
I have to say I am quite surprised anyone would implement the
export/import workaround mentioned here without *first* racing to this
ML or IRC and crying out for help. This is a valuable resource, made
more so by people sharing issues.
Cheers,
On 12 September 2017
For OpenNebula this would be
http://docs.opennebula.org/5.4/deployment/open_cloud_storage_setup/ceph_ds.html
(added OpenNebula in CC)
Jason Dillaman writes:
> Yes -- the upgrade documentation definitely needs to be updated to add
> a pre-monitor upgrade step to verify your caps before proceed
Yes -- the upgrade documentation definitely needs to be updated to add
a pre-monitor upgrade step to verify your caps before proceeding -- I
will take care of that under this ticket [1]. I believe the OpenStack
documentation has been updated [2], but let me know if you find other
places.
[1] http:
That indeed worked! Thanks a lot!
The remaining question from my side: did we do anything wrong in the
upgrade process and, if not, should it be documented somewhere how to
set up the permissions correctly on upgrade?
Or should the documentation on the side of the cloud infrastructure
software be
Since you have already upgraded to Luminous, the fastest and probably
easiest way to fix this is to run "ceph auth caps client.libvirt mon
'profile rbd' osd 'profile rbd pool=one'" [1]. Luminous provides
simplified RBD caps via named profiles which ensure all the correct
permissions are enabled.
[
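(Editor's sketch: after applying the command above, the caps can be
re-checked with "ceph auth get". Note that "ceph auth caps" replaces
the whole cap list, so any mgr cap the user had before would need to
be restated if it should be kept.)

  ceph auth get client.libvirt
  # expected output, roughly (key elided):
  # [client.libvirt]
  #         key = ...
  #         caps mon = "profile rbd"
  #         caps osd = "profile rbd pool=one"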
Hey Jason,
here it is:
[22:42:12] server4:~# ceph auth get client.libvirt
exported keyring for client.libvirt
[client.libvirt]
key = ...
caps mgr = "allow r"
caps mon = "allow r"
caps osd = "allow class-read object_prefix rbd_children, allow rwx
pool=one"
[22:52:
The only error message I see is from dmesg when trying to access the
XFS filesystem (see attached image).
Let me know if you need any more logs - luckily I can spin up this VM in
a broken state as often as you want to :-)
Jason Dillaman writes:
> ... also, do you have any logs from the OS ass
I see the following which is most likely the issue:
2017-09-11 22:26:38.945776 7efd677fe700 -1
librbd::managed_lock::BreakRequest: 0x7efd58020e70 handle_blacklist:
failed to blacklist lock owner: (13) Permission denied
2017-09-11 22:26:38.945795 7efd677fe700 10
librbd::managed_lock::BreakRequest:
Thanks a lot for the great ceph.conf pointer, Mykola!
I found something interesting:
2017-09-11 22:26:23.418796 7efd7d479700 10 client.1039597.objecter ms_dispatch
0x55b55ab8f950 osd_op_reply(4 rbd_header.df7343d1b58ba [call] v0'0 uv0 ondisk =
-8 ((8) Exec format error)) v8
2017-09-11 22:26:2
... also, do you have any logs from the OS associated w/ this log file? I
am specifically looking for anything to indicate which sector was
considered corrupt.
On Mon, Sep 11, 2017 at 4:41 PM, Jason Dillaman wrote:
> Thanks -- I'll take a look to see if anything else stands out. That
> "Exec format e
Thanks -- I'll take a look to see if anything else stands out. That
"Exec format error" isn't actually an issue -- but now that I know
about it, we can prevent it from happening in the future [1].
[1] http://tracker.ceph.com/issues/21360
On Mon, Sep 11, 2017 at 4:32 PM, Nico Schottelius
wrote:
>
On 2017-09-11 09:31, Nico Schottelius wrote:
>
> Sarunas,
>
> may I ask when this happened?
I was following
http://docs.ceph.com/docs/master/release-notes/#upgrade-from-jewel-or-kraken
I can't tell which step in particular triggered the issue with the VMs.
> And did you move OSDs or mons a
Hey Mykola,
thanks for the hint, I will test this in a few hours when I'm back on a
regular Internet connection!
Best,
Nico
Mykola Golub writes:
> On Sun, Sep 10, 2017 at 03:56:21PM +0200, Nico Schottelius wrote:
>>
>> Just tried and there is not much more log in ceph -w (see below) neither
Definitely would love to see some debug-level logs (debug rbd = 20 and
debug objecter = 20) for any VM that experiences this issue. The only
thing I can think of is something to do with sparse object handling
since (1) krbd doesn't perform sparse reads and (2) re-importing the
file would eliminate
Sarunas,
may I ask when this happened?
And did you move OSDs or mons after that export/import procedure?
I really wonder what the reason for this behaviour is, and whether we
are likely to experience it again.
Best,
Nico
Sarunas Burdulis writes:
> On 2017-09-10 08:23, Nico Schottelius wrot
On 2017-09-10 08:23, Nico Schottelius wrote:
>
> Good morning,
>
> yesterday we had an unpleasant surprise that I would like to discuss:
>
> Many (not all!) of our VMs were suddenly
> dying (qemu process exiting) and when trying to restart them, inside the
> qemu process we saw i/o errors on the
On Sun, Sep 10, 2017 at 03:56:21PM +0200, Nico Schottelius wrote:
>
> Just tried and there is not much more log in ceph -w (see below) neither
> from the qemu process.
>
> [15:52:43] server4:~$ /usr/bin/qemu-system-x86_64 -name one-17031 -S
> -machine pc-i440fx-2.1,accel=kvm,usb=off -m 8192 -rea
> Regards,
> Lionel
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Nico Schottelius
>> Sent: dimanche 10 septembre 2017 14:23
>> To: ceph-users
>> Cc: kamila.souck...@ungleich.ch
>>
> To: ceph-users
> Cc: kamila.souck...@ungleich.ch
> Subject: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd
> change]
>
>
> Good morning,
>
> yesterday we had an unpleasant surprise that I would like to discuss:
>
> Many (not all!) of our VMs were suddenly dying (qemu
Just tried and there is not much more log in ceph -w (see below), nor
from the qemu process.
[15:52:43] server4:~$ /usr/bin/qemu-system-x86_64 -name one-17031 -S
-machine pc-i440fx-2.1,accel=kvm,usb=off -m 8192 -realtime mlock=off
-smp 6,sockets=6,cores=1,threads=1 -uuid
79845fca-9b26-4072-bc
Sorry -- meant VM. Yes, librbd uses ceph.conf for configuration settings.
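(Illustrative aside: one way to make sure a specific config file and
user are picked up by the VM's librbd instance is QEMU's rbd drive
syntax; the image name below is a placeholder, the pool is the one
from this thread.)

  -drive file=rbd:one/IMAGE:id=libvirt:conf=/etc/ceph/ceph.conf,format=raw,if=virtio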
On Sun, Sep 10, 2017 at 9:22 AM, Nico Schottelius
wrote:
>
> Hello Jason,
>
> I think there is a slight misunderstanding:
> There is only one *VM*, not one OSD left that we did not start.
>
> Or does librbd also read ceph.c
Hello Jason,
I think there is a slight misunderstanding:
There is only one *VM*, not one OSD left that we did not start.
Or does librbd also read ceph.conf and will that cause qemu to output
debug messages?
Best,
Nico
Jason Dillaman writes:
> I presume QEMU is using librbd instead of a mapp
I presume QEMU is using librbd instead of a mapped krbd block device,
correct? If that is the case, can you add "debug-rbd=20" and "debug
objecter=20" to your ceph.conf and boot up your last remaining broken
OSD?
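(Editor's sketch of what that could look like in the ceph.conf read by
the librbd client; the log file path is only an example and must be
writable by the qemu process.)

  [client]
      debug rbd = 20
      debug objecter = 20
      log file = /var/log/ceph/qemu-guest.$pid.log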
On Sun, Sep 10, 2017 at 8:23 AM, Nico Schottelius
wrote:
>
> Good morning,
>
> yeste
Good morning,
yesterday we had an unpleasant surprise that I would like to discuss:
Many (not all!) of our VMs were suddenly
dying (qemu process exiting) and when trying to restart them, inside the
qemu process we saw i/o errors on the disks and the OS was not able to
start (i.e. stopped in init