Re: [ceph-users] [Luminous] rgw not deleting object

2017-09-10 Thread Andreas Calminder
Hi,
I had a similar problem on jewel, where I was unable to properly delete
objects even though radosgw-admin returned rc 0 after issuing rm; somehow
the object was deleted but the metadata wasn't removed.

I ran
# radosgw-admin --cluster ceph object stat --bucket=weird_bucket
--object=$OBJECT

to figure out whether the object was still there, and then used the 'rados put'
command to upload a dummy object under the same rados name so it could then be removed properly
# rados -c /etc/ceph/ceph.conf -p ceph.rgw.buckets.data put
be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2384280.20_$OBJECT dummy.file
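
With the dummy object back in place, the delete can then be retried through
the normal tooling, e.g. (bucket/object names as above):

# radosgw-admin --cluster ceph object rm --bucket=weird_bucket --object=$OBJECT

or via s3cmd/boto against the bucket, which should clean up both the data and
the bucket index entry.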

Hope it helps,
Andreas

On 10 Sep 2017 1:20 a.m., "Jack"  wrote:

> Hi,
>
> I face a wild issue: I cannot remove an object from rgw (via s3 API)
>
> My steps:
> s3cmd ls s3://bucket/object -> it exists
> s3cmd rm s3://bucket/object -> success
> s3cmd ls s3://bucket/object -> it still exists
>
> At this point, I can curl and get the object (thus, it does exist)
>
> Doing the same via boto leads to the same behavior
>
> Log sample:
> 2017-09-10 01:18:42.502486 7fd189e7d700  1 == starting new request
> req=0x7fd189e77300 =
> 2017-09-10 01:18:42.504028 7fd189e7d700  1 == req done
> req=0x7fd189e77300 op status=-2 http_status=204 ==
> 2017-09-10 01:18:42.504076 7fd189e7d700  1 civetweb: 0x560ebc275000:
> 10.42.43.6 - - [10/Sep/2017:01:18:38 +0200] "DELETE /bucket/object
> HTTP/1.1" 1 0 - Boto/2.44.0 Python/3.5.4 Linux/4.12.0-1-amd64
>
> What can I do ?
> What data shall I provide to debug this issue ?
>
> Regards,


Re: [ceph-users] [Ceph-maintainers] Ceph release cadence

2017-09-10 Thread Yehuda Sadeh-Weinraub
I'm not a huge fan of train releases, as they tend to never quite make
it on time and the timeline always feels a bit artificial anyway. OTOH,
I do see and understand the need for a predictable schedule with a
roadmap attached to it. There are many who need to have at least a
vague idea of what we're going to ship and when, so that they can plan
ahead. We need it ourselves, as sometimes the schedule can dictate the
engineering decisions that we're going to make.
On the other hand, I think that working towards a release that comes
out after 9 or 12 months is a bit too long. This is a recipe for more
delays, as the penalty for missing a feature is painful. Maybe we can
consider returning to shorter iterations for *dev* releases. These are
checkpoints that need to happen after a short period (2-3 weeks),
where we end up with a minimally tested release that passes some smoke
test. Features are incrementally added to the dev release. The idea
behind a short-term dev release is that it minimizes the window where
master is completely unusable, and thus reduces the time to stabilization.
Then it's easier to enforce a train schedule if we want to. It might
be easier to let go of a feature that doesn't make it, as it will be
there soon, and maybe if really needed we (or the downstream
maintainer) can make the decision to backport it. This makes me think
that we could revisit our backport policy/procedure/tooling, so
that we can do it in a sane and safe way when needed and possible.

Yehuda

On Fri, Sep 8, 2017 at 7:59 PM, Gregory Farnum  wrote:
> I think I'm the resident train release advocate so I'm sure my
> advocating that model will surprise nobody. I'm not sure I'd go all
> the way to Lars' multi-release maintenance model (although it's
> definitely something I'm interested in), but there are two big reasons
> I wish we were on a train with more frequent real releases:
>
> 1) It reduces the cost of features missing a release. Right now if
> something misses an LTS release, that's it for a year. And nobody
> likes releasing an LTS without a bunch of big new features, so each
> LTS is later than the one before as we scramble to get features merged
> in.
>
> ...and then we deal with the fact that we scrambled to get a bunch of
> features merged in and they weren't quite baked. (Luminous so far
> seems to have gone much better in this regard! Hurray! But I think
> that has a lot to do with our feature-release-scramble this year being
> mostly peripheral stuff around user interfaces that got tacked on
> about the time we'd initially planned the release to occur.)
>
> 2) Train releases increase predictability for downstreams, partners,
> and users around when releases will happen. Right now, the release
> process and schedule is entirely opaque to anybody who's not involved
> in every single upstream meeting we have; and it's unpredictable even
> to those who are. That makes things difficult, as Xiaoxi said.
>
> There are other peripheral but serious benefits I'd expect to see from
> fully-validated train releases as well. It would be *awesome* to have
> more frequent known-stable points to do new development against. If
> you're an external developer and you want a new feature, you have to
> either keep it rebased against a fast-changing master branch, or you
> need to settle for writing it against a long-out-of-date LTS and then
> forward-porting it for merge. If you're an FS developer writing a very
> small new OSD feature and you try to validate it against RADOS, you've
> no idea if bugs that pop up and look random are because you really did
> something wrong or if there's currently an intermittent issue in RADOS
> master. I would have *loved* to be able to maintain CephFS integration
> branches for features that didn't touch RADOS and were built on top of
> the latest release instead of master, but it was utterly infeasible
> because there were too many missing features with the long delays.
>
> On Fri, Sep 8, 2017 at 9:16 AM, Sage Weil  wrote:
>> I'm going to pick on Lars a bit here...
>>
>> On Thu, 7 Sep 2017, Lars Marowsky-Bree wrote:
>>> On 2017-09-06T15:23:34, Sage Weil  wrote:
>>> > Other options we should consider?  Other thoughts?
>>>
>>> With about 20-odd years in software development, I've become a big
>>> believer in schedule-driven releases. If it's feature-based, you never
>>> know when they'll get done.
>>>
>>> If the schedule intervals are too long though, the urge to press too
>>> much in (so as not to miss the next merge window) is just too high,
>>> meaning the train gets derailed. (Which cascades into the future,
>>> because the next time the pressure will be even higher based on the
>>> previous experience.) This requires strictness.
>>>
>>> We've had a few Linux kernel releases that were effectively feature
>>> driven and never quite made it. 1.3.x? 1.5.x? My memory is bad, but they
>>> were a disaster that eventually led Linus to evolve to the current
>>> model.
>>>
>>> That serves them really 

[ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-10 Thread Nico Schottelius

Good morning,

yesterday we had an unpleasant surprise that I would like to discuss:

Many (not all!) of our VMs suddenly died (the qemu process exiting), and
when trying to restart them, we saw I/O errors on the disks inside the qemu
process and the OS was not able to start (i.e. it stopped in the initramfs).

When we exported the image from rbd and loop mounted it, there were
however no I/O errors and the filesystem could be cleanly mounted [-1].

We are running Devuan with kernel 3.16.0-4-amd64 and saw that there are
some problems reported with kernels < 3.16.39, so we upgraded one host
that serves as VM host and runs ceph osds to Devuan ascii with
4.9.0-3-amd64.

Trying to start the VM again on this host however resulted in the same
I/O problem.

We then did the "stupid" approach of exporting an image and importing it
again under the same name [0]. Surprisingly, this reproducibly solved our
problem for all affected VMs and allowed us to go back online.

We intentionally left one broken VM in our system (a test VM) so that we
have the chance of debugging further what happened and how we can
prevent it from happening again.

As you might have guessed, there were some events prior to this:

- Some weeks before, we upgraded our cluster from kraken to luminous (in
  the right order: mons first, then adding mgrs)

- About a week ago we added the first HDD to our cluster and modified the
  crushmap so that the "one" pool (from OpenNebula) still selects
  only SSDs

- Some hours before, we took out one of the 5 hosts of the ceph cluster,
  as we intended to replace the filesystem-based (filestore) OSDs with
  bluestore (roughly 3 hours prior to the event)

- Shortly before the event we re-added an OSD, but did not bring it "up"

To our understanding, none of these actions should have triggered this
behaviour; however, we are aware that the upgrade to luminous also updated
the client libraries and that not all qemu processes were restarted. [1]

After this long story, I was wondering about the following things:

- Why did this happen at all?
  And what is different after we reimported the image?
  Could it be related to disconnecting the image from its parent?
  (i.e. OpenNebula creates clones prior to starting a VM)

- We have one broken VM left - is there a way to get it back running
  without doing the export/import dance?

- Is http://tracker.ceph.com/issues/18807 related to our issue, and if so, how?
  How is the kernel involved in running VMs that use librbd?
  rbd showmapped does not show any mapped images, as qemu connects directly
  to ceph.

  We tried upgrading one host to Devuan ascii, which uses 4.9.0-3-amd64,
  but that did not fix our problem.

We would appreciate any pointers!

Best,

Nico


[-1]
losetup -P /dev/loop0 /var/tmp/one-staging/monitoring1-disk.img  # -P scans the partition table and creates /dev/loop0pN
mkdir /tmp/monitoring1-mnt
mount /dev/loop0p1 /tmp/monitoring1-mnt/


[0]

rbd export one/$img /var/tmp/one-staging/$img   # dump the image to a flat local file
rbd rm one/$img                                 # remove the original image
rbd import /var/tmp/one-staging/$img one/$img   # re-create it as a standalone, non-cloned image
rm /var/tmp/one-staging/$img

[1]
[14:05:34] server5:~# ceph features
{
    "mon": {
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 3
        }
    },
    "osd": {
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 49
        }
    },
    "client": {
        "group": {
            "features": "0xffddff8ee84fffb",
            "release": "kraken",
            "num": 1
        },
        "group": {
            "features": "0xffddff8eea4fffb",
            "release": "luminous",
            "num": 4
        },
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 61
        }
    }
}
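
(The single remaining kraken client is presumably one of those un-restarted
qemu processes; I assume something like "ceph daemon mon.server5 sessions" on
a monitor host would show which client/address that is, though we have not
verified that the session listing includes the feature bits.)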


--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch


Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-10 Thread Jason Dillaman
I presume QEMU is using librbd instead of a mapped krbd block device,
correct? If that is the case, can you add "debug-rbd=20" and "debug
objecter=20" to your ceph.conf and boot up your last remaining broken
OSD?
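
For example, a minimal sketch on the VM host (the log path is just an
assumption, any location writable by the qemu process works):

[client]
    debug rbd = 20
    debug objecter = 20
    log file = /var/log/ceph/qemu-rbd.$pid.log

so that the qemu process, which links librbd, writes its client-side log there.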

On Sun, Sep 10, 2017 at 8:23 AM, Nico Schottelius
 wrote:
> [...]



-- 
Jason


Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-10 Thread Nico Schottelius

Hello Jason,

I think there is a slight misunderstanding:
There is only one *VM* left that we did not start, not an OSD.

Or does librbd also read ceph.conf and will that cause qemu to output
debug messages?

Best,

Nico

Jason Dillaman  writes:

> I presume QEMU is using librbd instead of a mapped krbd block device,
> correct? If that is the case, can you add "debug-rbd=20" and "debug
> objecter=20" to your ceph.conf and boot up your last remaining broken
> OSD?
> [...]


--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-10 Thread Jason Dillaman
Sorry -- meant VM. Yes, librbd uses ceph.conf for configuration settings.

On Sun, Sep 10, 2017 at 9:22 AM, Nico Schottelius
 wrote:
>
> Hello Jason,
>
> I think there is a slight misunderstanding:
> There is only one *VM*, not one OSD left that we did not start.
>
> Or does librbd also read ceph.conf and will that cause qemu to output
> debug messages?
>
> Best,
>
> Nico
> [...]

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-10 Thread Nico Schottelius

Just tried it, and there is not much more log output in ceph -w (see below),
nor from the qemu process.

[15:52:43] server4:~$  /usr/bin/qemu-system-x86_64 -name one-17031 -S
-machine pc-i440fx-2.1,accel=kvm,usb=off -m 8192 -realtime mlock=off
-smp 6,sockets=6,cores=1,threads=1 -uuid
79845fca-9b26-4072-bcb3-7f5206c2a531 -no-user-config -nodefaults
-chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-17031.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -boot strict=on -device
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
file='rbd:one/one-29-17031-0:id=libvirt:key=DELETEME:auth_supported=cephx\;none:mon_host=server1\:6789\;server3\:6789\;server5\:6789,if=none,id=drive-virtio-disk0,format=raw,cache=none'
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive 
file=/var/lib/one//datastores/100/17031/disk.1,if=none,id=drive-ide0-0-0,readonly=on,format=raw
 -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -vnc 
[::]:21131 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on 2>&1 | tee 
kvmlogwithdebug

-> no output

The qemu command line is copied from what OpenNebula usually spawns, minus
the networking part.
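
(Side note: if I understand the qemu rbd driver correctly, extra ceph options
can also be appended to the drive string itself, roughly like
...:debug_rbd=20:log_file=/tmp/qemu-rbd.log -- we have not verified that yet.)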


[15:41:54] server4:~# ceph -w
2017-09-10 15:44:32.873281 7f59f17fa700 10 client.?.objecter ms_handle_connect 
0x7f59f4150e90
2017-09-10 15:44:32.873315 7f59f17fa700 10 client.?.objecter resend_mon_ops
2017-09-10 15:44:32.873327 7f59f17fa700 10 client.?.objecter ms_handle_connect 
0x7f59f41544d0
2017-09-10 15:44:32.873329 7f59f17fa700 10 client.?.objecter resend_mon_ops
2017-09-10 15:44:32.876248 7f59f9a63700 10 client.1021613.objecter 
_maybe_request_map subscribing (onetime) to next osd map
2017-09-10 15:44:32.876710 7f59f17fa700 10 client.1021613.objecter ms_dispatch 
0x7f59f4000fe0 osd_map(9059..9059 src has 8530..9059) v3
2017-09-10 15:44:32.876722 7f59f17fa700  3 client.1021613.objecter 
handle_osd_map got epochs [9059,9059] > 0
2017-09-10 15:44:32.876726 7f59f17fa700  3 client.1021613.objecter 
handle_osd_map decoding full epoch 9059
2017-09-10 15:44:32.877099 7f59f17fa700 20 client.1021613.objecter dump_active 
.. 0 homeless
2017-09-10 15:44:32.877423 7f59f17fa700 10 client.1021613.objecter 
ms_handle_connect 0x7f59dc00c9c0
  cluster:
id: 26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
health: HEALTH_OK

  services:
mon: 3 daemons, quorum server5,server3,server1
mgr: 1(active), standbys: 2, 0
osd: 50 osds: 49 up, 49 in

  data:
pools:   2 pools, 1088 pgs
objects: 500k objects, 1962 GB
usage:   5914 GB used, 9757 GB / 15672 GB avail
pgs: 1088 active+clean

  io:
client:   18822 B/s rd, 799 kB/s wr, 6 op/s rd, 52 op/s wr


2017-09-10 15:44:37.876324 7f59f1ffb700 10 client.1021613.objecter tick
2017-09-10 15:44:42.876437 7f59f1ffb700 10 client.1021613.objecter tick
2017-09-10 15:44:45.223970 7f59f17fa700 10 client.1021613.objecter ms_dispatch 
0x7f59f4000fe0 log(2 entries from seq 215046 at 2017-09-10 15:44:45.164162) v1
2017-09-10 15:44:47.876548 7f59f1ffb700 10 client.1021613.objecter tick
2017-09-10 15:44:52.876668 7f59f1ffb700 10 client.1021613.objecter tick
2017-09-10 15:44:57.876770 7f59f1ffb700 10 client.1021613.objecter tick
2017-09-10 15:45:02.876888 7f59f1ffb700 10 client.1021613.objecter tick
2017-09-10 15:45:07.877001 7f59f1ffb700 10 client.1021613.objecter tick
2017-09-10 15:45:12.877120 7f59f1ffb700 10 client.1021613.objecter tick
2017-09-10 15:45:17.877229 7f59f1ffb700 10 client.1021613.objecter tick
2017-09-10 15:45:22.877349 7f59f1ffb700 10 client.1021613.objecter tick
2017-09-10 15:45:27.877455 7f59f1ffb700 10 client.1021613.objecter tick

Jason Dillaman  writes:

> Sorry -- meant VM. Yes, librbd uses ceph.conf for configuration settings.
> [...]

[ceph-users] Cluster does not report which objects are unfound for stuck PG

2017-09-10 Thread Nikos Kormpakis

Hello people,

after a series of events and some operational mistakes, 1 PG in our cluster
is in active+recovering+degraded+remapped state, reporting 1 unfound object.
We're running Hammer (v0.94.9) on top of Debian Jessie, on 27 nodes and 162
osds, with the default crushmap and the nodeep-scrub flag set. Unfortunately,
all pools on our cluster are set up with replica size = 2 and min_size = 1.

My main problem is that "ceph pg <pgid> list_missing" does not report which
objects are considered unfound, making it quite difficult to understand what
is happening and how to recover without doing any more damage. Specifically,
the output of the command is this:

# ceph pg 5.658 list_missing
{
"offset": {
"oid": "",
"key": "",
"snapid": 0,
"hash": 0,
"max": 0,
"pool": -1,
"namespace": ""
},
"num_missing": 0,
"num_unfound": 1,
"objects": [],
"more": 0
}

I took a look at ceph's official docs and at older threads on this list, but
in every case that I found, ceph was reporting the objects that it could not
find.


Our cluster got into that state after a series of events and mistakes. I will
provide some timestamps too.
* osds of one node were down+out because of a recent failure (6 osds)
* We decided to start one osd (osd.120) to see how it would behave
* At 14:56:06 we started osd.120
* After starting osd.120, we noticed that recovery started. As I understand
  now, we did not want the osd to join the cluster, so we decided to take it
  down again. It seems to me now that this was a panic move, but anyway, it
  happened.
* At 14:57:23 we shut down osd.120.
* Some pgs that were mapped on osd.120 were reported to be down, and stuck
  requests targeting those osds were popping up. Of course, that meant that
  we needed to start the osd again.
* At 15:02:59 we started osd.120. PGs came up and started peering.
* At 15:03:24, osd.33 (living on a different node) crashed with the following
  assertion:

0> 2017-09-08 15:03:24.041412 7ff679fa4700 -1 osd/ReplicatedPG.cc: In 
function 'virtual void ReplicatedPG::on_local_recover(const hobject_t&, 
const object_stat_sum_t&, const ObjectRecoveryInfo&, ObjectContextRef, 
ObjectStore::Transaction*)' thread 7ff679fa4700 time 2017-09-08 
15:03:24.002997

osd/ReplicatedPG.cc: 211: FAILED assert(is_primary())

* At 15:03:29 the cluster reported that 1 object is unfound. We started
  investigating the issue.
* After some time, we noticed that pgs mapped to osd.33 were degraded, so we
  decided to start osd.33 again. It started normally, without any issues.
* After some time, recovery almost finished, with all pgs in a healthy state
  except pg 5.658, which should contain the unfound object.

Our cluster is now in the following state:

# ceph -s
cluster 287f8859-9887-4bb3-ae27-531d2a1dbc95
 health HEALTH_WARN
1 pgs degraded
1 pgs recovering
1 pgs stuck degraded
1 pgs stuck unclean
recovery 13/74653914 objects degraded (0.000%)
recovery 300/74653914 objects misplaced (0.000%)
recovery 1/37326882 unfound (0.000%)
nodeep-scrub flag(s) set
 monmap e1: 3 mons at 
{rd0-00=some_ip:6789/0,rd0-01=some_ip2:6789/0,rd0-02=some_ip3:6789/0}

election epoch 5462, quorum 0,1,2 rd0-00,rd0-01,rd0-02
 osdmap e379262: 162 osds: 157 up, 157 in; 1 remapped pgs
flags nodeep-scrub
      pgmap v135824695: 18432 pgs, 5 pools, 98880 GB data, 36452 kobjects
            193 TB used, 89649 GB / 280 TB avail
            13/74653914 objects degraded (0.000%)
            300/74653914 objects misplaced (0.000%)
            1/37326882 unfound (0.000%)
               18430 active+clean
                   1 active+recovering+degraded+remapped
                   1 active+clean+scrubbing
  client io 9776 kB/s rd, 10937 kB/s wr, 863 op/s

# ceph health detail
HEALTH_WARN 1 pgs degraded; 1 pgs recovering; 1 pgs stuck degraded; 1 
pgs stuck unclean; recovery 13/74653918 objects degraded (0.000%); 
recovery 300/74653918 objects misplaced (0.000%); recovery 1/37326884 
unfound (0.000%); nodeep-scrub flag(s) set
pg 5.658 is stuck unclean for 541763.344743, current state 
active+recovering+degraded+remapped, last acting [120,155]
pg 5.658 is stuck degraded for 201445.628108, current state 
active+recovering+degraded+remapped, last acting [120,155]
pg 5.658 is active+recovering+degraded+remapped, acting [120,155], 1 
unfound

recovery 13/74653918 objects degraded (0.000%)
recovery 300/74653918 objects misplaced (0.000%)
recovery 1/37326884 unfound (0.000%)
nodeep-scrub flag(s) set

# ceph pg dump_stuck unclean
ok
pg_stat  state                                 up          up_primary  acting      acting_primary
5.658    active+recovering+degraded+remapped   [120,153]   120         [120,155]   120


# ceph pg 5.658 query
The output can be found here [1].
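
(In the query output, I assume the "might_have_unfound" list under
recovery_state is the relevant part, e.g.

# ceph pg 5.658 query | grep -A 10 might_have_unfound

but it is not obvious to us how to map that back to an object name.)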

Also, we took a glance at the logs but did not notice anything strange
except the crashed osd

Re: [ceph-users] Power outages!!! help!

2017-09-10 Thread hjcho616
It took a while. It appears to have cleaned up quite a bit... but it still
has issues. I've been seeing the message below for more than a day, and CPU
and I/O utilization are low... it looks like something is stuck. I rebooted
OSDs several times when it looked stuck earlier and it would then work on
something else, but now it is not changing much. What can I try now?

Regards,
Hong
# ceph health detail
HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 22 pgs degraded; 6 pgs down; 11 pgs inconsistent; 6 pgs peering; 6 pgs recovering; 16 pgs stale; 22 pgs stuck degraded; 6 pgs stuck inactive; 16 pgs stuck stale; 28 pgs stuck unclean; 16 pgs stuck undersized; 16 pgs undersized; 1 requests are blocked > 32 sec; 1 osds have slow requests; recovery 221990/4503980 objects degraded (4.929%); recovery 147/2251990 unfound (0.007%); 95 scrub errors; mds cluster is degraded; no legacy OSD present but 'sortbitwise' flag is not set
pg 0.e is stuck inactive since forever, current state down+peering, last acting [11,2]
pg 1.d is stuck inactive since forever, current state down+peering, last acting [11,2]
pg 1.28 is stuck inactive since forever, current state down+peering, last acting [11,6]
pg 0.29 is stuck inactive since forever, current state down+peering, last acting [11,6]
pg 1.2b is stuck inactive since forever, current state down+peering, last acting [1,11]
pg 0.2c is stuck inactive since forever, current state down+peering, last acting [1,11]
pg 0.e is stuck unclean since forever, current state down+peering, last acting [11,2]
pg 0.a is stuck unclean for 1233182.248198, current state stale+active+undersized+degraded+inconsistent, last acting [0]
pg 2.8 is stuck unclean for 1238044.714421, current state stale+active+undersized+degraded, last acting [0]
pg 2.1a is stuck unclean for 1238933.203920, current state active+recovering+degraded, last acting [2,11]
pg 2.3 is stuck unclean for 1238882.443876, current state stale+active+undersized+degraded, last acting [0]
pg 2.27 is stuck unclean for 1295260.765981, current state active+recovering+degraded, last acting [11,6]
pg 0.d is stuck unclean for 1230831.504001, current state stale+active+undersized+degraded, last acting [0]
pg 1.c is stuck unclean for 1238044.715698, current state stale+active+undersized+degraded, last acting [0]
pg 1.3d is stuck unclean for 1232066.572856, current state stale+active+undersized+degraded, last acting [0]
pg 1.28 is stuck unclean since forever, current state down+peering, last acting [11,6]
pg 0.29 is stuck unclean since forever, current state down+peering, last acting [11,6]
pg 1.2b is stuck unclean since forever, current state down+peering, last acting [1,11]
pg 2.2f is stuck unclean for 1238127.474088, current state active+recovering+degraded+remapped, last acting [9,10]
pg 0.0 is stuck unclean for 1233182.247776, current state stale+active+undersized+degraded, last acting [0]
pg 0.2c is stuck unclean since forever, current state down+peering, last acting [1,11]
pg 2.b is stuck unclean for 1238044.640982, current state stale+active+undersized+degraded, last acting [0]
pg 1.1b is stuck unclean for 1234021.660986, current state stale+active+undersized+degraded, last acting [0]
pg 0.1c is stuck unclean for 1232574.189549, current state stale+active+undersized+degraded, last acting [0]
pg 1.4 is stuck unclean for 1293624.075753, current state stale+active+undersized+degraded, last acting [0]
pg 0.5 is stuck unclean for 1237356.776788, current state stale+active+undersized+degraded+inconsistent, last acting [0]
pg 2.1f is stuck unclean for 8825246.729513, current state active+recovering+degraded, last acting [10,2]
pg 1.d is stuck unclean since forever, current state down+peering, last acting [11,2]
pg 2.39 is stuck unclean for 1238933.214406, current state stale+active+undersized+degraded, last acting [0]
pg 1.3a is stuck unclean for 2125299.164204, current state stale+active+undersized+degraded, last acting [0]
pg 0.3b is stuck unclean for 1233432.895409, current state stale+active+undersized+degraded, last acting [0]
pg 2.3c is stuck unclean for 1238933.208648, current state active+recovering+degraded, last acting [10,2]
pg 2.35 is stuck unclean for 1295260.753354, current state active+recovering+degraded, last acting [11,6]
pg 1.9 is stuck unclean for 1238044.722811, current state stale+active+undersized+degraded, last acting [0]
pg 0.a is stuck undersized for 1229917.081228, current state stale+active+undersized+degraded+inconsistent, last acting [0]
pg 2.8 is stuck undersized for 1229917.081016, current state stale+active+undersized+degraded, last acting [0]
pg 2.b is stuck undersized for 1229917.068181, current state stale+active+undersized+degraded, last acting [0]
pg 1.9 is stuck undersized for 1229917.075164, current state stale+active+undersized+degraded, last acting [0]
pg 0.5 is stuck undersized for 1229917.085330, current state stale+active+undersized+degraded+inconsistent, la

Re: [ceph-users] cephfs(Kraken 11.2.1), Unable to write more file when one dir more than 100000 files, mds_bal_fragment_size_max = 5000000

2017-09-10 Thread donglifec...@gmail.com
ZhengYan,

I set "mds_bal_fragment_size_max = 10, mds_bal_frag = true", then  I write  
10 files named 512k.file$i, but there are still some file is missing. such 
as :
[root@yj43959-ceph-dev cephfs]# find ./volumes/ -type f | wc -l
91070




donglifec...@gmail.com
 
From: Yan, Zheng
Date: 2017-09-08 15:20
To: donglifec...@gmail.com
CC: ceph-users; marcus.haarmann
Subject: Re: [ceph-users] cephfs(Kraken 11.2.1), Unable to write more file when 
one dir more than 100000 files, mds_bal_fragment_size_max = 5000000
 
> On 8 Sep 2017, at 13:54, donglifec...@gmail.com wrote:
> 
> ZhengYan,
> 
> I'm sorry, just a description of some questions.
> 
> when one dir has more than 100000 files, I can continue to write to it, but I
> can't find files which were written in the past. For example:
> 1. I write 100000 files named 512k.file$i
>
> 2. I continue to write 10000 files named aaa.file$i
> 
> 3. I continue to write 10000 files named bbb.file$i
> 
> 4. I continue to write 10000 files named ccc.file$i
> 
> 5. I continue to write 10000 files named ddd.file$i
> 
> 6. I can't find all the ddd.file$i files; some ddd.file$i are missing, such as:
> 
> [root@yj43959-ceph-dev scripts]# find /mnt/cephfs/volumes -type f  |  grep 
> 512k.file | wc -l
> 100000
> [root@yj43959-ceph-dev scripts]# ls /mnt/cephfs/volumes/aaa.file* | wc -l
> 10000
> [root@yj43959-ceph-dev scripts]# ls /mnt/cephfs/volumes/bbb.file* | wc -l
> 10000
> [root@yj43959-ceph-dev scripts]# ls /mnt/cephfs/volumes/ccc.file* | wc -l
> 10000
> [root@yj43959-ceph-dev scripts]# ls /mnt/cephfs/volumes/ddd.file* | wc -l
> // some files missing
> 1072
 
It's likely caused by http://tracker.ceph.com/issues/18314. To support very
large directories, you should enable directory fragmentation instead of
enlarging mds_bal_fragment_size_max.
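
On kraken that should be something like (filesystem name is an example here;
check the docs for your exact version, it may require an extra confirmation
flag):

ceph fs set cephfs allow_dirfrags true

after which the MDS will split large directories into fragments automatically.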
 
Regards
Yan, Zheng
 
> 
> 
> 
> donglifec...@gmail.com
>  
> From: donglifec...@gmail.com
> Date: 2017-09-08 13:30
> To: zyan
> CC: ceph-users
> Subject: [ceph-users] cephfs(Kraken 11.2.1), Unable to write more file when 
> one dir more than 100000 files, mds_bal_fragment_size_max = 5000000
> ZhengYan,
> 
> I am testing cephfs (Kraken 11.2.1). I can't write more files when one dir has
> more than 100000 files, even though I have already set
> "mds_bal_fragment_size_max = 5000000".
> 
> Why is this the case? Is it a bug?
> 
> Thanks a lot.
> 
> donglifec...@gmail.com
 


[ceph-users] Ceph 12.2.0 on 32bit?

2017-09-10 Thread Dyweni - Ceph-Users

Hi,

Is anyone running Ceph Luminous (12.2.0) on 32bit Linux?  Have you seen 
any problems?




My setup has been 1 MON and 7 OSDs (no MDS, RGW, etc), all running Jewel 
(10.2.1), on 32bit, with no issues at all.


I've upgraded everything to latest version of Jewel (10.2.9) and still 
no issues.


Next I upgraded my MON to Luminous (12.2.0) and added MGR to it.  Still 
no issues.


Next I removed one node from the cluster, wiped it clean, upgraded it to 
Luminous (12.2.0), and created a new BlueStore data area.  Now this node 
crashes with a segmentation fault, usually within a few minutes of starting 
up.  I've loaded symbols and used GDB to examine back traces.  From what 
I can tell, the seg faults are happening randomly, and the stack is 
corrupted, so traces from GDB are unusable (even with all symbols 
installed for all packages on the system). However, in all cases, the 
seg fault occurs in the 'msgr-worker-' thread.





My data is fine; I just would like to get Ceph 12.2.0 running stably on 
this node, so I can upgrade the remaining nodes and switch everything 
over to BlueStore.




Thanks,
Dyweni