Re: [ceph-users] [Luminous] rgw not deleting object
Hi,

I had a similar problem on Jewel, where I was unable to properly delete objects even though radosgw-admin returned rc 0 after issuing rm; somehow the object data was deleted but the metadata wasn't removed.

I ran

# radosgw-admin --cluster ceph object stat --bucket=weird_bucket --object=$OBJECT

to figure out whether the object was still there, and then used 'rados put' to upload a dummy object under the same RADOS name so that it could be removed properly:

# rados -c /etc/ceph/ceph.conf -p ceph.rgw.buckets.data put be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2384280.20_$OBJECT dummy.file

Hope it helps,
Andreas

On 10 Sep 2017 1:20 a.m., "Jack" wrote:
> Hi,
>
> I face a wild issue: I cannot remove an object from rgw (via the S3 API)
>
> My steps:
> s3cmd ls s3://bucket/object -> it exists
> s3cmd rm s3://bucket/object -> success
> s3cmd ls s3://bucket/object -> it still exists
>
> At this point, I can curl and get the object (thus, it does exist)
>
> Doing the same via boto leads to the same behavior
>
> Log sample:
> 2017-09-10 01:18:42.502486 7fd189e7d700 1 == starting new request req=0x7fd189e77300 =
> 2017-09-10 01:18:42.504028 7fd189e7d700 1 == req done req=0x7fd189e77300 op status=-2 http_status=204 ==
> 2017-09-10 01:18:42.504076 7fd189e7d700 1 civetweb: 0x560ebc275000: 10.42.43.6 - - [10/Sep/2017:01:18:38 +0200] "DELETE /bucket/object HTTP/1.1" 1 0 - Boto/2.44.0 Python/3.5.4 Linux/4.12.0-1-amd64
>
> What can I do?
> What data shall I provide to debug this issue?
>
> Regards,
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
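For reference, a minimal sketch of how the raw RADOS key used above can be located before overwriting anything, assuming the bucket and pool names from this thread ('bucket stats' shows the marker/prefix that rgw prepends to object names; verify the exact key with the listing rather than guessing it):

# look up the bucket marker that rgw uses as the RADOS name prefix
radosgw-admin --cluster ceph bucket stats --bucket=weird_bucket | grep -E '"(id|marker)"'

# find the leftover raw key for the object in the data pool
rados -p ceph.rgw.buckets.data ls | grep "$OBJECT"

# overwrite the leftover key with a dummy payload, then remove it cleanly
rados -p ceph.rgw.buckets.data put "<marker>_$OBJECT" dummy.file
rados -p ceph.rgw.buckets.data rm "<marker>_$OBJECT"

Listing the pool can be slow on a large cluster, but it confirms the exact key before anything is touched.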
Re: [ceph-users] [Ceph-maintainers] Ceph release cadence
I'm not a huge fan of train releases, as they tend to never quite make it on time, and the timeline always feels a bit artificial anyway. OTOH, I do see and understand the need for a predictable schedule with a roadmap attached to it. Many people need at least a vague idea of what we're going to ship and when, so that they can plan ahead. We need it ourselves, as sometimes the schedule dictates the engineering decisions we're going to make.

On the other hand, I think that working towards a release that comes out after 9 or 12 months is a bit too long. This is a recipe for more delays, as the penalty for missing a feature is painful.

Maybe we can consider returning to shorter iterations for *dev* releases. These are checkpoints that happen after a short period (2-3 weeks), where we end up with a minimally tested release that passes some smoke tests. Features are incrementally added to the dev release. The idea behind a short-term dev release is that it minimizes the window where master is completely unusable, and thus reduces the time to stabilization. Then it's easier to enforce a train schedule if we want to. It might also be easier to let go of a feature that doesn't make it, as it will be there soon, and if really needed we (or the downstream maintainer) can make the decision to backport it. This makes me think that we could revisit our backport policy/procedure/tooling, so that we can do it in a sane and safe way when needed and possible.

Yehuda

On Fri, Sep 8, 2017 at 7:59 PM, Gregory Farnum wrote:
> I think I'm the resident train release advocate so I'm sure my
> advocating that model will surprise nobody. I'm not sure I'd go all
> the way to Lars' multi-release maintenance model (although it's
> definitely something I'm interested in), but there are two big reasons
> I wish we were on a train with more frequent real releases:
>
> 1) It reduces the cost of features missing a release. Right now if
> something misses an LTS release, that's it for a year. And nobody
> likes releasing an LTS without a bunch of big new features, so each
> LTS is later than the one before as we scramble to get features merged
> in.
>
> ...and then we deal with the fact that we scrambled to get a bunch of
> features merged in and they weren't quite baked. (Luminous so far
> seems to have gone much better in this regard! Hurray! But I think
> that has a lot to do with our feature-release-scramble this year being
> mostly peripheral stuff around user interfaces that got tacked on
> about the time we'd initially planned the release to occur.)
>
> 2) Train releases increase predictability for downstreams, partners,
> and users around when releases will happen. Right now, the release
> process and schedule is entirely opaque to anybody who's not involved
> in every single upstream meeting we have; and it's unpredictable even
> to those who are. That makes things difficult, as Xiaoxi said.
>
> There are other peripheral but serious benefits I'd expect to see from
> fully-validated train releases as well. It would be *awesome* to have
> more frequent known-stable points to do new development against. If
> you're an external developer and you want a new feature, you have to
> either keep it rebased against a fast-changing master branch, or you
> need to settle for writing it against a long-out-of-date LTS and then
> forward-porting it for merge.
> If you're an FS developer writing a very
> small new OSD feature and you try to validate it against RADOS, you've
> no idea if bugs that pop up and look random are because you really did
> something wrong or if there's currently an intermittent issue in RADOS
> master. I would have *loved* to be able to maintain CephFS integration
> branches for features that didn't touch RADOS and were built on top of
> the latest release instead of master, but it was utterly infeasible
> because there were too many missing features with the long delays.
>
> On Fri, Sep 8, 2017 at 9:16 AM, Sage Weil wrote:
>> I'm going to pick on Lars a bit here...
>>
>> On Thu, 7 Sep 2017, Lars Marowsky-Bree wrote:
>>> On 2017-09-06T15:23:34, Sage Weil wrote:
>>> > Other options we should consider? Other thoughts?
>>>
>>> With about 20-odd years in software development, I've become a big
>>> believer in schedule-driven releases. If it's feature-based, you never
>>> know when they'll get done.
>>>
>>> If the schedule intervals are too long though, the urge to press too
>>> much in (so as not to miss the next merge window) is just too high,
>>> meaning the train gets derailed. (Which cascades into the future,
>>> because the next time the pressure will be even higher based on the
>>> previous experience.) This requires strictness.
>>>
>>> We've had a few Linux kernel releases that were effectively feature
>>> driven and never quite made it. 1.3.x? 1.5.x? My memory is bad, but they
>>> were a disaster that eventually led Linus to evolve to the current
>>> model.
>>>
>>> That serves them really
[ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]
Good morning,

yesterday we had an unpleasant surprise that I would like to discuss: many (not all!) of our VMs were suddenly dying (the qemu process exiting), and when trying to restart them we saw I/O errors on the disks inside the qemu process, and the OS was not able to start (i.e. it stopped in the initramfs).

When we exported the image from rbd and loop mounted it, there were however no I/O errors and the filesystem could be cleanly mounted [-1].

We are running Devuan with kernel 3.16.0-4-amd64 and saw that there are some problems reported with kernels < 3.16.39, so we upgraded one host that serves as VM host and also runs ceph OSDs to Devuan ascii with 4.9.0-3-amd64. Trying to start the VM again on this host, however, resulted in the same I/O problem.

We then did the "stupid" approach of exporting an image and importing it again under the same name [0]. Surprisingly, this solved our problem reproducibly for all affected VMs and allowed us to go back online.

We intentionally left one broken VM in our system (a test VM) so that we have a chance of debugging further what happened and how we can prevent it from happening again.

As you might have guessed, there were some events prior to this:

- Some weeks before, we upgraded our cluster from kraken to luminous (in the right order: mons first, then adding mgrs)

- About a week ago we added the first hdd to our cluster and modified the crushmap so that the "one" pool (from opennebula) still selects only ssds

- Some hours before, we took out one of the 5 hosts of the ceph cluster, as we intended to replace the filesystem-based (filestore) OSDs with bluestore (roughly 3 hours prior to the event)

- A short time before the event we re-added an osd, but did not "up" it

To our understanding, none of these actions should have triggered this behaviour; however, we are aware that with the upgrade to luminous the client libraries were also updated and not all qemu processes were restarted. [1]

After this long story, I was wondering about the following things:

- Why did this happen at all? And what is different after we reimported the image? Can it be related to disconnecting the image from the parent (i.e. opennebula creates clones prior to starting a VM)?

- We have one broken VM left - is there a way to get it back running without doing the export/import dance?

- How is http://tracker.ceph.com/issues/18807 related to our issue, if at all? How is the kernel involved in running VMs that use librbd? rbd showmapped does not show any mapped VMs, as qemu connects directly to ceph. We tried upgrading one host to Devuan ascii, which uses 4.9.0-3-amd64, but this did not fix our problem.

We would appreciate any pointer!

Best,

Nico

[-1]
losetup -P /dev/loop0 /var/tmp/one-staging/monitoring1-disk.img
mkdir /tmp/monitoring1-mnt
mount /dev/loop0p1 /tmp/monitoring1-mnt/

[0]
rbd export one/$img /var/tmp/one-staging/$img
rbd rm one/$img
rbd import /var/tmp/one-staging/$img one/$img
rm /var/tmp/one-staging/$img

[1]
[14:05:34] server5:~# ceph features
{
    "mon": {
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 3
        }
    },
    "osd": {
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 49
        }
    },
    "client": {
        "group": {
            "features": "0xffddff8ee84fffb",
            "release": "kraken",
            "num": 1
        },
        "group": {
            "features": "0xffddff8eea4fffb",
            "release": "luminous",
            "num": 4
        },
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 61
        }
    }
}

--
Modern, affordable, Swiss Virtual Machines.
Visit www.datacenterlight.ch
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
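For reference, a minimal sketch of how one might check whether the remaining broken image is still a COW clone of an opennebula base snapshot, and detach it in place instead of the export/import dance (one/$img is a placeholder as in [0]; this is a suggestion, not something verified against this particular cluster):

# does the image still reference a parent snapshot?
rbd info one/$img | grep parent

# which clients currently have the image open (stale watchers from old librbd clients)
rbd status one/$img

# copy all data from the parent into the clone and detach it
rbd flatten one/$img

'rbd flatten' rewrites the clone in place, so it avoids the rm/import cycle, but it is only relevant if 'rbd info' actually reports a parent.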
Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]
I presume QEMU is using librbd instead of a mapped krbd block device, correct? If that is the case, can you add "debug-rbd=20" and "debug objecter=20" to your ceph.conf and boot up your last remaining broken OSD? On Sun, Sep 10, 2017 at 8:23 AM, Nico Schottelius wrote: > > Good morning, > > yesterday we had an unpleasant surprise that I would like to discuss: > > Many (not all!) of our VMs were suddenly > dying (qemu process exiting) and when trying to restart them, inside the > qemu process we saw i/o errors on the disks and the OS was not able to > start (i.e. stopped in initramfs). > > When we exported the image from rbd and loop mounted it, there were > however no I/O errors and the filesystem could be cleanly mounted [-1]. > > We are running Devuan with kernel 3.16.0-4-amd64 and saw that there are > some problems reported with kernels < 3.16.39 and thus we upgraded one > host that serves as VM host + runs ceph osds to Devuan ascii using > 4.9.0-3-amd64. > > Trying to start the VM again on this host however resulted in the same > I/O problem. > > We then did the "stupid" approach of exporting an image and importing it > again as the same name [0]. Surprisingly, this solved our problem > reproducible for all affected VMs and allowed us to go back online. > > We intentionally left one broken VM in our system (a test VM) so that we > have the chance of debugging further what happened and how we can > prevent it from happening again. > > As you might have guessed, there have been some event prior this: > > - Some weeks before we upgraded our cluster from kraken to luminous (in > the right order of mon's first, adding mgrs) > > - About a week ago we added the first hdd to our cluster and modified the > crushmap so that it the "one" pool (from opennebula) still selects > only ssds > > - Some hours before we took out one of the 5 hosts of the ceph cluster, > as we intended to replace the filesystem based OSDs with bluestore > (roughly 3 hours prior to the event) > > - Short time before the event we readded an osd, but did not "up" it > > To our understanding, none of these actions should have triggered this > behaviour, however we are aware that with the upgrade to luminous also > the client libraries were updated and not all qemu processes were > restarted. [1] > > After this long story, I was wondering about the following things: > > - Why did this happen at all? > And what is different after we reimported the image? > Can it be related to disconnected the image from the parent > (i.e. opennebula creates clones prior to starting a VM) > > - We have one broken VM left - is there a way to get it back running > without doing the export/import dance? > > - How / or is http://tracker.ceph.com/issues/18807 related to our issue? > How is the kernel involved into running VMs that use librbd? > rbd showmapped does not show any mapped VMs, as qemu connects directly > to ceph. > > We tried upgrading one host to Devuan ascii which uses 4.9.0-3-amd64, > but did not fix our problem. > > We would appreciate any pointer! 
> > Best, > > Nico > > > [-1] > losetup -P /dev/loop0 /var/tmp/one-staging/monitoring1-disk.img > mkdir /tmp/monitoring1-mnt > mount /dev/loop0p1 /tmp/monitoring1-mnt/ > > > [0] > > rbd export one/$img /var/tmp/one-staging/$img > rbd rm one/$img > rbd import /var/tmp/one-staging/$img one/$img > rm /var/tmp/one-staging/$img > > [1] > [14:05:34] server5:~# ceph features > { > "mon": { > "group": { > "features": "0x1ffddff8eea4fffb", > "release": "luminous", > "num": 3 > } > }, > "osd": { > "group": { > "features": "0x1ffddff8eea4fffb", > "release": "luminous", > "num": 49 > } > }, > "client": { > "group": { > "features": "0xffddff8ee84fffb", > "release": "kraken", > "num": 1 > }, > "group": { > "features": "0xffddff8eea4fffb", > "release": "luminous", > "num": 4 > }, > "group": { > "features": "0x1ffddff8eea4fffb", > "release": "luminous", > "num": 61 > } > } > } > > > -- > Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
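For reference, a minimal sketch of what the suggested debug settings might look like in ceph.conf on the hypervisor (the debug options are the ones named above; the log file and admin socket lines are additional assumptions, and the paths must be writable by the qemu process):

[client]
    debug rbd = 20
    debug objecter = 20
    log file = /var/log/ceph/qemu-client.$pid.log
    admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok

With these in place, restarting the one affected qemu process should produce librbd-level logs even when ceph -w shows nothing unusual.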
Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]
Hello Jason, I think there is a slight misunderstanding: There is only one *VM*, not one OSD left that we did not start. Or does librbd also read ceph.conf and will that cause qemu to output debug messages? Best, Nico Jason Dillaman writes: > I presume QEMU is using librbd instead of a mapped krbd block device, > correct? If that is the case, can you add "debug-rbd=20" and "debug > objecter=20" to your ceph.conf and boot up your last remaining broken > OSD? > > On Sun, Sep 10, 2017 at 8:23 AM, Nico Schottelius > wrote: >> >> Good morning, >> >> yesterday we had an unpleasant surprise that I would like to discuss: >> >> Many (not all!) of our VMs were suddenly >> dying (qemu process exiting) and when trying to restart them, inside the >> qemu process we saw i/o errors on the disks and the OS was not able to >> start (i.e. stopped in initramfs). >> >> When we exported the image from rbd and loop mounted it, there were >> however no I/O errors and the filesystem could be cleanly mounted [-1]. >> >> We are running Devuan with kernel 3.16.0-4-amd64 and saw that there are >> some problems reported with kernels < 3.16.39 and thus we upgraded one >> host that serves as VM host + runs ceph osds to Devuan ascii using >> 4.9.0-3-amd64. >> >> Trying to start the VM again on this host however resulted in the same >> I/O problem. >> >> We then did the "stupid" approach of exporting an image and importing it >> again as the same name [0]. Surprisingly, this solved our problem >> reproducible for all affected VMs and allowed us to go back online. >> >> We intentionally left one broken VM in our system (a test VM) so that we >> have the chance of debugging further what happened and how we can >> prevent it from happening again. >> >> As you might have guessed, there have been some event prior this: >> >> - Some weeks before we upgraded our cluster from kraken to luminous (in >> the right order of mon's first, adding mgrs) >> >> - About a week ago we added the first hdd to our cluster and modified the >> crushmap so that it the "one" pool (from opennebula) still selects >> only ssds >> >> - Some hours before we took out one of the 5 hosts of the ceph cluster, >> as we intended to replace the filesystem based OSDs with bluestore >> (roughly 3 hours prior to the event) >> >> - Short time before the event we readded an osd, but did not "up" it >> >> To our understanding, none of these actions should have triggered this >> behaviour, however we are aware that with the upgrade to luminous also >> the client libraries were updated and not all qemu processes were >> restarted. [1] >> >> After this long story, I was wondering about the following things: >> >> - Why did this happen at all? >> And what is different after we reimported the image? >> Can it be related to disconnected the image from the parent >> (i.e. opennebula creates clones prior to starting a VM) >> >> - We have one broken VM left - is there a way to get it back running >> without doing the export/import dance? >> >> - How / or is http://tracker.ceph.com/issues/18807 related to our issue? >> How is the kernel involved into running VMs that use librbd? >> rbd showmapped does not show any mapped VMs, as qemu connects directly >> to ceph. >> >> We tried upgrading one host to Devuan ascii which uses 4.9.0-3-amd64, >> but did not fix our problem. >> >> We would appreciate any pointer! 
>> >> Best, >> >> Nico >> >> >> [-1] >> losetup -P /dev/loop0 /var/tmp/one-staging/monitoring1-disk.img >> mkdir /tmp/monitoring1-mnt >> mount /dev/loop0p1 /tmp/monitoring1-mnt/ >> >> >> [0] >> >> rbd export one/$img /var/tmp/one-staging/$img >> rbd rm one/$img >> rbd import /var/tmp/one-staging/$img one/$img >> rm /var/tmp/one-staging/$img >> >> [1] >> [14:05:34] server5:~# ceph features >> { >> "mon": { >> "group": { >> "features": "0x1ffddff8eea4fffb", >> "release": "luminous", >> "num": 3 >> } >> }, >> "osd": { >> "group": { >> "features": "0x1ffddff8eea4fffb", >> "release": "luminous", >> "num": 49 >> } >> }, >> "client": { >> "group": { >> "features": "0xffddff8ee84fffb", >> "release": "kraken", >> "num": 1 >> }, >> "group": { >> "features": "0xffddff8eea4fffb", >> "release": "luminous", >> "num": 4 >> }, >> "group": { >> "features": "0x1ffddff8eea4fffb", >> "release": "luminous", >> "num": 61 >> } >> } >> } >> >> >> -- >> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]
Sorry -- meant VM. Yes, librbd uses ceph.conf for configuration settings. On Sun, Sep 10, 2017 at 9:22 AM, Nico Schottelius wrote: > > Hello Jason, > > I think there is a slight misunderstanding: > There is only one *VM*, not one OSD left that we did not start. > > Or does librbd also read ceph.conf and will that cause qemu to output > debug messages? > > Best, > > Nico > > Jason Dillaman writes: > >> I presume QEMU is using librbd instead of a mapped krbd block device, >> correct? If that is the case, can you add "debug-rbd=20" and "debug >> objecter=20" to your ceph.conf and boot up your last remaining broken >> OSD? >> >> On Sun, Sep 10, 2017 at 8:23 AM, Nico Schottelius >> wrote: >>> >>> Good morning, >>> >>> yesterday we had an unpleasant surprise that I would like to discuss: >>> >>> Many (not all!) of our VMs were suddenly >>> dying (qemu process exiting) and when trying to restart them, inside the >>> qemu process we saw i/o errors on the disks and the OS was not able to >>> start (i.e. stopped in initramfs). >>> >>> When we exported the image from rbd and loop mounted it, there were >>> however no I/O errors and the filesystem could be cleanly mounted [-1]. >>> >>> We are running Devuan with kernel 3.16.0-4-amd64 and saw that there are >>> some problems reported with kernels < 3.16.39 and thus we upgraded one >>> host that serves as VM host + runs ceph osds to Devuan ascii using >>> 4.9.0-3-amd64. >>> >>> Trying to start the VM again on this host however resulted in the same >>> I/O problem. >>> >>> We then did the "stupid" approach of exporting an image and importing it >>> again as the same name [0]. Surprisingly, this solved our problem >>> reproducible for all affected VMs and allowed us to go back online. >>> >>> We intentionally left one broken VM in our system (a test VM) so that we >>> have the chance of debugging further what happened and how we can >>> prevent it from happening again. >>> >>> As you might have guessed, there have been some event prior this: >>> >>> - Some weeks before we upgraded our cluster from kraken to luminous (in >>> the right order of mon's first, adding mgrs) >>> >>> - About a week ago we added the first hdd to our cluster and modified the >>> crushmap so that it the "one" pool (from opennebula) still selects >>> only ssds >>> >>> - Some hours before we took out one of the 5 hosts of the ceph cluster, >>> as we intended to replace the filesystem based OSDs with bluestore >>> (roughly 3 hours prior to the event) >>> >>> - Short time before the event we readded an osd, but did not "up" it >>> >>> To our understanding, none of these actions should have triggered this >>> behaviour, however we are aware that with the upgrade to luminous also >>> the client libraries were updated and not all qemu processes were >>> restarted. [1] >>> >>> After this long story, I was wondering about the following things: >>> >>> - Why did this happen at all? >>> And what is different after we reimported the image? >>> Can it be related to disconnected the image from the parent >>> (i.e. opennebula creates clones prior to starting a VM) >>> >>> - We have one broken VM left - is there a way to get it back running >>> without doing the export/import dance? >>> >>> - How / or is http://tracker.ceph.com/issues/18807 related to our issue? >>> How is the kernel involved into running VMs that use librbd? >>> rbd showmapped does not show any mapped VMs, as qemu connects directly >>> to ceph. 
>>> >>> We tried upgrading one host to Devuan ascii which uses 4.9.0-3-amd64, >>> but did not fix our problem. >>> >>> We would appreciate any pointer! >>> >>> Best, >>> >>> Nico >>> >>> >>> [-1] >>> losetup -P /dev/loop0 /var/tmp/one-staging/monitoring1-disk.img >>> mkdir /tmp/monitoring1-mnt >>> mount /dev/loop0p1 /tmp/monitoring1-mnt/ >>> >>> >>> [0] >>> >>> rbd export one/$img /var/tmp/one-staging/$img >>> rbd rm one/$img >>> rbd import /var/tmp/one-staging/$img one/$img >>> rm /var/tmp/one-staging/$img >>> >>> [1] >>> [14:05:34] server5:~# ceph features >>> { >>> "mon": { >>> "group": { >>> "features": "0x1ffddff8eea4fffb", >>> "release": "luminous", >>> "num": 3 >>> } >>> }, >>> "osd": { >>> "group": { >>> "features": "0x1ffddff8eea4fffb", >>> "release": "luminous", >>> "num": 49 >>> } >>> }, >>> "client": { >>> "group": { >>> "features": "0xffddff8ee84fffb", >>> "release": "kraken", >>> "num": 1 >>> }, >>> "group": { >>> "features": "0xffddff8eea4fffb", >>> "release": "luminous", >>> "num": 4 >>> }, >>> "group": { >>> "features": "0x1ffddff8eea4fffb", >>> "release": "luminous", >>> "num": 61 >>> } >>> } >>> } >>> >>> >>> -- >>> Modern, affordable, Swiss Virtual Machines. Visit www.datacente
Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]
Just tried it, and there is not much more logging in ceph -w (see below), and nothing from the qemu process either.

[15:52:43] server4:~$ /usr/bin/qemu-system-x86_64 -name one-17031 -S -machine pc-i440fx-2.1,accel=kvm,usb=off -m 8192 -realtime mlock=off -smp 6,sockets=6,cores=1,threads=1 -uuid 79845fca-9b26-4072-bcb3-7f5206c2a531 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-17031.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file='rbd:one/one-29-17031-0:id=libvirt:key=DELETEME:auth_supported=cephx\;none:mon_host=server1\:6789\;server3\:6789\;server5\:6789,if=none,id=drive-virtio-disk0,format=raw,cache=none' -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/one//datastores/100/17031/disk.1,if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -vnc [::]:21131 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on 2>&1 | tee kvmlogwithdebug

-> no output

The qemu command line is copied from what opennebula usually spawns, minus the networking part.

[15:41:54] server4:~# ceph -w
2017-09-10 15:44:32.873281 7f59f17fa700 10 client.?.objecter ms_handle_connect 0x7f59f4150e90
2017-09-10 15:44:32.873315 7f59f17fa700 10 client.?.objecter resend_mon_ops
2017-09-10 15:44:32.873327 7f59f17fa700 10 client.?.objecter ms_handle_connect 0x7f59f41544d0
2017-09-10 15:44:32.873329 7f59f17fa700 10 client.?.objecter resend_mon_ops
2017-09-10 15:44:32.876248 7f59f9a63700 10 client.1021613.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-09-10 15:44:32.876710 7f59f17fa700 10 client.1021613.objecter ms_dispatch 0x7f59f4000fe0 osd_map(9059..9059 src has 8530..9059) v3
2017-09-10 15:44:32.876722 7f59f17fa700 3 client.1021613.objecter handle_osd_map got epochs [9059,9059] > 0
2017-09-10 15:44:32.876726 7f59f17fa700 3 client.1021613.objecter handle_osd_map decoding full epoch 9059
2017-09-10 15:44:32.877099 7f59f17fa700 20 client.1021613.objecter dump_active ..
0 homeless 2017-09-10 15:44:32.877423 7f59f17fa700 10 client.1021613.objecter ms_handle_connect 0x7f59dc00c9c0 cluster: id: 26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab health: HEALTH_OK services: mon: 3 daemons, quorum server5,server3,server1 mgr: 1(active), standbys: 2, 0 osd: 50 osds: 49 up, 49 in data: pools: 2 pools, 1088 pgs objects: 500k objects, 1962 GB usage: 5914 GB used, 9757 GB / 15672 GB avail pgs: 1088 active+clean io: client: 18822 B/s rd, 799 kB/s wr, 6 op/s rd, 52 op/s wr 2017-09-10 15:44:37.876324 7f59f1ffb700 10 client.1021613.objecter tick 2017-09-10 15:44:42.876437 7f59f1ffb700 10 client.1021613.objecter tick 2017-09-10 15:44:45.223970 7f59f17fa700 10 client.1021613.objecter ms_dispatch 0x7f59f4000fe0 log(2 entries from seq 215046 at 2017-09-10 15:44:45.164162) v1 2017-09-10 15:44:47.876548 7f59f1ffb700 10 client.1021613.objecter tick 2017-09-10 15:44:52.876668 7f59f1ffb700 10 client.1021613.objecter tick 2017-09-10 15:44:57.876770 7f59f1ffb700 10 client.1021613.objecter tick 2017-09-10 15:45:02.876888 7f59f1ffb700 10 client.1021613.objecter tick 2017-09-10 15:45:07.877001 7f59f1ffb700 10 client.1021613.objecter tick 2017-09-10 15:45:12.877120 7f59f1ffb700 10 client.1021613.objecter tick 2017-09-10 15:45:17.877229 7f59f1ffb700 10 client.1021613.objecter tick 2017-09-10 15:45:22.877349 7f59f1ffb700 10 client.1021613.objecter tick 2017-09-10 15:45:27.877455 7f59f1ffb700 10 client.1021613.objecter tick Jason Dillaman writes: > Sorry -- meant VM. Yes, librbd uses ceph.conf for configuration settings. > > On Sun, Sep 10, 2017 at 9:22 AM, Nico Schottelius > wrote: >> >> Hello Jason, >> >> I think there is a slight misunderstanding: >> There is only one *VM*, not one OSD left that we did not start. >> >> Or does librbd also read ceph.conf and will that cause qemu to output >> debug messages? >> >> Best, >> >> Nico >> >> Jason Dillaman writes: >> >>> I presume QEMU is using librbd instead of a mapped krbd block device, >>> correct? If that is the case, can you add "debug-rbd=20" and "debug >>> objecter=20" to your ceph.conf and boot up your last remaining broken >>> OSD? >>> >>> On Sun, Sep 10, 2017 at 8:23 AM, Nico Schottelius >>> wrote: Good morning, yesterday we had an unpleasant surprise that I would like to discuss: Many (not all!) of our VMs were suddenly dying (qemu process exiting) and when trying to restart them, inside the qemu process we saw i/o errors on the disks and the OS was not able to start (i.e. stopped in initramfs). When we exported the image from rbd and loop mounted it, there were however no I/O errors and the filesyste
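For reference, when the client-side debug settings appear to have no effect, a minimal sketch of how they could be confirmed through a librbd admin socket (the socket filename below is a hypothetical example; an 'admin socket' line must already be set in the [client] section of ceph.conf for qemu to create one):

# find the socket created by the qemu/librbd process
ls /var/run/ceph/*.asok

# confirm the running client actually picked up the debug levels
ceph --admin-daemon /var/run/ceph/ceph-client.libvirt.12345.asok config show | grep debug_rbd

# dump the client's in-flight objecter requests
ceph --admin-daemon /var/run/ceph/ceph-client.libvirt.12345.asok objecter_requests

This makes it possible to distinguish "the options were never applied" from "they were applied but the I/O errors happen before anything is logged".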
[ceph-users] Cluster does not report which objects are unfound for stuck PG
Hello people,

after a series of events and some operational mistakes, 1 PG in our cluster is in active+recovering+degraded+remapped state, reporting 1 unfound object. We're running Hammer (v0.94.9) on top of Debian Jessie, on 27 nodes and 162 osds with the default crushmap and the nodeep-scrub flag set. Unfortunately, the pools on our cluster are all set up with replica size = 2 and min_size = 1.

My main problem is that ceph pg list_missing does not report which objects are considered unfound, making it quite difficult to understand what is happening and how to recover without doing any more damage. Specifically, the output of the command is this:

# ceph pg 5.658 list_missing
{
    "offset": {
        "oid": "",
        "key": "",
        "snapid": 0,
        "hash": 0,
        "max": 0,
        "pool": -1,
        "namespace": ""
    },
    "num_missing": 0,
    "num_unfound": 1,
    "objects": [],
    "more": 0
}

I took a look at ceph's official docs and at older threads on this list, but in every case that I found, ceph was reporting the objects that it could not find.

Our cluster got into this state after a series of events and mistakes. I will provide some timestamps too.

* osds of one node were down+out because of a recent failure (6 osds)
* We decided to start one osd (osd.120) to see how it would behave
* At 14:56:06 we start osd.120
* After starting osd.120, we noticed that recovery starts. As I understand now, we did not want the osd to join the cluster, so we decided to take it down again. It seems to me now that this was a panic move, but anyway, it happened.
* At 14:57:23 we shut down osd.120.
* Some pgs that were mapped on osd.120 are reported to be down, and stuck requests targeting those osds are popping up. Of course, that meant that we needed to start the osd again.
* At 15:02:59 we start osd.120. PGs are getting up and start peering.
* At 15:03:24, osd.33 (living on a different node) crashes with the following assertion:

0> 2017-09-08 15:03:24.041412 7ff679fa4700 -1 osd/ReplicatedPG.cc: In function 'virtual void ReplicatedPG::on_local_recover(const hobject_t&, const object_stat_sum_t&, const ObjectRecoveryInfo&, ObjectContextRef, ObjectStore::Transaction*)' thread 7ff679fa4700 time 2017-09-08 15:03:24.002997
osd/ReplicatedPG.cc: 211: FAILED assert(is_primary())

* At 15:03:29 the cluster reports that 1 object is unfound. We start investigating the issue.
* After some time, we noticed that pgs mapped to osd.33 are degraded, so we decided to start osd.33 again. It seems to start normally without any issues.
* After some time, recovery almost finishes, with all pgs in a healthy state except pg 5.658, which should contain the unfound object.
Our cluster is now in the following state:

# ceph -s
    cluster 287f8859-9887-4bb3-ae27-531d2a1dbc95
     health HEALTH_WARN
            1 pgs degraded
            1 pgs recovering
            1 pgs stuck degraded
            1 pgs stuck unclean
            recovery 13/74653914 objects degraded (0.000%)
            recovery 300/74653914 objects misplaced (0.000%)
            recovery 1/37326882 unfound (0.000%)
            nodeep-scrub flag(s) set
     monmap e1: 3 mons at {rd0-00=some_ip:6789/0,rd0-01=some_ip2:6789/0,rd0-02=some_ip3:6789/0}
            election epoch 5462, quorum 0,1,2 rd0-00,rd0-01,rd0-02
     osdmap e379262: 162 osds: 157 up, 157 in; 1 remapped pgs
            flags nodeep-scrub
      pgmap v135824695: 18432 pgs, 5 pools, 98880 GB data, 36452 kobjects
            193 TB used, 89649 GB / 280 TB avail
            13/74653914 objects degraded (0.000%)
            300/74653914 objects misplaced (0.000%)
            1/37326882 unfound (0.000%)
               18430 active+clean
                   1 active+recovering+degraded+remapped
                   1 active+clean+scrubbing
  client io 9776 kB/s rd, 10937 kB/s wr, 863 op/s

# ceph health detail
HEALTH_WARN 1 pgs degraded; 1 pgs recovering; 1 pgs stuck degraded; 1 pgs stuck unclean; recovery 13/74653918 objects degraded (0.000%); recovery 300/74653918 objects misplaced (0.000%); recovery 1/37326884 unfound (0.000%); nodeep-scrub flag(s) set
pg 5.658 is stuck unclean for 541763.344743, current state active+recovering+degraded+remapped, last acting [120,155]
pg 5.658 is stuck degraded for 201445.628108, current state active+recovering+degraded+remapped, last acting [120,155]
pg 5.658 is active+recovering+degraded+remapped, acting [120,155], 1 unfound
recovery 13/74653918 objects degraded (0.000%)
recovery 300/74653918 objects misplaced (0.000%)
recovery 1/37326884 unfound (0.000%)
nodeep-scrub flag(s) set

# ceph pg dump_stuck unclean
ok
pg_stat state up up_primary acting acting_primary
5.658 active+recovering+degraded+remapped [120,153] 120 [120,155] 120

# ceph pg 5.658 query
Output can be found here [1].

Also, we took a glance at the logs but did not notice anything strange except the crashed osd
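For reference, a minimal sketch of the commands usually involved once the unfound object can actually be identified (the pg id 5.658 is taken from this thread; mark_unfound_lost is destructive and should only be run after reading the unfound-objects section of the docs for the release in use):

# what the primary knows about the missing object and which OSDs were probed
ceph pg 5.658 query
ceph pg 5.658 list_missing

# last resort, once all candidate OSDs have been probed:
# roll the object back to its last known good version
ceph pg 5.658 mark_unfound_lost revert

On this cluster, list_missing is returning an empty "objects" list, which is exactly the problem being reported, so the sketch above only applies once that is resolved.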
Re: [ceph-users] Power outages!!! help!
It took a while. It appears to have cleaned up quite a bit... but still has issues. I've been seeing the message below for more than a day, and CPU and I/O utilization are low... it looks like something is stuck. I rebooted OSDs several times earlier when it looked like they were stuck, and they would then work on something else, but now it is not changing much. What can I try now?

Regards,
Hong

# ceph health detail
HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 22 pgs degraded; 6 pgs down; 11 pgs inconsistent; 6 pgs peering; 6 pgs recovering; 16 pgs stale; 22 pgs stuck degraded; 6 pgs stuck inactive; 16 pgs stuck stale; 28 pgs stuck unclean; 16 pgs stuck undersized; 16 pgs undersized; 1 requests are blocked > 32 sec; 1 osds have slow requests; recovery 221990/4503980 objects degraded (4.929%); recovery 147/2251990 unfound (0.007%); 95 scrub errors; mds cluster is degraded; no legacy OSD present but 'sortbitwise' flag is not set
pg 0.e is stuck inactive since forever, current state down+peering, last acting [11,2]
pg 1.d is stuck inactive since forever, current state down+peering, last acting [11,2]
pg 1.28 is stuck inactive since forever, current state down+peering, last acting [11,6]
pg 0.29 is stuck inactive since forever, current state down+peering, last acting [11,6]
pg 1.2b is stuck inactive since forever, current state down+peering, last acting [1,11]
pg 0.2c is stuck inactive since forever, current state down+peering, last acting [1,11]
pg 0.e is stuck unclean since forever, current state down+peering, last acting [11,2]
pg 0.a is stuck unclean for 1233182.248198, current state stale+active+undersized+degraded+inconsistent, last acting [0]
pg 2.8 is stuck unclean for 1238044.714421, current state stale+active+undersized+degraded, last acting [0]
pg 2.1a is stuck unclean for 1238933.203920, current state active+recovering+degraded, last acting [2,11]
pg 2.3 is stuck unclean for 1238882.443876, current state stale+active+undersized+degraded, last acting [0]
pg 2.27 is stuck unclean for 1295260.765981, current state active+recovering+degraded, last acting [11,6]
pg 0.d is stuck unclean for 1230831.504001, current state stale+active+undersized+degraded, last acting [0]
pg 1.c is stuck unclean for 1238044.715698, current state stale+active+undersized+degraded, last acting [0]
pg 1.3d is stuck unclean for 1232066.572856, current state stale+active+undersized+degraded, last acting [0]
pg 1.28 is stuck unclean since forever, current state down+peering, last acting [11,6]
pg 0.29 is stuck unclean since forever, current state down+peering, last acting [11,6]
pg 1.2b is stuck unclean since forever, current state down+peering, last acting [1,11]
pg 2.2f is stuck unclean for 1238127.474088, current state active+recovering+degraded+remapped, last acting [9,10]
pg 0.0 is stuck unclean for 1233182.247776, current state stale+active+undersized+degraded, last acting [0]
pg 0.2c is stuck unclean since forever, current state down+peering, last acting [1,11]
pg 2.b is stuck unclean for 1238044.640982, current state stale+active+undersized+degraded, last acting [0]
pg 1.1b is stuck unclean for 1234021.660986, current state stale+active+undersized+degraded, last acting [0]
pg 0.1c is stuck unclean for 1232574.189549, current state stale+active+undersized+degraded, last acting [0]
pg 1.4 is stuck unclean for 1293624.075753, current state stale+active+undersized+degraded, last acting [0]
pg 0.5 is stuck unclean for 1237356.776788, current state stale+active+undersized+degraded+inconsistent, last acting [0]
pg 2.1f is stuck unclean for 8825246.729513, current state active+recovering+degraded, last acting [10,2]
pg 1.d is stuck unclean since forever, current state down+peering, last acting [11,2]
pg 2.39 is stuck unclean for 1238933.214406, current state stale+active+undersized+degraded, last acting [0]
pg 1.3a is stuck unclean for 2125299.164204, current state stale+active+undersized+degraded, last acting [0]
pg 0.3b is stuck unclean for 1233432.895409, current state stale+active+undersized+degraded, last acting [0]
pg 2.3c is stuck unclean for 1238933.208648, current state active+recovering+degraded, last acting [10,2]
pg 2.35 is stuck unclean for 1295260.753354, current state active+recovering+degraded, last acting [11,6]
pg 1.9 is stuck unclean for 1238044.722811, current state stale+active+undersized+degraded, last acting [0]
pg 0.a is stuck undersized for 1229917.081228, current state stale+active+undersized+degraded+inconsistent, last acting [0]
pg 2.8 is stuck undersized for 1229917.081016, current state stale+active+undersized+degraded, last acting [0]
pg 2.b is stuck undersized for 1229917.068181, current state stale+active+undersized+degraded, last acting [0]
pg 1.9 is stuck undersized for 1229917.075164, current state stale+active+undersized+degraded, last acting [0]
pg 0.5 is stuck undersized for 1229917.085330, current state stale+active+undersized+degraded+inconsistent, la
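For reference, a minimal sketch of commands that correspond to items in the health output above (pg ids are just examples taken from that output; on a cluster in this state, check the troubleshooting docs before acting, especially with unfound objects present):

# the "no legacy OSD present but 'sortbitwise' flag is not set" warning is
# normally cleared, per the Jewel upgrade notes, by setting the flag:
ceph osd set sortbitwise

# inspect one of the down+peering pgs to see which OSDs it is waiting for
ceph pg 0.e query

# list which objects are unfound for one of the recovering pgs
ceph pg 2.1a list_missing

The pg query output ("peering_blocked_by" and "down_osds_we_would_probe") is usually the quickest way to see which of the failed OSDs still has to come back (or be marked lost) before the down pgs can peer.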
Re: [ceph-users] cephfs(Kraken 11.2.1), Unable to write more file when one dir more than 100000 files, mds_bal_fragment_size_max = 5000000
ZhengYan,

I set "mds_bal_fragment_size_max = 100000, mds_bal_frag = true", then I wrote 100000 files named 512k.file$i, but some files are still missing. Such as:

[root@yj43959-ceph-dev cephfs]# find ./volumes/ -type f | wc -l
91070

donglifec...@gmail.com

From: Yan, Zheng
Date: 2017-09-08 15:20
To: donglifec...@gmail.com
CC: ceph-users; marcus.haarmann
Subject: Re: [ceph-users] cephfs(Kraken 11.2.1), Unable to write more file when one dir more than 100000 files, mds_bal_fragment_size_max = 5000000

> On 8 Sep 2017, at 13:54, donglifec...@gmail.com wrote:
>
> ZhengYan,
>
> I'm sorry, just a description of some questions.
>
> When one dir has more than 100000 files, I can continue to write to it, but I can't find files which were written in the past. For example:
> 1. I write 100000 files named 512k.file$i
>
> 2. I continue to write 10000 files named aaa.file$i
>
> 3. I continue to write 10000 files named bbb.file$i
>
> 4. I continue to write 10000 files named ccc.file$i
>
> 5. I continue to write 10000 files named ddd.file$i
>
> 6. I can't find all ddd.file$i, some ddd.file$i are missing. Such as:
>
> [root@yj43959-ceph-dev scripts]# find /mnt/cephfs/volumes -type f | grep 512k.file | wc -l
> 100000
> [root@yj43959-ceph-dev scripts]# ls /mnt/cephfs/volumes/aaa.file* | wc -l
> 10000
> [root@yj43959-ceph-dev scripts]# ls /mnt/cephfs/volumes/bbb.file* | wc -l
> 10000
> [root@yj43959-ceph-dev scripts]# ls /mnt/cephfs/volumes/ccc.file* | wc -l
> 10000
> [root@yj43959-ceph-dev scripts]# ls /mnt/cephfs/volumes/ddd.file* | wc -l   // some files missing
> 1072

It's likely caused by http://tracker.ceph.com/issues/18314. To support very large directories, you should enable directory fragmentation instead of enlarging mds_bal_fragment_size_max.

Regards
Yan, Zheng

>
> donglifec...@gmail.com
>
> From: donglifec...@gmail.com
> Date: 2017-09-08 13:30
> To: zyan
> CC: ceph-users
> Subject: [ceph-users] cephfs(Kraken 11.2.1), Unable to write more file when one dir more than 100000 files, mds_bal_fragment_size_max = 5000000
> ZhengYan,
>
> I tested cephfs (Kraken 11.2.1), and I can't write more files when one dir has more than 100000 files, even though I have already set "mds_bal_fragment_size_max = 5000000".
>
> Why is this the case? Is it a bug?
>
> Thanks a lot.
>
> donglifec...@gmail.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
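For reference, a minimal sketch of what enabling directory fragmentation looked like on pre-Luminous releases (the filesystem name "cephfs" is an assumption; on Luminous and later, fragmentation is enabled by default, and the exact flags should be checked against the docs for the release actually in use):

# allow the MDS to fragment large directories, at the filesystem level
ceph fs set cephfs allow_dirfrags true

# and in ceph.conf on the MDS nodes
[mds]
    mds bal frag = true

With fragmentation enabled, a single directory is split into multiple dirfrag objects, so mds_bal_fragment_size_max no longer has to be raised to huge values.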
[ceph-users] Ceph 12.2.0 on 32bit?
Hi,

Is anyone running Ceph Luminous (12.2.0) on 32-bit Linux? Have you seen any problems?

My setup has been 1 MON and 7 OSDs (no MDS, RGW, etc.), all running Jewel (10.2.1) on 32-bit, with no issues at all. I upgraded everything to the latest version of Jewel (10.2.9) and still had no issues. Next I upgraded my MON to Luminous (12.2.0) and added an MGR to it. Still no issues.

Next I removed one node from the cluster, wiped it clean, upgraded it to Luminous (12.2.0), and created a new BlueStore data area. Now this node crashes with a segmentation fault, usually within a few minutes of starting up.

I've loaded symbols and used GDB to examine backtraces. From what I can tell, the seg faults are happening randomly, and the stack is corrupted, so traces from GDB are unusable (even with all symbols installed for all packages on the system). However, in all cases the seg fault occurs in the 'msgr-worker-' thread.

My data is fine; I would just like to get Ceph 12.2.0 running stably on this node so I can upgrade the remaining nodes and switch everything over to BlueStore.

Thanks,
Dyweni

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
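For reference, a minimal sketch of how more context could be captured for a crash like this (the OSD id 3, log path, and core path below are placeholders); running the daemon in the foreground with messenger debugging at least shows which connection the msgr-worker thread was handling when it died:

# run the OSD in the foreground with messenger/OSD debug output
ceph-osd -f -i 3 --debug-ms 20 --debug-osd 10 2>&1 | tee /tmp/osd.3.debug.log

# allow core dumps, then inspect all threads once it crashes
ulimit -c unlimited
gdb /usr/bin/ceph-osd /path/to/core -ex 'thread apply all bt' -ex quit

Even when the crashing thread's stack is corrupted, the backtraces of the other threads plus the tail of the debug-ms log often narrow down where in the messenger code the fault happens.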