[ceph-users] Ceph not recovering after osd/host failure

2017-10-16 Thread Matteo Dacrema
Hi all, I’m testing Ceph Luminous 12.2.1 installed with ceph-ansible. Doing some failover tests, I noticed that when I kill an OSD or a host, Ceph doesn’t recover automatically, remaining in this state until I bring the OSDs or host back online. I’ve 3 pools: volumes, cephfs_data and cephfs_metadata

Re: [ceph-users] Ceph not recovering after osd/host failure

2017-10-16 Thread Matteo Dacrema
In the meanwhile I found out why this happened. For some reason the 3 OSDs were not marked out of the cluster like the others, and this caused the cluster not to reassign PGs to other OSDs. This is strange because I left the 3 OSDs down for two days. > On 16 Oct 2017, at 10:21, Matteo

Re: [ceph-users] Ceph not recovering after osd/host failure

2017-10-16 Thread Peter Maloney
How long did you wait after killing? It shouldn't recover instantly, but after a timeout. And are they marked out or reweighted to 0? (this is what the timeout does, and recovery should start when they are marked out) And are you killing any mons? And can you show the output after killing... `ceph -s
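For reference, a sketch of these checks (standard Ceph CLI; the monitor name mon.a below is a placeholder):

    ceph -s                      # overall cluster and recovery state
    ceph osd tree                # which OSDs are down, and their weights
    # the timeout after which down OSDs are auto-marked out (default 600s)
    ceph daemon mon.a config get mon_osd_down_out_interval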

Re: [ceph-users] Backup VM (Base image + snapshot)

2017-10-16 Thread Richard Hesketh
On 16/10/17 03:40, Alex Gorbachev wrote: > On Sat, Oct 14, 2017 at 12:25 PM, Oscar Segarra > wrote: >> Hi, >> >> In my VDI environment I have configured the suggested ceph >> design/architecture: >> >> http://docs.ceph.com/docs/giant/rbd/rbd-snapshot/ >> >> Where I have a Base Image + Protected S

Re: [ceph-users] Bluestore "separate" WAL and DB

2017-10-16 Thread Wido den Hollander
I thought I'd pick up on this older thread instead of starting a new one. For the WAL something between 512MB and 2GB should be sufficient as Mark Nelson explained in a different thread. The DB however I'm not certain about at this moment. The general consensus seems to be: "use as much as avai
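If fixed sizes are wanted at OSD creation time, they can be pinned in ceph.conf; a minimal sketch with illustrative values only (not a recommendation from this thread):

    [osd]
    # WAL in the 512MB-2GB range mentioned above (1 GiB here, in bytes)
    bluestore_block_wal_size = 1073741824
    # DB sizing is the open question; 10 GiB here purely as an example
    bluestore_block_db_size = 10737418240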

Re: [ceph-users] How to get current min-compat-client setting

2017-10-16 Thread Wido den Hollander
> On 13 October 2017 at 10:22, Hans van den Bogert wrote: > > > Hi, > > I’m in the middle of debugging some incompatibilities with an upgrade of > Proxmox which uses Ceph. At this point I’d like to know what my current value > is for the min-compat-client setting, which would’ve been se

[ceph-users] rados export/import fail

2017-10-16 Thread Nagy Ákos
Hi, I want to upgrade my Ceph from Jewel to Luminous and switch to BlueStore. For that I exported the pools from the old cluster: rados export -p pool1 pool1.ceph, and after the upgrade and OSD recreation: rados --create -p pool1 import pool1.ceph. I can import the backup without error, but when I w

Re: [ceph-users] rados export/import fail

2017-10-16 Thread John Spray
On Mon, Oct 16, 2017 at 11:35 AM, Nagy Ákos wrote: > Hi, > > I want to upgrade my ceph from jewel to luminous, and switch to bluestore. > > For that I export the pools from old cluster: This is not the way to do it. You should convert your OSDs from filestore to bluestore one by one, and let the
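For reference, a hedged sketch of that one-by-one conversion on Luminous ($ID and /dev/sdX are placeholders; wait for the cluster to return to HEALTH_OK between OSDs):

    ceph osd out $ID                              # drain data off the OSD
    # ...wait for rebalancing to finish...
    systemctl stop ceph-osd@$ID
    ceph osd destroy $ID --yes-i-really-mean-it
    # recreate as BlueStore; --osd-id reuses the same id (newer ceph-volume)
    ceph-volume lvm create --bluestore --data /dev/sdX --osd-id $ID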

Re: [ceph-users] rados export/import fail

2017-10-16 Thread Nagy Ákos
Thanks, but I erased all of the data; I have only this backup. If the restore worked for 3 pools, can I do it for the remaining 2? What can I try to set to import it, or how can I find these IDs? On 2017-10-16 13:39, John Spray wrote: > On Mon, Oct 16, 2017 at 11:35 AM, Nagy Ákos wrote: >
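For reference, pool IDs on a cluster can be listed with (output format from memory and may vary by release):

    ceph osd lspools        # e.g. "1 rbd,2 pool1,3 pool2"
    ceph df                 # also shows pool names alongside their IDs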

[ceph-users] [ocata] [cinder] cinder-volume causes high cpu load

2017-10-16 Thread Eugen Block
Hi list, some of you also use ceph as storage backend for OpenStack, so maybe you can help me out. Last week we upgraded our Mitaka cloud to Ocata (via Newton, of course), and also upgraded the cloud nodes from openSUSE Leap 42.1 to Leap 42.3. There were some issues as expected, but no sh

[ceph-users] Osd FAILED assert(p.same_interval_since)

2017-10-16 Thread Dejan Lesjak
Hi, During rather high load and rebalancing, a couple of our OSDs crashed and they fail to start. This is from the log: -2> 2017-10-16 13:27:50.235204 7f5e4c3bae80 0 osd.1 442123 load_pgs opened 370 pgs -1> 2017-10-16 13:27:50.239175 7f5e4c3bae80 1 osd.1 442123 build_past_intervals_para

Re: [ceph-users] rados export/import fail

2017-10-16 Thread Wido den Hollander
> On 16 October 2017 at 13:00, Nagy Ákos wrote: > > > Thanks, > > but I erased all of the data, I have only this backup. I hate to bring the bad news, but it will not work. The pools have different IDs and that will make it very difficult to get this working again. Wido > If the restore wo

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-10-16 Thread Wido den Hollander
> On 26 September 2017 at 16:39, Mark Nelson wrote: > > > > > On 09/26/2017 01:10 AM, Dietmar Rieder wrote: > > thanks David, > > > > that's confirming what I was assuming. Too bad that there is no > > estimate/method to calculate the db partition size. > > It's possible that we might be abl

Re: [ceph-users] rados export/import fail

2017-10-16 Thread Jason Dillaman
The pool ids can be updated to point to the correct pool [1] with enough patience. The larger issue is that the snapshots are not preserved and thus your cloned images can be corrupted if the parent image was modified after the creation of the protected snapshot. [1] http://lists.ceph.com/pipermai

[ceph-users] Thick provisioning

2017-10-16 Thread sinan
Hi, I have deployed a Ceph cluster (Jewel). By default, all block devices that are created are thin provisioned. Is it possible to change this setting? I would like all created block devices to be thick provisioned. In front of the Ceph cluster I am running OpenStack. Thanks! Sinan
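RBD itself has no thick-provisioning flag; images are always allocated lazily. A common workaround, sketched here with assumed names (volumes/thick-vol), is to write the whole image once so every object gets allocated, at the cost of the extra space and I/O:

    rbd create --size 100G volumes/thick-vol
    rbd map volumes/thick-vol
    dd if=/dev/zero of=/dev/rbd/volumes/thick-vol bs=4M oflag=direct
    rbd unmap volumes/thick-vol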

Re: [ceph-users] Ceph not recovering after osd/host failure

2017-10-16 Thread Anthony Verevkin
Hi Matteo, This looks like the 'noout' flag might be set for your cluster. Please check it with: ceph osd dump | grep flags If you see 'noout' flag is set, you can unset it with: ceph osd unset noout Regards, Anthony - Original Message - > From: "Matteo Dacrema" > To: ceph-users@lists

Re: [ceph-users] Osd FAILED assert(p.same_interval_since)

2017-10-16 Thread Dejan Lesjak
On 10/16/2017 02:02 PM, Dejan Lesjak wrote: > Hi, > > During rather high load and rebalancing, a couple of our OSDs crashed > and they fail to start. This is from the log: > > -2> 2017-10-16 13:27:50.235204 7f5e4c3bae80 0 osd.1 442123 load_pgs > opened 370 pgs > -1> 2017-10-16 13:27:50.2

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-10-16 Thread Richard Hesketh
On 16/10/17 13:45, Wido den Hollander wrote: >> On 26 September 2017 at 16:39, Mark Nelson wrote: >> On 09/26/2017 01:10 AM, Dietmar Rieder wrote: >>> thanks David, >>> >>> that's confirming what I was assuming. Too bad that there is no >>> estimate/method to calculate the db partition size. >> >>

[ceph-users] How to stop using (unmount) a failed OSD with BlueStore ?

2017-10-16 Thread Alejandro Comisario
Hi all, I have to hot-swap a failed OSD on a Luminous cluster with BlueStore (the disk is SATA, WAL and DB are on NVMe). I've issued: * ceph osd crush reweight osd_id 0 * systemctl stop (the osd's daemon) * umount /var/lib/ceph/osd/osd_id * ceph osd destroy osd_id. Everything seems OK, but if I l
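For what it's worth, a sketch of the usual remaining cleanup on Luminous (assuming systemd deployments; purge removes the CRUSH entry, the auth key and the OSD id in one step):

    systemctl disable ceph-osd@$ID
    ceph osd purge $ID --yes-i-really-mean-it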

[ceph-users] cephfs: some metadata operations take seconds to complete

2017-10-16 Thread Tyanko Aleksiev
Hi, At UZH we are currently evaluating CephFS as a distributed file system for the scratch space of an HPC installation. Some slowdown of metadata operations seems to occur under certain circumstances. In particular, commands issued after some big file deletion could take several seconds
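A sketch of the usual first diagnostic steps on the MDS side (mds.a is a placeholder for the MDS name):

    ceph daemon mds.a dump_ops_in_flight   # requests currently blocked
    ceph daemon mds.a perf dump            # MDS counters (journal, purging, caps)
    ceph health detail                     # any slow-request warnings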

Re: [ceph-users] Brand new cluster -- pg is stuck inactive

2017-10-16 Thread Anthony Verevkin
Not sure if anyone has noticed this yet, but I see your osd tree does not include a host level - you get OSDs right under the root bucket. The default crush rule would make sure to allocate OSDs from different hosts - and there are no hosts in the hierarchy. An OSD would usually put itself under the hostna
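A minimal sketch of repairing the hierarchy by hand (node1 and the weight are placeholders; normally 'osd crush update on start' files each OSD under a bucket named after its host):

    ceph osd crush add-bucket node1 host
    ceph osd crush move node1 root=default
    # weight is a placeholder; use the OSD's capacity in TB
    ceph osd crush create-or-move osd.0 1.0 host=node1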

Re: [ceph-users] rados export/import fail

2017-10-16 Thread Nagy Ákos
Thanks, I don't have any snapshots or clones, only some standalone images. Following the linked thread, I can change my IDs and restore all the images. Thanks a lot! On 2017-10-16 17:21, Jason Dillaman wrote: > The pool ids can be updated to point to the correct pool [1] with >

Re: [ceph-users] How to get current min-compat-client setting

2017-10-16 Thread Hans van den Bogert
Thanks, that’s what I was looking for. However, should we create the `get-require-min-compat-client` option nonetheless? I’m willing to write the patch, unless someone thinks it’s not a good idea. Regards Hans > On Oct 16, 2017, at 12:13 PM, Wido den Hollander wrote: > > >> Op 1
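For reference, on Luminous the current value is visible in the osdmap header, e.g. (field names from memory, may differ slightly by release):

    ceph osd dump | grep min_compat_client
    # require_min_compat_client jewel
    # min_compat_client jewel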

Re: [ceph-users] rados export/import fail

2017-10-16 Thread Jason Dillaman
Your error message indicates it failed to open a parent image of a clone: 2017-10-16 13:18:17.404858 7f35a37fe700 -1 librbd::image::RefreshParentRequest: failed to open parent image: (2) No such file or directory That means that the parent image has a snapshot that the clone is linked against. O

Re: [ceph-users] osd max scrubs not honored?

2017-10-16 Thread J David
Just as a weird update to this, I accidentally left the scrub cron script disabled after the testing described in the previous message. Even with *no* deep scrubs running, the “REQUEST_SLOW” problem is still occurring every few minutes. It seems something is seriously wrong with this cluster. In
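A sketch of the usual next diagnostic steps for recurring REQUEST_SLOW (osd.N is a placeholder for an implicated OSD):

    ceph health detail                      # which OSDs report slow requests
    ceph daemon osd.N dump_historic_ops     # recent slow ops with per-stage timings
    ceph daemon osd.N dump_ops_in_flight    # ops currently stuck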

Re: [ceph-users] rados export/import fail

2017-10-16 Thread Nagy Ákos
This pool is an LXD pool, created by LXD for containers. Maybe when LXD creates a container, it creates a snapshot from the source image and clones it; I forgot about this. Probably this is the reason why I can't restore only this pool. I can restore my images after I set a parent ID obtained from a newl

Re: [ceph-users] cephfs: some metadata operations take seconds to complete

2017-10-16 Thread Linh Vu
We're using cephfs here as well for HPC scratch, but we're on Luminous 12.2.1. This issue seems to have been fixed between Jewel and Luminous, we don't have such problems. :) Any reason you guys aren't evaluating the latest LTS? From: ceph-users on behalf of Tya

Re: [ceph-users] Osd FAILED assert(p.same_interval_since)

2017-10-16 Thread Gregory Farnum
On Mon, Oct 16, 2017 at 8:24 AM Dejan Lesjak wrote: > On 10/16/2017 02:02 PM, Dejan Lesjak wrote: > > Hi, > > > > During rather high load and rebalancing, a couple of our OSDs crashed > > and they fail to start. This is from the log: > > > > -2> 2017-10-16 13:27:50.235204 7f5e4c3bae80 0 osd.

Re: [ceph-users] killing ceph-disk [was Re: ceph-volume: migration and disk partition support]

2017-10-16 Thread Anthony Verevkin
> From: "Sage Weil" > To: "Alfredo Deza" > Cc: "ceph-devel" , ceph-users@lists.ceph.com > Sent: Monday, October 9, 2017 11:09:29 AM > Subject: [ceph-users] killing ceph-disk [was Re: ceph-volume: migration and > disk partition support] > > To put this in context, the goal here is to kill ceph-

Re: [ceph-users] killing ceph-disk [was Re: ceph-volume: migration and disk partition support]

2017-10-16 Thread Sage Weil
On Mon, 16 Oct 2017, Anthony Verevkin wrote: > > > From: "Sage Weil" > > To: "Alfredo Deza" > > Cc: "ceph-devel" , ceph-users@lists.ceph.com > > Sent: Monday, October 9, 2017 11:09:29 AM > > Subject: [ceph-users] killing ceph-disk [was Re: ceph-volume: migration and > > disk partition support]

Re: [ceph-users] Reply: assert(objiter->second->version > last_divergent_update) when testing pull out disk and insert

2017-10-16 Thread Gregory Farnum
On Sat, Oct 14, 2017 at 7:24 AM, zhaomingyue wrote: > 1. This assert happened accidentally and is not easy to reproduce; in fact, I also suppose this assert is caused by lost device data; > but if data was lost, how can it occur that (last_update + 1 == log.rbegin.version)? In the case of lost data, it's mor

Re: [ceph-users] Osd FAILED assert(p.same_interval_since)

2017-10-16 Thread Dejan Lesjak
> On 17. okt. 2017, at 00:23, Gregory Farnum wrote: > > On Mon, Oct 16, 2017 at 8:24 AM Dejan Lesjak wrote: > On 10/16/2017 02:02 PM, Dejan Lesjak wrote: > > Hi, > > > > During rather high load and rebalancing, a couple of our OSDs crashed > > and they fail to start. This is from the log: > > >

Re: [ceph-users] Osd FAILED assert(p.same_interval_since)

2017-10-16 Thread Gregory Farnum
On Mon, Oct 16, 2017 at 3:49 PM Dejan Lesjak wrote: > > > On 17. okt. 2017, at 00:23, Gregory Farnum wrote: > > > > On Mon, Oct 16, 2017 at 8:24 AM Dejan Lesjak > wrote: > > On 10/16/2017 02:02 PM, Dejan Lesjak wrote: > > > Hi, > > > > > > During rather high load and rebalancing, a couple of ou

Re: [ceph-users] killing ceph-disk [was Re: ceph-volume: migration and disk partition support]

2017-10-16 Thread Christian Balzer
On Mon, 16 Oct 2017 18:32:06 -0400 (EDT) Anthony Verevkin wrote: > > From: "Sage Weil" > > To: "Alfredo Deza" > > Cc: "ceph-devel" , ceph-users@lists.ceph.com > > Sent: Monday, October 9, 2017 11:09:29 AM > > Subject: [ceph-users] killing ceph-disk [was Re: ceph-volume: migration and > > disk p

Re: [ceph-users] Bareos and libradosstriper work only with 4M stripe_unit size

2017-10-16 Thread Alexander Kushnirenko
Hi Gregory, Ian! There is very little information on striper mode in the Ceph documentation. Could this explanation help? The logic of striper mode is very much the same as in RAID-0. There are 3 parameters that drive it: stripe_unit - the stripe size (default=4M), stripe_count - how many objects
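A worked example may help (numbers chosen for illustration, not from the thread): with stripe_unit=1M, stripe_count=4 and object_size=4M, one object set holds 4 x 4M = 16M. A 16M write is dealt out in 1M chunks round-robin: chunk 0 to object 0, chunk 1 to object 1, ..., chunk 4 back to object 0, until each of the 4 objects holds 4M; the next byte then starts a fresh set of 4 objects.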

Re: [ceph-users] Bareos and libradosstriper work only with 4M stripe_unit size

2017-10-16 Thread Gregory Farnum
That looks right to me... PRs for the Ceph docs are welcome! :) On Mon, Oct 16, 2017 at 4:35 PM Alexander Kushnirenko wrote: > Hi, Gregory, Ian! > > There is very little information on striper mode in Ceph documentation. > Could this explanation help? > > The logic of striper mode is very much t

Re: [ceph-users] Bareos and libradosstriper work only with 4M stripe_unit size

2017-10-16 Thread Christian Wuerdig
Maybe an additional example where the numbers don't line up so nicely would be good as well. For example, it's not immediately obvious to me what would happen with the stripe settings given in your example if you write 97M of data. Would it be 4 objects of 24M and 4 objects of 250KB? Or will the

Re: [ceph-users] Osd FAILED assert(p.same_interval_since)

2017-10-16 Thread David Zafman
I don't see same_interval_since being cleared by split. PG::split_into() copies the history from the parent PG to the child. The only code in Luminous that I see that clears it is in ceph_objectstore_tool.cc David On 10/16/17 3:59 PM, Gregory Farnum wrote: On Mon, Oct 16, 2017 at 3:49 PM

Re: [ceph-users] Osd FAILED assert(p.same_interval_since)

2017-10-16 Thread Dejan Lesjak
> On 17. okt. 2017, at 00:59, Gregory Farnum wrote: > > On Mon, Oct 16, 2017 at 3:49 PM Dejan Lesjak wrote: > > > On 17. okt. 2017, at 00:23, Gregory Farnum wrote: > > > > On Mon, Oct 16, 2017 at 8:24 AM Dejan Lesjak wrote: > > On 10/16/2017 02:02 PM, Dejan Lesjak wrote: > > > Hi, > > > > >

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-10-16 Thread Wido den Hollander
> On 16 October 2017 at 18:14, Richard Hesketh wrote: > > > On 16/10/17 13:45, Wido den Hollander wrote: > >> On 26 September 2017 at 16:39, Mark Nelson wrote: > >> On 09/26/2017 01:10 AM, Dietmar Rieder wrote: > >>> thanks David, > >>> > >>> that's confirming what I was assuming. Too bad

Re: [ceph-users] How to get current min-compat-client setting

2017-10-16 Thread Wido den Hollander
> On 16 October 2017 at 22:18, Hans van den Bogert wrote: > > > Thanks, that’s what I was looking for. > > However, should we create the `get-require-min-compat-client` > option nonetheless? I’m willing to write the patch, unless someone thinks > it’s not a good idea. > I th