Hi all,
I’m testing Ceph Luminous 12.2.1 installed with ceph-ansible.
Doing some failover tests I noticed that when I kill an OSD or a host, Ceph
doesn’t recover automatically and remains in this state until I bring the OSDs or
the host back online.
I have 3 pools: volumes, cephfs_data and cephfs_metadata.
In the meantime I found out why this happened.
For some reason the 3 OSDs were not marked out of the cluster like the others, and
this prevented the cluster from reassigning their PGs to other OSDs.
This is strange because I left the 3 OSDs down for two days.
> On 16 Oct 2017, at 10:21, Matteo
How long did you wait after killing? It shouldn't recover instantly, but
after a timeout.
And are they marked out, or reweighted to 0? (this is what the timeout does,
and recovery should start once they are marked out)
And are you killing any mons?
And can you show the output of `ceph -s` after killing?
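For anyone hitting the same thing, a quick sketch of what I would check first
(assuming a mostly default configuration; the interval below is just the default
value, yours may differ):

  ceph osd tree                  # are the dead OSDs "down", and are they still "in"?
  ceph osd dump | grep flags     # is the "noout" flag set on the cluster?
  # on a monitor host, check how long the cluster waits before marking OSDs out
  # (defaults to 600 seconds):
  ceph daemon mon.$(hostname -s) config get mon_osd_down_out_interval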
On 16/10/17 03:40, Alex Gorbachev wrote:
> On Sat, Oct 14, 2017 at 12:25 PM, Oscar Segarra
> wrote:
>> Hi,
>>
>> In my VDI environment I have configured the suggested ceph
>> design/architecture:
>>
>> http://docs.ceph.com/docs/giant/rbd/rbd-snapshot/
>>
>> Where I have a Base Image + Protected S
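For readers who don't want to open the link: the base image + protected snapshot
layout that page describes boils down to something like this (a sketch only; the
pool "vdi" and the image names are made up for illustration):

  rbd create --size 20480 vdi/base-image        # the golden image, 20 GB here
  rbd snap create vdi/base-image@gold
  rbd snap protect vdi/base-image@gold          # clones require a protected snapshot
  rbd clone vdi/base-image@gold vdi/desktop-001 # one COW clone per VDI guest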
I thought I'd pick up on this older thread instead of starting a new one.
For the WAL, something between 512MB and 2GB should be sufficient, as Mark Nelson
explained in a different thread.
The DB size, however, I'm not certain about at this moment. The general consensus
seems to be: "use as much as avai
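For completeness, when ceph-disk is left to create the DB/WAL partitions itself,
the sizes it carves out are, if I remember correctly, driven by these ceph.conf
options (the numbers below are purely illustrative, not a recommendation):

  [osd]
  bluestore_block_db_size  = 32212254720   # 30 GiB, pick per the advice in this thread
  bluestore_block_wal_size = 2147483648    # 2 GiB, the upper end of the range above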
> On 13 October 2017 at 10:22, Hans van den Bogert wrote:
>
>
> Hi,
>
> I’m in the middle of debugging some incompatibilities with an upgrade of
> Proxmox which uses Ceph. At this point I’d like to know what my current value
> is for the min-compat-client setting, which would’ve been se
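For the archives, one way to see the current value on Luminous is to look at the
osdmap dump (a sketch; the exact field names may differ slightly per version):

  ceph osd dump | grep min_compat_client
  # expect a line such as:  require_min_compat_client jewel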
Hi,
I want to upgrade my Ceph cluster from Jewel to Luminous and switch to BlueStore.
For that I exported the pools from the old cluster:
rados export -p pool1 pool1.ceph
and after the upgrade and OSD re-creation:
rados --create -p pool1 import pool1.ceph
I can import the backup without error, but when I w
On Mon, Oct 16, 2017 at 11:35 AM, Nagy Ákos wrote:
> Hi,
>
> I want to upgrade my ceph from jewel to luminous, and switch to bluestore.
>
> For that I export the pools from old cluster:
This is not the way to do it. You should convert your OSDs from
filestore to bluestore one by one, and let the
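For the archives, the one-by-one conversion looks roughly like this for a single
OSD (a sketch only; osd.3 and /dev/sdb are placeholders, and the cluster must be
healthy again before moving on to the next OSD):

  ceph osd out 3
  # wait until "ceph -s" shows all PGs active+clean again
  systemctl stop ceph-osd@3
  ceph osd purge 3 --yes-i-really-mean-it
  ceph-disk zap /dev/sdb
  ceph-disk prepare --bluestore /dev/sdb
  ceph-disk activate /dev/sdb1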
Thanks,
but I erased all of the data; I have only this backup.
If the restore works for 3 pools, can I do it for the remaining 2?
What can I try to set to import it, or how can I find these IDs?
On 16 Oct 2017 at 13:39, John Spray wrote:
> On Mon, Oct 16, 2017 at 11:35 AM, Nagy Ákos wrote:
>
Hi list,
some of you also use Ceph as a storage backend for OpenStack, so maybe
you can help me out.
Last week we upgraded our Mitaka cloud to Ocata (via Newton, of
course), and also upgraded the cloud nodes from openSUSE Leap 42.1 to
Leap 42.3. There were some issues as expected, but no sh
Hi,
During rather high load and rebalancing, a couple of our OSDs crashed
and they fail to start. This is from the log:
-2> 2017-10-16 13:27:50.235204 7f5e4c3bae80 0 osd.1 442123 load_pgs
opened 370 pgs
-1> 2017-10-16 13:27:50.239175 7f5e4c3bae80 1 osd.1 442123
build_past_intervals_para
> On 16 October 2017 at 13:00, Nagy Ákos wrote:
>
>
> Thanks,
>
> but I erase all of the data, I have only this backup.
I hate to bring the bad news, but it will not work. The pools have different
IDs and that will make it very difficult to get this working again.
Wido
> If the restore wo
> On 26 September 2017 at 16:39, Mark Nelson wrote:
>
>
>
>
> On 09/26/2017 01:10 AM, Dietmar Rieder wrote:
> > thanks David,
> >
> > that's confirming what I was assuming. To bad that there is no
> > estimate/method to calculate the db partition size.
>
> It's possible that we might be abl
The pool ids can be updated to point to the correct pool [1] with
enough patience. The larger issue is that the snapshots are not
preserved and thus your cloned images can be corrupted if the parent
image was modified after the creation of the protected snapshot.
[1] http://lists.ceph.com/pipermai
Hi,
I have deployed a Ceph cluster (Jewel). By default all block devices that
are created are thin provisioned.
Is it possible to change this setting? I would like all created block
devices to be thick provisioned.
In front of the Ceph cluster, I am running Openstack.
Thanks!
Sinan
Hi Matteo,
This looks like the 'noout' flag might be set for your cluster.
Please check it with:
ceph osd dump | grep flags
If you see 'noout' flag is set, you can unset it with:
ceph osd unset noout
Regards,
Anthony
- Original Message -
> From: "Matteo Dacrema"
> To: ceph-users@lists
On 10/16/2017 02:02 PM, Dejan Lesjak wrote:
> Hi,
>
> During rather high load and rebalancing, a couple of our OSDs crashed
> and they fail to start. This is from the log:
>
> -2> 2017-10-16 13:27:50.235204 7f5e4c3bae80 0 osd.1 442123 load_pgs
> opened 370 pgs
> -1> 2017-10-16 13:27:50.2
On 16/10/17 13:45, Wido den Hollander wrote:
>> On 26 September 2017 at 16:39, Mark Nelson wrote:
>> On 09/26/2017 01:10 AM, Dietmar Rieder wrote:
>>> thanks David,
>>>
>>> that's confirming what I was assuming. To bad that there is no
>>> estimate/method to calculate the db partition size.
>>
>>
Hi all, I have to hot-swap a failed OSD on a Luminous cluster with BlueStore
(the disk is SATA, WAL and DB are on NVMe).
I've issued:
* ceph osd crush reweight osd_id 0
* systemctl stop (osd id daemon)
* umount /var/lib/ceph/osd/osd_id
* ceph osd destroy osd_id
Everything seems OK, but if I l
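In case it helps, re-creating the replacement disk with ceph-disk on 12.2.x should
look roughly like this (a sketch only; /dev/sdX and the NVMe partitions are
placeholders for the new data disk and the existing DB/WAL partitions):

  ceph-disk prepare --bluestore /dev/sdX --block.db /dev/nvme0n1pY --block.wal /dev/nvme0n1pZ
  ceph-disk activate /dev/sdX1   # or let udev trigger activation

Because `ceph osd destroy` keeps the OSD id reserved in the osdmap, the
replacement OSD should be able to reuse that id (rather than the id being
retired, as `ceph osd purge` would do).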
Hi,
At UZH we are currently evaluating cephfs as a distributed file system
for the scratch space of an HPC installation. Some slowdown of the
metadata operations seems to occur under certain circumstances. In
particular, commands issued after a big file deletion could take
several seconds
Not sure if anyone has noticed this yet, but I see your osd tree does not
include the host level - you get OSDs right under the root bucket. The default
CRUSH rule would make sure to allocate OSDs from different hosts - and there are no
hosts in the hierarchy.
An OSD would usually put itself under the hostna
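If that is indeed the case, one way to introduce the host level is sketched below
(bucket name, OSD id and weight are placeholders):

  ceph osd tree                          # confirm the OSDs hang directly off the root
  ceph osd crush add-bucket node1 host
  ceph osd crush move node1 root=default
  ceph osd crush set osd.0 1.0 root=default host=node1   # weight is illustrative; use the disk size in TiB

Alternatively, with the default "osd crush update on start = true", restarting the
OSDs once the host buckets exist should let them re-place themselves under their
hostname.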
Thanks,
I don't have any snapshots or clones, only some standalone images.
Following the linked thread, I can change my IDs and restore all
the images.
Thanks a lot!
On 16 Oct 2017 at 17:21, Jason Dillaman wrote:
> The pool ids can be updated to point to the correct pool [1] with
>
Thanks, that’s what I was looking for.
However, should we create the `get-require-min-compat-client` option
nonetheless? I'm willing to write the patch, unless someone thinks it's not a
good idea.
Regards
Hans
> On Oct 16, 2017, at 12:13 PM, Wido den Hollander wrote:
>
>
>> Op 1
Your error message indicates it failed to open a parent image of a clone:
2017-10-16 13:18:17.404858 7f35a37fe700 -1
librbd::image::RefreshParentRequest: failed to open parent image: (2)
No such file or directory
That means that the parent image has a snapshot that the clone is
linked against.
O
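To map that out, a few commands usually help (a sketch; pool, image and snapshot
names below are placeholders):

  rbd info pool1/cloned-image              # the "parent:" line shows the snapshot it was cloned from
  rbd snap ls pool1/parent-image           # snapshots present on the parent
  rbd children pool1/parent-image@snap1    # clones still linked to that snapshot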
Just as a weird update to this, I accidentally left the scrub cron
script disabled after the testing described in the previous message.
Even with *no* deep scrubs running, the “REQUEST_SLOW” problem is
still occurring every few minutes.
It seems something is seriously wrong with this cluster.
In
This pool is an LXD pool, created by LXD for containers. Maybe when LXD
creates a container, it creates a snapshot from the source image and clones
it. I forgot about this.
Probably this is the reason why I can't restore only this pool.
I can restore my images after I set a parent ID obtained from a newl
We're using cephfs here as well for HPC scratch, but we're on Luminous 12.2.1.
This issue seems to have been fixed between Jewel and Luminous; we don't have
such problems. :) Any reason you guys aren't evaluating the latest LTS?
From: ceph-users on behalf of Tya
On Mon, Oct 16, 2017 at 8:24 AM Dejan Lesjak wrote:
> On 10/16/2017 02:02 PM, Dejan Lesjak wrote:
> > Hi,
> >
> > During rather high load and rebalancing, a couple of our OSDs crashed
> > and they fail to start. This is from the log:
> >
> > -2> 2017-10-16 13:27:50.235204 7f5e4c3bae80 0 osd.
> From: "Sage Weil"
> To: "Alfredo Deza"
> Cc: "ceph-devel" , ceph-users@lists.ceph.com
> Sent: Monday, October 9, 2017 11:09:29 AM
> Subject: [ceph-users] killing ceph-disk [was Re: ceph-volume: migration and
> disk partition support]
>
> To put this in context, the goal here is to kill ceph-
On Mon, 16 Oct 2017, Anthony Verevkin wrote:
>
> > From: "Sage Weil"
> > To: "Alfredo Deza"
> > Cc: "ceph-devel" , ceph-users@lists.ceph.com
> > Sent: Monday, October 9, 2017 11:09:29 AM
> > Subject: [ceph-users] killing ceph-disk [was Re: ceph-volume: migration and
> > disk partition support]
On Sat, Oct 14, 2017 at 7:24 AM, zhaomingyue wrote:
> 1. This assert happened accidentally and is not easy to reproduce; in fact, I also
> suspect this assert is caused by lost device data;
> but if data was lost, how can it occur that (last_update + 1 == log.rbegin.version)?
> In the case of losing data, it's mor
> On 17. okt. 2017, at 00:23, Gregory Farnum wrote:
>
> On Mon, Oct 16, 2017 at 8:24 AM Dejan Lesjak wrote:
> On 10/16/2017 02:02 PM, Dejan Lesjak wrote:
> > Hi,
> >
> > During rather high load and rebalancing, a couple of our OSDs crashed
> > and they fail to start. This is from the log:
> >
>
On Mon, Oct 16, 2017 at 3:49 PM Dejan Lesjak wrote:
>
> > On 17. okt. 2017, at 00:23, Gregory Farnum wrote:
> >
> > On Mon, Oct 16, 2017 at 8:24 AM Dejan Lesjak
> wrote:
> > On 10/16/2017 02:02 PM, Dejan Lesjak wrote:
> > > Hi,
> > >
> > > During rather high load and rebalancing, a couple of ou
On Mon, 16 Oct 2017 18:32:06 -0400 (EDT) Anthony Verevkin wrote:
> > From: "Sage Weil"
> > To: "Alfredo Deza"
> > Cc: "ceph-devel" , ceph-users@lists.ceph.com
> > Sent: Monday, October 9, 2017 11:09:29 AM
> > Subject: [ceph-users] killing ceph-disk [was Re: ceph-volume: migration and
> > disk p
Hi, Gregory, Ian!
There is very little information on striper mode in Ceph documentation.
Could this explanation help?
The logic of striper mode is very much the same as in RAID-0. There are 3
parameters that drive it:
stripe_unit - the stripe size (default=4M)
stripe_count - how many objects
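To make the RAID-0 analogy concrete, here is a tiny worked example with assumed
values (stripe_unit = 1M, stripe_count = 4, object_size = 8M; purely illustrative,
not the defaults): the first 1M chunk goes to object 0, the next to object 1, then
object 2, then object 3, then back to object 0, and so on. Once each of the 4
objects in the set has reached its 8M object_size (i.e. after 32M of data), the
striper moves on to a fresh set of 4 objects and repeats.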
That looks right to me... PRs for the Ceph docs are welcome! :)
On Mon, Oct 16, 2017 at 4:35 PM Alexander Kushnirenko
wrote:
> Hi, Gregory, Ian!
>
> There is very little information on striper mode in Ceph documentation.
> Could this explanation help?
>
> The logic of striper mode is very much t
Maybe an additional example where the numbers don't all line up so
nicely would be good as well. For example it's not immediately obvious
to me what would happen with the stripe settings given by your example
but you write 97M of data.
Would it be 4 objects of 24M and 4 objects of 250KB? Or will the
I don't see same_interval_since being cleared by split.
PG::split_into() copies the history from the parent PG to the child. The
only code in Luminous that I see that clears it is in
ceph_objectstore_tool.cc
David
On 10/16/17 3:59 PM, Gregory Farnum wrote:
On Mon, Oct 16, 2017 at 3:49 PM
> On 17. okt. 2017, at 00:59, Gregory Farnum wrote:
>
> On Mon, Oct 16, 2017 at 3:49 PM Dejan Lesjak wrote:
>
> > On 17. okt. 2017, at 00:23, Gregory Farnum wrote:
> >
> > On Mon, Oct 16, 2017 at 8:24 AM Dejan Lesjak wrote:
> > On 10/16/2017 02:02 PM, Dejan Lesjak wrote:
> > > Hi,
> > >
> >
> On 16 October 2017 at 18:14, Richard Hesketh wrote:
>
>
> On 16/10/17 13:45, Wido den Hollander wrote:
> >> On 26 September 2017 at 16:39, Mark Nelson wrote:
> >> On 09/26/2017 01:10 AM, Dietmar Rieder wrote:
> >>> thanks David,
> >>>
> >>> that's confirming what I was assuming. To bad
> On 16 October 2017 at 22:18, Hans van den Bogert wrote:
>
>
> Thanks, that’s what I was looking for.
>
> However, should we create the ` get-require-min-compat-client luminous`
> option nonetheless? I’m willing to write the patch, unless someone thinks
> it’s not a good idea.
>
I th