I think he means that after a disk failure he waits for the cluster to get back
to OK (so all data on the lost disk have been reconstructed elsewhere) and then
the disk is changed. In that case it's normal to have misplaced objects
(because with the new disk some PGs need to be migrated to populate this new space)
We are now using osd_op_queue = wpq. Maybe returning to prio would help?
What are you using on your mimic cluster?
F.
On 25/06/2020 at 19:28, Frank Schilder wrote:
OK, this *does* sound bad. I would consider this a show stopper for upgrade
from mimic.
Best regards,
=
Frank
Hi,
in our production cluster (proxmox 5.4, ceph 12.2) there has been an issue
since yesterday. After an increase of a pool, 5 OSDs do not start;
their status is "down/in". ceph health: HEALTH_WARN nodown,noout flag(s) set,
5 osds down, 128 osds: 123 up, 128 in.
Last lines of the OSD logfile:
2020-06-26 08:40:26.
Hi all,
I'm going to deploy a cluster with an erasure code pool for cold storage.
There are 3 servers for me to set up the cluster, 12 OSDs on each server.
Does that mean the data is secure while 1/3 of the cluster's OSDs are down,
or only while 2 of the OSDs are down, if I set the EC profile with k=4 and m=2?
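For reference, a minimal sketch of how such a profile and pool could be created; the profile name, pool name and PG count are placeholders:

  ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
  ceph osd pool create cold-ec 128 128 erasure ec42
  ceph osd erasure-code-profile get ec42

Note that with k=4, m=2 and crush-failure-domain=host, CRUSH needs 6 distinct hosts to place all chunks, so on 3 hosts the PGs could not be fully mapped; the later replies about failure domains deal with exactly this.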
Thanks. I will try to change osd_op_queue_cut_off to high and restart
everything (and use this downtime to upgrade the servers).
F.
On 26/06/2020 at 09:46, Frank Schilder wrote:
I'm using
osd_op_queue = wpq
osd_op_queue_cut_off = high
and these settings are recommended.
Best regards,
=
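For reference, a minimal sketch of how these two options could be set in ceph.conf; placing them under [global] so both OSDs and the MDS pick them up follows a later reply in this thread, and daemons need a restart for the change to apply:

  [global]
  osd_op_queue = wpq
  osd_op_queue_cut_off = high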
From my point of view, it's better to have no more than 6 OSD WAL/DBs on one
NVMe. I think that may be the root cause of the slow requests.
Mark Kirkwood wrote on Fri, 26 Jun 2020 at 07:47:
> Progress update:
>
> - tweaked debug_rocksdb to 1/5. *possibly* helped, fewer slow requests
>
> - will increase osd_m
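As an aside, a debug level like that can usually be changed at runtime; a sketch, assuming the setting should reach every OSD and also be persisted in ceph.conf:

  ceph tell osd.* injectargs '--debug_rocksdb 1/5'
  # and, to persist across restarts, in the [osd] section of ceph.conf:
  #   debug rocksdb = 1/5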
On Fri, 26 Jun 2020 at 10:32, Zhenshi Zhou wrote:
> Hi all,
>
> I'm going to deploy a cluster with an erasure code pool for cold storage.
> There are 3 servers for me to set up the cluster, 12 OSDs on each server.
> Does that mean the data is secure while 1/3 of the cluster's OSDs are down,
> or only 2
Hi Marc,
None of the CephFS issues are show-stoppers, but we're waiting for them
to land in nautilus anyway:
* https://tracker.ceph.com/issues/45090
* https://tracker.ceph.com/issues/45261
* https://tracker.ceph.com/issues/45835
* https://tracker.ceph.com/issues/45875
Cheers, Dan
On Thu, Jun 25,
On 26/06/2020 5:27 pm, Francois Legrand wrote:
In that case it's normal to have misplaced objects (because with the new disk
some PGs need to be migrated to populate this new space), but degraded PGs do
not seem to be the correct behaviour!
Yes, that would be bad, not sure if that's the proce
On 26/06/2020 6:31 pm, Zhenshi Zhou wrote:
I'm going to deploy a cluster with an erasure code pool for cold storage.
There are 3 servers for me to set up the cluster, 12 OSDs on each server.
Does that mean the data is secure while 1/3 of the cluster's OSDs are down,
or only while 2 of the OSDs are down, if I
Hi Janne,
I use the default profile (2+1) and set failure-domain=host; is that best
practice?
Janne Johansson wrote on Fri, 26 Jun 2020 at 16:59:
> On Fri, 26 Jun 2020 at 10:32, Zhenshi Zhou wrote:
>
>> Hi all,
>>
>> I'm going to deploy a cluster with an erasure code pool for cold storage.
>> There are 3 server
Hi Lindsay,
I have only 3 hosts; is there any method to set up an EC pool cluster in a
better way?
Lindsay Mathieson wrote on Fri, 26 Jun 2020 at 18:03:
> On 26/06/2020 6:31 pm, Zhenshi Zhou wrote:
> > I'm going to deploy a cluster with an erasure code pool for cold storage.
> > There are 3 servers for me to s
On 26/06/2020 8:08 pm, Zhenshi Zhou wrote:
Hi Lindsay,
I have only 3 hosts; is there any method to set up an EC pool cluster
in a better way?
There's failure domain by OSD, which Janne knows far better than I :)
--
Lindsay
I will give it a try, thanks:)
Lindsay Mathieson wrote on Fri, 26 Jun 2020 at 19:07:
> On 26/06/2020 8:08 pm, Zhenshi Zhou wrote:
> > Hi Lindsay,
> >
> > I have only 3 hosts; is there any method to set up an EC pool cluster
> > in a better way?
>
> There's failure domain by OSD, which Janne knows far better
This depends on which point in the procedure you refer to. He explicitly wrote
> Note, we have not deployed the new OSD yet.
meaning he observed misplaced objects before deploying the new disk. This
should not happen.
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
I'm using
osd_op_queue = wpq
osd_op_queue_cut_off = high
and these settings are recommended.
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Francois Legrand
Sent: 26 June 2020 09:44:00
To: Frank Schilder; ceph-
I'm running EC 8+2 with 'failure domain OSD' on a 3-node cluster with 24
OSDs. Until one has 10s of nodes, it pretty much has to be failure domain
OSD.
The documentation lists certain other important settings which took some time
to find. Most important are recommendations to have a small replicat
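For illustration, a sketch of what such an 8+2 profile with failure domain OSD could look like; the profile name, pool name and PG count are placeholders, not taken from this cluster:

  ceph osd erasure-code-profile set ec82 k=8 m=2 crush-failure-domain=osd
  ceph osd pool create archive-ec 256 256 erasure ec82

With failure domain OSD, several chunks of one object can share a host, so a single node failure can take out more than m chunks; that is the trade-off discussed elsewhere in this thread.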
I changed osd_op_queue_cut_off to high and rebooted all the OSDs, but
the result is more or less the same (storage is still extremely slow:
2h30 to extract a 64GB image with rbd!). The only improvement is that the
degraded PGs seem to have disappeared (which is at least a good
point). It seems tha
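One way to put a number on that slowness independently of rbd is a short RADOS benchmark against a scratch pool; a sketch, with pool name and duration as placeholders:

  rados bench -p testpool 60 write --no-cleanup
  rados bench -p testpool 60 seq
  rados -p testpool cleanup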
Does somebody use mclock in a production cluster?
M=1 is never a good choice. Just use replication instead.
> On Jun 26, 2020, at 3:05 AM, Zhenshi Zhou wrote:
>
> Hi Janne,
>
> I use the default profile (2+1) and set failure-domain=host; is that best
> practice?
>
> Janne Johansson wrote on Fri, 26 Jun 2020 at 16:59:
>
>> On Fri, 26 Jun 2020 at 10:3
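Following that advice, a minimal sketch of a replicated pool in place of a 2+1 EC pool; the pool name and PG count are placeholders:

  ceph osd pool create cold-rep 128 128 replicated
  ceph osd pool set cold-rep size 3
  ceph osd pool set cold-rep min_size 2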
Thanks. I also set osd_op_queue_cut_off to high in the global section (as you
mentioned in a previous thread that both osd and mds should use it).
F.
On 26/06/2020 at 16:35, Frank Schilder wrote:
I never tried "prio" out, but the reports I have seen claim that prio is
inferior.
However, as far as I know i
Have you checked if the OSD keyring is present in /var/lib/ceph/osd/?
Compare the content to other OSDs that do restart successfully.
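A sketch of such a check; the OSD id is a placeholder:

  # on the host of the failing OSD
  ls -l /var/lib/ceph/osd/ceph-<id>/
  cat /var/lib/ceph/osd/ceph-<id>/keyring
  # compare with the key registered in the cluster
  ceph auth get osd.<id>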
Quoting "Naumann, Thomas":
Hi,
in our production cluster (proxmox 5.4, ceph 12.2) there has been an issue
since yesterday. After an increase of a pool, 5 OSDs do
As others have pointed out, setting the failure domain to OSD is dangerous
because then all 6 chunks for an object can end up on the same host. 6 hosts
really seems like the minimum to mess with EC pools.
Adding a bucket type between host and osd seems like a good idea here, if you
absolutely
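Adding a bucket type means editing the CRUSH map; a rough sketch of the usual decompile/edit/recompile cycle, with file names and the new type left as placeholders:

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # add a new entry with an unused id to the "types" section, define buckets
  # of that type under each host, and point the EC rule's chooseleaf step at it
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new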
We're happy to announce the tenth release in the Nautilus series. In
addition to fixing a security-related bug in RGW, this release brings a
number of bugfixes across all major components of Ceph. We recommend
that all Nautilus users upgrade to this release. For a detailed
changelog please refer t
Hello!
Thanks for bringing this issue up, Victoria.
Ramana and David - we're using shaman to look up appropriate builds of packages
on chacra to test Ceph with OpenStack Cinder, Manila, Nova, and Glance in the
upstream OpenStack projects.
This LRC outage hit us - we're sorted for everything e
I never tried "prio" out, but the reports I have seen claim that prio is
inferior.
However, as far as I know it is safe to change these settings. Unfortunately,
you need to restart services to apply the changes.
Before you do, check if *all* daemons are using the same setting. Contrary to
the
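To verify what each daemon is actually running with, something like the following could be used; the daemon ids are placeholders, and the admin-socket query has to run on that daemon's host:

  ceph daemon osd.0 config get osd_op_queue
  ceph daemon osd.0 config get osd_op_queue_cut_off
  # on Mimic and later, the effective value can also be queried centrally
  ceph config show osd.0 osd_op_queue_cut_off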
Hi all,
I have a question regarding pointer variables used in the __crush_do_rule__
function of CRUSH __mapper.c__. Can someone please help me understand the
purpose of the following four pointer variables inside __crush_do_rule__:
int *b = a + result_max;
int *c = b + result_max;
int *w = a;
int *o
Can anyone explain why ktdreyer's dev repo is still landing on production
installs?
[root@centos8 ~]# ./cephadm add-repo --release octopus
INFO:root:Writing repo to /etc/yum.repos.d/ceph.repo...
INFO:cephadm:Enabling EPEL...
INFO:cephadm:Enabling supplementary copr repo ktdreyer/ceph-el8...
[root@