1. You have a size 3 pool; I do not know why you set min_size 1. It is too dangerous.
2. You had better use the same disk sizes and the same number of OSDs on each host for CRUSH.
Now you can try the ceph osd reweight-by-utilization command, when there are no
users on your cluster.
and I will go home.
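A minimal sketch of both suggestions, assuming the pool is named 'vm' as in the osd dump quoted further down; 120 is the default utilization threshold:

ceph osd pool set vm min_size 2            # require two replicas for writes again
ceph osd test-reweight-by-utilization 120  # dry run: show which OSDs would be reweighted
ceph osd reweight-by-utilization 120       # apply the reweight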
At 2017-07-28 17
On Fri, Jul 28, 2017 at 05:52:29PM +0800, linghucongsong wrote:
>
>
>
> You have two crush rules? One is SSD and the other is HDD?
yes, exactly..
>
> Can you show ceph osd dump|grep pool
>
pool 3 'vm' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last
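For reference, the same settings can also be read one at a time (a sketch; 'vm' is the pool name from the dump above, and crush_ruleset is the jewel-era option name):

ceph osd pool get vm size
ceph osd pool get vm min_size
ceph osd pool get vm crush_ruleset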
You have two crush rules? One is SSD and the other is HDD?
Can you show ceph osd dump | grep pool
and ceph osd crush dump?
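If the full crush dump is too long to paste, listing just the rules also shows the ssd/hdd split (stock commands, shown as a sketch):

ceph osd crush rule ls
ceph osd crush rule dump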
At 2017-07-28 17:47:48, "Nikola Ciprich" wrote:
>
>On Fri, Jul 28, 2017 at 05:43:14PM +0800, linghucongsong wrote:
>>
>>
>> It looks like the OSDs in your cluster are not all the same size.
On Fri, Jul 28, 2017 at 05:43:14PM +0800, linghucongsong wrote:
>
>
> It looks like the OSDs in your cluster are not all the same size.
>
> Can you show the ceph osd df output?
you're right, they're not.. here's the output:
[root@v1b ~]# ceph osd df tree
ID WEIGHT REWEIGHT SIZE USE AVAIL %U
It looks like the OSDs in your cluster are not all the same size.
Can you show the ceph osd df output?
At 2017-07-28 17:24:29, "Nikola Ciprich" wrote:
>I forgot to add that OSD daemons really seem to be idle, no disk
>activity, no CPU usage.. it just looks to me like some kind of
>deadlock, as if they were waiting for each other..
I forgot to add that OSD daemons really seem to be idle, no disk
activity, no CPU usage.. it just looks to me like some kind of
deadlock, as if they were waiting for each other..
and so I've been trying to recover the last 1.5% of misplaced / degraded PGs
for almost a week..
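When recovery stalls like this while the OSDs sit idle, the stuck PGs and any blocked requests can be inspected directly; a sketch, where the PG id 3.1f and osd.0 are placeholders (the daemon command has to run on the node hosting that OSD):

ceph health detail
ceph pg dump_stuck unclean
ceph pg 3.1f query
ceph daemon osd.0 dump_ops_in_flight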
On Fri, Jul 28, 2017 at 10:56:02AM +
Hi,
I'm trying to find the reason for some strange recovery issues I'm seeing on
our cluster..
It's a mostly idle, 4-node cluster with 26 OSDs evenly distributed
across the nodes, running jewel 10.2.9.
The problem is that after some disk replacements and data moves, recovery
is progressing extremely slowly.. pgs seem to be