I adjusted the crush map, everything's OK now. Thanks for your help!
On Wed, 23 Mar 2016 at 23:13 Matt Conner wrote:
> Hi Zhang,
>
> In a 2 copy pool, each placement group is spread across 2 OSDs - that is
> why you see such a high number of placement groups per OSD. There is a PG
> calculator at http://ceph.com/pgcalc/.
---
> From: Zhang Qiang [dotslash...@gmail.com]
> Sent: 23 March 2016 23:17
> To: Goncalo Borges
> Cc: Oliver Dzombic; ceph-users
> Subject: Re: [ceph-users] Need help for PG problem
>
> And here's the osd tree if it matters.
>
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
It seems that you only have two hosts in your crush map, but the default
ruleset separates replicas by host.
If you set size 3 for a pool, one of the three replicas can't be placed,
because you only have two hosts.
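If you really want size 3 on only two hosts (for testing), one option is a rule
whose failure domain is 'osd' rather than 'host'. A rough sketch; the pool name
is a placeholder, and on releases from that era the pool option is
crush_ruleset, while newer ones use crush_rule:

    # create a replicated rule that separates replicas across OSDs, not hosts
    ceph osd crush rule create-simple replicated_osd default osd
    # look up the new rule's id
    ceph osd crush rule dump replicated_osd
    # point the pool at it (placeholder pool name)
    ceph osd pool set mypool crush_ruleset <rule_id>

Keep in mind this can land two or three replicas on the same host, so it only
makes sense for test setups.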
2016-03-23 20:17 GMT+08:00 Zhang Qiang :
> And here's the osd tree if it matters.
From: Zhang Qiang [dotslash...@gmail.com]
Sent: 23 March 2016 23:17
To: Goncalo Borges
Cc: Oliver Dzombic; ceph-users
Subject: Re: [ceph-users] Need help for PG problem
And here's the osd tree if it matters.
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
Hi Zhang,
In a 2 copy pool, each placement group is spread across 2 OSDs - that is
why you see such a high number of placement groups per OSD. There is a PG
calculator at http://ceph.com/pgcalc/. Based on your setup, it may be worth
using 2048 instead of 4096.
As for stuck/degraded PGs, most are
Are you running with the default failure domain of 'host'?
If so, with a pool size of 3 and your 20 OSDs physically on only 2 hosts,
Ceph is unable to find a third host to map the third replica to.
Either add a host and move some OSDs there or reduce pool size to 2.
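A rough sketch of what either option could look like; the pool name, host name,
OSD id and weight below are placeholders, not values from this thread:

    # option 1: reduce the pool to two replicas
    ceph osd pool set mypool size 2

    # option 2: add a new host bucket and re-home some OSDs under it
    ceph osd crush add-bucket node3 host
    ceph osd crush move node3 root=default
    ceph osd crush create-or-move osd.19 1.07 root=default host=node3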
-K.
On 03/23/2016 02:17 PM, Zhang Qiang wrote:
You should have settled on the nearest power of 2, which for 666 is
512. Since you created the cluster and IIRC it is a testbed, you may as
well recreate it; however, it will be less of a hassle to just
increase the PGs to the next power of two: 1024.
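For example, something like the following (placeholder pool name; pgp_num has
to be raised to match pg_num for the rebalancing to actually take effect):

    ceph osd pool set mypool pg_num 1024
    ceph osd pool set mypool pgp_num 1024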
Your 20 OSDs appear to be equal sized in your c
And here's the osd tree if it matters.
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 22.39984 root default
-2 21.39984 host 10
0 1.06999 osd.0 up 1.0 1.0
1 1.06999 osd.1 up 1.0 1.0
2 1.06999 osd.2 up 1.0 1.0
Oliver, Goncalo,
Sorry to bother you again, but recreating the pool with a smaller pg_num
didn't seem to work; now all 666 PGs are degraded + undersized.
New status:
cluster d2a69513-ad8e-4b25-8f10-69c4041d624d
health HEALTH_WARN
666 pgs degraded
82 pgs stuck unclean
Hello Gonçalo,
Thanks for the reminder. I was just setting up the cluster for testing, so don't
worry, I can just remove the pool. And now that I've learnt that the replication
count and the number of pools both factor into pg_num, I'll consider them
carefully before deploying any data.
> On Mar 23, 2016, at
Hi Zhang,
From the ceph health detail output, I suggest the NTP servers be calibrated.
Can you share the crush map output?
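For reference, a rough way to confirm the clock skew and to dump the crush map,
using standard ntp and ceph tooling (file names are arbitrary):

    # see which monitors report clock skew
    ceph health detail | grep -i clock
    # check the NTP peers on each monitor node
    ntpq -p
    # dump and decompile the crush map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt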
2016-03-22 18:28 GMT+08:00 Zhang Qiang :
> Hi Reddy,
> It's over a thousand lines, I pasted it on gist:
> https://gist.github.com/dotSlashLu/22623b4cefa06a46e0d4
>
> On Tue,
Hi Zhang...
If I can add some more info: changing the number of PGs is a heavy operation, and
as far as I know, you should NEVER decrease PGs. From the notes in pgcalc
(http://ceph.com/pgcalc/):
"It's also important to know that the PG count can be increased, but NEVER
decreased without destroying / recreating the pool."
Got it: the suggested pg_num is the total, so I need to divide it by the number
of replicas.
Thanks Oliver, your answer is very thorough and helpful!
On 23 March 2016 at 02:19, Oliver Dzombic wrote:
> Hi Zhang,
>
> yeah i saw your answer already.
>
> At very first, you should make sure that
Hi Zhang,
Yeah, I saw your answer already.
First of all, you should make sure that there is no clock skew; it can cause
some side effects.
According to
http://docs.ceph.com/docs/master/rados/operations/placement-groups/
you have to:
Total PGs = (OSDs * 100) / pool size
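As a worked example with the numbers from this thread (20 OSDs, pool size 2):

    Total PGs = (20 * 100) / 2 = 1000, rounded up to the next power of 2 = 1024

which matches the 1024 suggested elsewhere in the thread.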
Hi Reddy,
It's over a thousand lines, I pasted it on gist:
https://gist.github.com/dotSlashLu/22623b4cefa06a46e0d4
On Tue, 22 Mar 2016 at 18:15 M Ranga Swami Reddy
wrote:
> Hi,
> Can you please share the "ceph health detail" output?
>
> Thanks
> Swami
>
> On Tue, Mar 22, 2016 at 3:32 PM, Zhang Qiang wrote:
Hi Zhang,
are you sure that all your 20 OSDs are up and in?
Please provide the complete output of ceph -s, or better, with the detail flag.
Thank you :-)
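For reference, assuming the "detail flag" here refers to the health detail
command, that would be:

    ceph -s              # short cluster status
    ceph health detail   # expanded health output with the per-PG breakdown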
--
Mit freundlichen Gruessen / Best regards
Oliver Dzombic
IP-Interactive
mailto:i...@ip-interactive.de
Hi,
Can you please share the "ceph health detail" output?
Thanks
Swami
On Tue, Mar 22, 2016 at 3:32 PM, Zhang Qiang wrote:
> Hi all,
>
> I have 20 OSDs and 1 pool, and, as recommended by the
> doc(http://docs.ceph.com/docs/master/rados/operations/placement-groups/), I
> configured pg_num and pgp_num
Hi all,
I have 20 OSDs and 1 pool, and, as recommended by the doc
(http://docs.ceph.com/docs/master/rados/operations/placement-groups/), I
configured pg_num and pgp_num to 4096, size 2, min size 1.
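For context, those settings would typically be applied with something like the
following; the pool name is a placeholder and these are not the exact commands
from this thread:

    ceph osd pool create mypool 4096 4096
    ceph osd pool set mypool size 2
    ceph osd pool set mypool min_size 1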
But ceph -s shows:
HEALTH_WARN
534 pgs degraded
551 pgs stuck unclean
534 pgs undersized
too many