Hello Alexander,
  One other point on your email: you indicate you'd like each OSD to have
~100 PGs, but depending on your pool sizes, it seems you may have forgotten
about the additional PG copies created by replication itself.

Assuming 3x replication in your environment:
70,000 * 3
------------
800 OSDs

=~ 262.5 PGs per OSD on average

While this PG-to-OSD ratio shouldn't cause significant pain, I would not go
any higher on the PG count without adding more spindles.
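
If you want to rerun that math for other pool sizes or replica counts,
here is a trivial sketch (plain Python; the inputs are just the numbers
assumed from this thread, so substitute your own):

    # Back-of-the-envelope PG-per-OSD estimate.
    # Inputs are assumptions taken from this thread; substitute your own
    # total pg_num, pool size (replica count), and OSD count.
    total_pgs = 70000      # sum of pg_num across all pools
    replicas = 3           # pool 'size' (replica count)
    osds = 800             # number of OSDs in the cluster

    pgs_per_osd = total_pgs * replicas / float(osds)
    print("~%.1f PGs per OSD on average" % pgs_per_osd)
    # prints: ~262.5 PGs per OSD on average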

For more specific PG count guidance and modeling, please see:
http://ceph.com/pgcalc

Hope this helps,

Michael J. Kidd
Sr. Storage Consultant
Red Hat Global Storage Consulting
+1 919-442-8878

On Wed, Sep 23, 2015 at 8:34 AM, Sage Weil <s...@newdream.net> wrote:

> On Wed, 23 Sep 2015, Alexander Yang wrote:
> > hello,
> >         We use Ceph+OpenStack in our private cloud. Our cluster has
> > 5 mons and 800 OSDs, with a capacity of about 1 PB, and runs about
> > 700 VMs and 1100 volumes.
> >         Recently we increased our pg_num, so the cluster now has
> > about 70000 PGs. My intention was for every OSD to hold ~100 PGs,
> > but after increasing pg_num I found I was wrong: because different
> > OSDs have different CRUSH weights, the PG count per OSD varies, and
> > some OSDs now exceed 500 PGs.
> >         Now the problem appears whenever I need to change an OSD's
> > weight, which means changing the crushmap. A change that causes only
> > about 0.03% of the data to migrate makes the mons keep starting
> > elections. This hangs the cluster, and when the elections end the
> > original leader is still the new leader. During the mon elections
> > the VMs on the upper layer see too many slow requests, so now I
> > don't dare do any operation that changes the crushmap. But I worry
> > about an important case: if our cluster loses one host, or even one
> > rack, the crushmap change will be large and so will the data
> > migration. I worry the cluster will hang for a long time and, as a
> > result, all the VMs on the upper layer will end up shut down.
> >         My guess is that when I change the crushmap, *the leader mon
> > may have to calculate too much information*, or *too many clients
> > want to get the new crushmap from the leader mon*. That must hang
> > the mon thread, so the leader mon can't heartbeat to the other mons;
> > the other mons think the leader is down and begin a new election.
> > I'm sorry if my guess is wrong.
> >         The crushmap is attached. Can anyone give me some advice or
> > guidance? Thanks very much!
>
> There were huge improvements made in hammer in terms of mon efficiency in
> these cases where it is under load.  I recommend upgrading as that will
> help.
>
> You can also mitigate the problem somewhat by adjusting the mon_lease and
> associated settings up.  Scale all of mon_lease, mon_lease_renew_interval,
> mon_lease_ack_timeout, mon_accept_timeout by 2x or 3x.
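
For reference, here is a ceph.conf sketch of the 3x scaling Sage
describes above. The baseline values are assumed defaults of roughly
5 / 3 / 10 / 10 seconds, so check what your mons are actually running
(e.g. 'ceph daemon mon.<id> config show') before applying anything:

    [mon]
        # 3x the assumed defaults; adjust to match your running values.
        mon lease = 15
        mon lease renew interval = 9
        mon lease ack timeout = 30
        mon accept timeout = 30
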
>
> It also sounds like you may be using some older tunables/settings
> for your pools or crush rules.  Can you attach the output of 'ceph osd
> dump' and 'ceph osd crush dump | tail -n 20' ?
>
> sage
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
