I always keep my pg number a power of 2. So I’d go from 2048 to 4096. I’m not sure if this is the safest way, but it’s worked for me.
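For what it's worth, a rough sketch of doing the increase in smaller steps and letting the cluster settle in between (the pool name, the step values, and the HEALTH_OK wait are placeholders for illustration, adapt to your setup):

    #!/bin/bash
    # Illustrative sketch only: raise pg_num/pgp_num on one pool in stages
    # and wait for the cluster to settle between steps.
    POOL=rbd
    for PGS in 2304 2560 2816 3072 3584 4096; do
        ceph osd pool set "$POOL" pg_num  "$PGS"
        # pgp_num has to follow pg_num before data actually rebalances
        ceph osd pool set "$POOL" pgp_num "$PGS"
        # wait until backfill/recovery is done before taking the next step
        until ceph health | grep -q HEALTH_OK; do
            sleep 60
        done
    done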
Michael Kuriger
Sr. Unix Systems Engineer
mk7...@yp.com | 818-649-7235

From: Chu Duc Minh <chu.ducm...@gmail.com>
Date: Monday, March 16, 2015 at 7:49 AM
To: Florent B <flor...@coppint.com>
Cc: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] [SPAM] Changing pg_num => RBD VM down !

I'm using the latest Giant and have the same issue. When I increase the pg_num of a pool from 2048 to 2148, my VMs are still OK. When I increase it from 2148 to 2400, some VMs die (their qemu-kvm processes die).

My physical servers (the VM hosts) run kernel 3.13 and use librbd.

I think it's a bug in librbd related to the crushmap. (I set crush_tunables3 on my Ceph cluster; could that be related?)

Do you know a way to safely increase pg_num? (I don't think increasing pg_num by 100 at a time is a safe and good way.)

Regards,

On Mon, Mar 16, 2015 at 8:50 PM, Florent B <flor...@coppint.com> wrote:

We are on Giant.

On 03/16/2015 02:03 PM, Azad Aliyar wrote:
>
> May I know your Ceph version? The latest version of Firefly, 0.80.9, has
> patches to avoid excessive data migration when reweighting OSDs. You
> may need to set a tunable in order to make this patch active.
>
> This is a bugfix release for Firefly. It fixes a performance regression
> in librbd, an important CRUSH misbehavior (see below), and several RGW
> bugs. We have also backported support for flock/fcntl locks to ceph-fuse
> and libcephfs.
>
> We recommend that all Firefly users upgrade.
>
> For more detailed information, see
>
> http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt
>
> Adjusting CRUSH maps
> --------------------
>
> * This point release fixes several issues with CRUSH that trigger
>   excessive data migration when adjusting OSD weights. These are most
>   obvious when a very small weight change (e.g., a change from 0 to
>   .01) triggers a large amount of movement, but the same set of bugs
>   can also lead to excessive (though less noticeable) movement in
>   other cases.
>
>   However, because the bug may already have affected your cluster,
>   fixing it may trigger movement *back* to the more correct location.
>   For this reason, you must manually opt in to the fixed behavior.
>
>   In order to set the new tunable to correct the behavior::
>
>       ceph osd crush set-tunable straw_calc_version 1
>
>   Note that this change will have no immediate effect. However, from
>   this point forward, any 'straw' bucket in your CRUSH map that is
>   adjusted will get non-buggy internal weights, and that transition
>   may trigger some rebalancing.
>
>   You can estimate how much rebalancing will eventually be necessary
>   on your cluster with::
>
>       ceph osd getcrushmap -o /tmp/cm
>       crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
>       crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
>       crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
>       crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
>       wc -l /tmp/a                                # num total mappings
>       diff -u /tmp/a /tmp/b | grep -c ^+          # num changed mappings
>
>   Divide the number of changed mappings by the total number of mappings
>   to estimate the fraction of data that will move (a worked example
>   follows below, after the quoted messages).
>   We've found that most clusters are under 10%.
>
>   You can force all of this rebalancing to happen at once with::
>
>       ceph osd crush reweight-all
>
>   Otherwise, it will happen at some unknown point in the future when
>   CRUSH weights are next adjusted.
>
> Notable Changes
> ---------------
>
> * ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
> * crush: fix straw bucket weight calculation, add straw_calc_version
>   tunable (#10095 Sage Weil)
> * crush: fix tree bucket (Rongzu Zhu)
> * crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
> * crushtool: add --reweight (Sage Weil)
> * librbd: complete pending operations before losing image (#10299 Jason
>   Dillaman)
> * librbd: fix read caching performance regression (#9854 Jason Dillaman)
> * librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman)
> * mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
> * osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
> * osd: handle no-op write with snapshot (#10262 Sage Weil)
> * radosgw-admi
>
> On 03/16/2015 12:37 PM, Alexandre DERUMIER wrote:
> >>> VMs are running on the same nodes as the OSDs
> >
> > Are you sure you didn't hit some kind of out-of-memory condition?
> > PG rebalancing can be memory hungry (it depends on how many OSDs you have).
>
> 2 OSDs per host, and 5 hosts in this cluster.
> hosts h
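Following the estimation recipe in the quoted release notes, here is a small worked example of turning the two crushtool dumps into a percentage. The file names match the quoted commands; note that "grep -c ^+" also counts the "+++" diff header line, so the count is off by one (negligible at this scale):

    total=$(wc -l < /tmp/a)                          # total mappings
    changed=$(diff -u /tmp/a /tmp/b | grep -c '^+')  # changed mappings (+1 for the +++ header)
    awk -v c="$changed" -v t="$total" \
        'BEGIN { printf "%.1f%% of mappings would change\n", 100 * c / t }'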
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com