You may want to post this to the Ceph mailing list as well.
On Mon, Oct 7, 2013 at 8:59 PM, Indra Pramana <in...@sg.or.id> wrote:

> Dear Wido and all,
>
> I performed some further tests last night:
>
> (1) CPU utilization of the KVM host while an RBD snapshot is running still
> shoots up, even after I set the global setting
> concurrent.snapshots.threshold.perhost to 2.
>
> (2) Most of the concurrent snapshot processes fail, either stuck in the
> "Creating" state or with a "CreatedOnPrimary" error message.
>
> (3) I have also adjusted some other related global settings, such as
> backup.snapshot.wait and job.expire.minutes, without any luck.
>
> Any advice on what causes the high CPU utilization is greatly
> appreciated.
>
> Looking forward to your reply, thank you.
>
> Cheers.
>
>
> On Mon, Oct 7, 2013 at 11:03 PM, Indra Pramana <in...@sg.or.id> wrote:
>
>> Dear all,
>>
>> I also found out that when the RBD snapshot is being run, the CPU
>> utilisation on the KVM host shoots up very high, which might
>> explain why the host becomes disconnected.
>>
>> top - 22:49:32 up 3 days, 19:31, 1 user, load average: 7.85, 4.97, 3.47
>> Tasks: 297 total, 3 running, 294 sleeping, 0 stopped, 0 zombie
>> Cpu(s): 4.5%us, 1.2%sy, 0.0%ni, 94.1%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
>> Mem: 264125244k total, 77203460k used, 186921784k free, 154888k buffers
>> Swap: 545788k total, 0k used, 545788k free, 60677092k cached
>>
>>   PID USER PR NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
>> 18161 root 20  0 3871m  31m 8444 S  101  0.0 301:58.09 kvm
>>  2790 root 20  0 43.5g 1.6g  19m S   97  0.7  45:52.42 jsvc
>> 24544 root 20  0 4583m  31m 8364 S   97  0.0 425:29.48 kvm
>>  6537 root 20  0     0    0    0 R   71  0.0   0:17.49 kworker/3:2
>> 22546 root 20  0 6143m 2.0g 8452 S   26  0.8  55:14.07 kvm
>>  4219 root 20  0 7671m 4.0g 8524 S    6  1.6 106:12.26 kvm
>>  5989 root 20  0 43.2g 1.6g  232 D    6  0.6   0:08.13 jsvc
>>  5993 root 20  0 43.3g 1.6g  224 D    6  0.6   0:08.36 jsvc
>>
>> Is it normal for the host's CPU utilisation to be higher than usual when
>> a snapshot is being run on a VM running on that host? How can I limit the
>> CPU resources used by the snapshot?
>>
>>
>> Looking forward to your reply, thank you.
>>
>> Cheers.
>>
>>
>>
>> On Mon, Oct 7, 2013 at 7:18 PM, Indra Pramana <in...@sg.or.id> wrote:
>>
>>> Dear all,
>>>
>>> I did some tests on snapshots since they are now supported for my Ceph RBD
>>> primary storage in CloudStack 4.2. When I ran the snapshot for a particular
>>> VM instance earlier, I noticed that this caused the host (where the VM
>>> is running) to become disconnected.
>>>
>>> Here's the excerpt from the agent.log:
>>>
>>> http://pastebin.com/dxVV7stu
>>>
>>> The management-server.log doesn't show much other than
>>> detecting that the host was down and HA being activated:
>>>
>>> http://pastebin.com/UeLiSm9K
>>>
>>> Can anyone advise what is causing the problem? So far there is only one
>>> user doing the snapshotting and it has already caused issues on the host. I
>>> can't imagine what would happen if multiple users tried to snapshot at the
>>> same time.
>>>
>>> I read about snapshot job throttling, which is described in the manual:
>>>
>>>
>>> http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html/Admin_Guide/working-with-snapshots.html
>>>
>>> But I am not too sure whether this will help to resolve the problem, since
>>> there is only one user trying to perform a snapshot and we have already
>>> encountered the problem.
>>>
>>> Can anyone advise how I can troubleshoot further and find a solution to
>>> the problem?
>>>
>>> Looking forward to your reply, thank you.
>>>
>>> Cheers.
>>>
>>
>>
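
For anyone following up on this thread: one way to narrow down which processes spike while a snapshot runs is to capture `top` output in batch mode and filter it. The snippet below is only a rough sketch, not part of CloudStack or Ceph. The function name `busy_processes`, the 50% threshold, and the assumption that the process table has a `PID ... %CPU ... COMMAND` header row (as in the output quoted above) are all mine; the sample rows are copied from that quoted output.

```python
def busy_processes(top_output, cpu_threshold=50.0):
    """Return (pid, command, cpu) for rows at or above cpu_threshold.

    Assumes a process table whose header row starts with PID and
    contains %CPU and COMMAND columns (hypothetical helper).
    """
    cpu_col = cmd_col = None
    busy = []
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Locate the columns from the header row instead of hard-coding them.
        if fields[0] == "PID" and "%CPU" in fields and "COMMAND" in fields:
            cpu_col = fields.index("%CPU")
            cmd_col = fields.index("COMMAND")
            continue
        # Skip summary lines before the header and rows that are too short.
        if cpu_col is None or len(fields) <= cmd_col:
            continue
        try:
            pid = int(fields[0])
            cpu = float(fields[cpu_col])
        except ValueError:
            continue
        if cpu >= cpu_threshold:
            busy.append((pid, " ".join(fields[cmd_col:]), cpu))
    return busy


# Sample rows taken from the top output quoted in this thread.
SAMPLE = """\
  PID USER PR NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
18161 root 20  0 3871m  31m 8444 S  101  0.0 301:58.09 kvm
 2790 root 20  0 43.5g 1.6g  19m S   97  0.7  45:52.42 jsvc
 6537 root 20  0     0    0    0 R   71  0.0   0:17.49 kworker/3:2
 4219 root 20  0 7671m 4.0g 8524 S    6  1.6 106:12.26 kvm
"""

for pid, command, cpu in busy_processes(SAMPLE):
    print(pid, command, cpu)
```

Running this against samples taken before and during a snapshot would show whether the extra load comes from the agent's `jsvc` process, the `kvm` guests, or kernel workers, which may help when raising the issue on the Ceph list.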