You may want to post this to the Ceph mailing list as well.

On Mon, Oct 7, 2013 at 8:59 PM, Indra Pramana <in...@sg.or.id> wrote:
> Dear Wido and all,
>
> I performed some further tests last night:
>
> (1) CPU utilization on the KVM host still shoots up while an RBD snapshot
> is running, even after I set the global setting
> concurrent.snapshots.threshold.perhost to 2.
>
> (2) Most of the concurrent snapshot jobs fail, either stuck in the
> "Creating" state or with a "CreatedOnPrimary" error message.
>
> (3) I have also adjusted some other related global settings, such as
> backup.snapshot.wait and job.expire.minutes, without any luck.
>
> Any advice on what causes the high CPU utilization would be greatly
> appreciated.
>
> Looking forward to your reply, thank you.
>
> Cheers.
>
>
> On Mon, Oct 7, 2013 at 11:03 PM, Indra Pramana <in...@sg.or.id> wrote:
>
>> Dear all,
>>
>> I also found that while the RBD snapshot is running, CPU utilisation on
>> the KVM host shoots up very high, which might explain why the host
>> becomes disconnected.
>>
>> top - 22:49:32 up 3 days, 19:31,  1 user,  load average: 7.85, 4.97, 3.47
>> Tasks: 297 total,   3 running, 294 sleeping,   0 stopped,   0 zombie
>> Cpu(s):  4.5%us,  1.2%sy,  0.0%ni, 94.1%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
>> Mem:  264125244k total, 77203460k used, 186921784k free,   154888k buffers
>> Swap:   545788k total,        0k used,   545788k free, 60677092k cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 18161 root      20   0 3871m  31m 8444 S  101  0.0 301:58.09 kvm
>>  2790 root      20   0 43.5g 1.6g  19m S   97  0.7  45:52.42 jsvc
>> 24544 root      20   0 4583m  31m 8364 S   97  0.0 425:29.48 kvm
>>  6537 root      20   0     0    0    0 R   71  0.0   0:17.49 kworker/3:2
>> 22546 root      20   0 6143m 2.0g 8452 S   26  0.8  55:14.07 kvm
>>  4219 root      20   0 7671m 4.0g 8524 S    6  1.6 106:12.26 kvm
>>  5989 root      20   0 43.2g 1.6g  232 D    6  0.6   0:08.13 jsvc
>>  5993 root      20   0 43.3g 1.6g  224 D    6  0.6   0:08.36 jsvc
>>
>> Is it normal for the host's CPU utilisation to be higher than usual while
>> a snapshot is being taken of a VM running on that host? How can I limit
>> the CPU resources used by the snapshot?
>>
>>
>> Looking forward to your reply, thank you.
>>
>> Cheers.
>>
>>
>>
>> On Mon, Oct 7, 2013 at 7:18 PM, Indra Pramana <in...@sg.or.id> wrote:
>>
>>> Dear all,
>>>
>>> I did some tests on snapshots, since they are now supported for my Ceph
>>> RBD primary storage in CloudStack 4.2. When I took a snapshot of a
>>> particular VM instance earlier, I noticed that it caused the host (the
>>> one the VM is running on) to become disconnected.
>>>
>>> Here's the excerpt from the agent.log:
>>>
>>> http://pastebin.com/dxVV7stu
>>>
>>> The management-server.log doesn't show much other than detecting that
>>> the host was down and that HA was being activated:
>>>
>>> http://pastebin.com/UeLiSm9K
>>>
>>> Can anyone advise what is causing the problem? So far only one user is
>>> taking snapshots and it has already caused issues on the host. I can't
>>> imagine what would happen if multiple users took snapshots at the same
>>> time.
>>>
>>> I read about snapshot job throttling, which is described in the manual:
>>>
>>>
>>> http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html/Admin_Guide/working-with-snapshots.html
>>>
>>> But I am not sure whether this will help resolve the problem, since we
>>> already encounter it with only one user trying to take a snapshot.
>>>
>>> Can anyone advise how I can troubleshoot further and find a solution to
>>> the problem?
>>>
>>> Looking forward to your reply, thank you.
>>>
>>> Cheers.
>>>
>>
>>
