I run high-availability Squid servers on virtual machines, although not yet in
OpenNebula. It can be done with very high availability.
I am not familiar with Ubuntu Server 12.04, but if it has libvirt 0.9.7 or
later and you are using the KVM hypervisor, you should be able to use the
CPU-pinning and NUMA-aware features of libvirt to pin each virtual machine to a
given physical CPU. That should eliminate the migration issue you are seeing
now. With the Xen hypervisor you can (and should) also pin.
I think if you solve the CPU and memory pinning problem you will be OK.
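As a rough illustration, pinning can be expressed directly in the libvirt domain XML. This is only a sketch assuming a 2-vCPU guest on a 4-core, single-NUMA-node host; the core numbers and nodeset are placeholders you would adapt to your hardware:

```xml
<!-- fragment of a libvirt domain definition (hypothetical layout) -->
<vcpu placement='static'>2</vcpu>
<cputune>
  <!-- pin each vCPU to its own physical core -->
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='3'/>
</cputune>
<numatune>
  <!-- keep guest memory on the same NUMA node as the pinned cores -->
  <memory mode='strict' nodeset='0'/>
</numatune>
```

For a running domain you can also pin on the fly with virsh, e.g. `virsh vcpupin <domain> 0 2`, though the XML change is what makes it persistent.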

However, you did not say what network topology you are using for your virtual
machines, or what kind of virtual network drivers; that is important too.
Also: is your Squid cache mostly disk-resident or mostly RAM-resident? If the
former, then the virtual disk drivers matter a lot as well.

Steve Timm



From: users-boun...@lists.opennebula.org 
[mailto:users-boun...@lists.opennebula.org] On Behalf Of Erico Augusto 
Cavalcanti Guedes
Sent: Saturday, August 25, 2012 6:33 PM
To: users@lists.opennebula.org
Subject: [one-users] Very high unavailable service

Dears,

I'm running the Squid web cache proxy server on Ubuntu Server 12.04 VMs (kernel
3.2.0-23-generic-pae) with OpenNebula 3.4.
My private cloud is composed of one frontend and three nodes. VMs are running
on those three nodes, initially one per node.
Outside the cloud there are two hosts, one acting as web clients and the other
as a web server, using the Web Polygraph benchmarking tool.

The goal of the tests is to stress the Squid cache running on the VMs.
When the same test is executed outside the cloud, using the three nodes as
physical machines, cache service availability is 100%.
Nevertheless, when the cache service is provided by VMs, no better than 45%
service availability is reached: web clients receive no response from Squid 55%
of the time when it is running on VMs.

I have monitored the load average of the VMs and of the PMs where the VMs are
executing. The first load-average field reaches 15 after some hours of tests on
the VMs, versus 3 on the physical machines.
Furthermore, there is a set of processes, called migration/X, that are the
champions in CPU time when the VMs are executing. A sample:

top - 20:01:38 up 1 day,  3:36,  1 user,  load average: 5.50, 5.47, 4.20

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+    TIME COMMAND
   13 root      RT   0     0    0    0 S    0  0.0 408:27.25 408:27 migration/2
    8 root      RT   0     0    0    0 S    0  0.0 404:13.63 404:13 migration/1
    6 root      RT   0     0    0    0 S    0  0.0 401:36.78 401:36 migration/0
   17 root      RT   0     0    0    0 S    0  0.0 400:59.10 400:59 migration/3


It is not possible to offer a web cache service via VMs the way the service is
behaving, with such low availability.

So, my questions:

1. Has anybody experienced a similar problem of an unresponsive service
(whatever the service)?
2. How can I identify the bottleneck that is overloading the system, so that it
can be minimized?

Thanks a lot,

Erico.
_______________________________________________
Users mailing list
Users@lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
