On 09/21/2015 02:56 AM, Daniel P. Berrange wrote:
On Fri, Sep 18, 2015 at 05:47:31PM +0000, Carlton, Paul (Cloud Services) wrote:
However the most significant impediment we encountered was customer
complaints about performance of instances during migration.  We did a little
bit of work to identify the cause of this and concluded that the main issue
was disk I/O contention.  I wonder if this is something you or others have
encountered?  I'd be interested in any ideas for managing the rate of the
migration processing to prevent it from adversely impacting the customer
application performance.  I appreciate that if we throttle the migration
processing it will take longer and may not be able to keep up with the rate
of disk/memory change in the instance.

I would not expect live migration to have an impact on disk I/O, unless
your storage is network based and using the same network as the migration
data. While migration is taking place you'll see a small impact on the
guest compute performance, due to page table dirty bitmap tracking, but
that shouldn't appear directly as a disk I/O problem. There is no throttling
of guest I/O at all during migration.
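For the part of the original question about managing the rate of the migration itself, libvirt does let you cap the bandwidth the migration stream uses, which matters if guest storage traffic shares the same network. A sketch with virsh (the domain name here is just a placeholder):

```console
# Cap the migration transfer rate for a running domain to 100 MiB/s
virsh migrate-setspeed instance-00000001 --bandwidth 100

# Confirm the current cap
virsh migrate-getspeed instance-00000001
```

Note this throttles the migration's own network traffic, not the guest's I/O, so it trades migration duration (and the ability to keep up with the dirty rate) for reduced contention.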

Technically, if you're doing a lot of disk I/O, couldn't you end up thrashing the page cache enough to interfere with migration? In that case the problem is really still memory change, but it's memory dirtied by the kernel on the guest's behalf (page cache) rather than memory the application is modifying directly.

Could you point me at somewhere I can get details of the tuneable setting
relating to cutover downtime please?  I'm assuming that these are
libvirt/qemu settings?  I'd like to play with them in our test environment
to see if we can simulate busy instances and determine what works.  I'd also
be happy to do some work to expose these in nova so the cloud operator can
tweak if necessary?

These are already exposed as 'live_migration_downtime', along with
'live_migration_downtime_steps' and 'live_migration_downtime_delay'.
Again, it shouldn't have any impact on guest performance while
live migration is taking place. It only comes into effect when
checking whether the guest is ready to switch to the new host.
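To make the interaction of those three settings concrete, here is a purely illustrative sketch of how a stepped downtime schedule could be derived from them: the permitted cutover downtime starts small and is raised in stages, with a delay between increases, until the configured maximum is reached. This is not nova's actual implementation; the function name and formula are hypothetical.

```python
def downtime_steps(max_downtime_ms, steps, delay_s):
    """Yield (wait_seconds, downtime_ms) pairs.

    The permitted cutover downtime ramps linearly from an initial
    fraction of the maximum up to the configured maximum, with
    delay_s seconds between each increase.  Illustrative only.
    """
    # Start at a small fraction of the maximum so a quiet guest
    # can converge without ever being granted a long pause.
    base = max_downtime_ms // (steps + 1)
    for i in range(1, steps + 1):
        yield (delay_s * i, base + (max_downtime_ms - base) * i // steps)

# e.g. allow up to 500 ms downtime, reached over 5 steps, 75 s apart
schedule = list(downtime_steps(500, 5, 75))
```

The point of the staged ramp-up is that a busy guest which fails to converge at a small downtime gets progressively larger allowances over time, rather than being granted the full (guest-visible) pause immediately.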

Has anyone given thought to exposing some of these new parameters to the end-user? I could see a scenario where an image might want to specify its acceptable downtime during migration. (On the other hand, that might be tricky from the operator's perspective.)

Chris

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
