On 22/09/15 16:20, Daniel P. Berrange wrote:
> On Tue, Sep 22, 2015 at 09:05:11AM -0600, Chris Friesen wrote:
>> On 09/21/2015 02:56 AM, Daniel P. Berrange wrote:
>>> On Fri, Sep 18, 2015 at 05:47:31PM +0000, Carlton, Paul (Cloud
>>> Services) wrote:
>>>> However the most significant impediment we encountered was customer
>>>> complaints about performance of instances during migration. We did a
>>>> little bit of work to identify the cause of this and concluded that
>>>> the main issue was disk I/O contention. I wonder if this is something
>>>> you or others have encountered? I'd be interested in any ideas for
>>>> managing the rate of the migration processing to prevent it from
>>>> adversely impacting the customer application performance. I
>>>> appreciate that if we throttle the migration processing it will take
>>>> longer and may not be able to keep up with the rate of disk/memory
>>>> change in the instance.
>>>
>>> I would not expect live migration to have an impact on disk I/O,
>>> unless your storage is network based and using the same network as
>>> the migration data. While migration is taking place you'll see a
>>> small impact on the guest compute performance, due to page table
>>> dirty bitmap tracking, but that shouldn't appear directly as a disk
>>> I/O problem. There is no throttling of guest I/O at all during
>>> migration.
>>
>> Technically if you're doing a lot of disk I/O couldn't you end up with
>> a case where you're thrashing the page cache enough to interfere with
>> migration? So it's actually memory change that is the problem, but it
>> might not be memory that the application is modifying directly but
>> rather memory allocated by the kernel.
>>
>>>> Could you point me at somewhere I can get details of the tuneable
>>>> settings relating to cutover downtime please? I'm assuming these are
>>>> libvirt/qemu settings? I'd like to play with them in our test
>>>> environment to see if we can simulate busy instances and determine
>>>> what works. I'd also be happy to do some work to expose these in
>>>> nova so the cloud operator can tweak them if necessary.
>>>
>>> It is already exposed as 'live_migration_downtime', along with
>>> live_migration_downtime_steps and live_migration_downtime_delay.
>>> Again, it shouldn't have any impact on guest performance while live
>>> migration is taking place. It only comes into effect when checking
>>> whether the guest is ready to switch to the new host.
>>
>> Has anyone given thought to exposing some of these new parameters to
>> the end-user? I could see a scenario where an image might want to
>> specify the acceptable downtime over migration. (On the other hand
>> that might be tricky from the operator perspective.)
>
> I'm of the opinion that we should really try to avoid exposing *any*
> migration tunables to the tenant user. All the tunables are pretty
> hypervisor specific and low level and not very friendly to expose to
> tenants. Instead our focus should be on ensuring that it will always
> "just work" from the tenant's POV. When QEMU gets 'post copy' migration
> working, we'll want to adopt that asap, as that will give us the means
> to guarantee that migration will always complete with very little need
> for tuning. At most I could see the users being able to give some high
> level indication as to whether their images tolerate some level of
> latency, so Nova can decide what migration characteristic is
> acceptable.
>
> Regards,
> Daniel
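For anyone else trying to picture how those three options fit together,
my rough understanding (a sketch from memory, not the literal nova code;
the defaults shown are what I believe nova.conf currently ships with) is
that the driver ramps the permitted cut-over downtime up in steps while
the migration runs:

  # Sketch only -- yields a schedule of (seconds to wait, max downtime
  # in ms) pairs that gradually relaxes the cut-over limit as the
  # migration drags on.
  def downtime_steps(data_gb,
                     max_downtime_ms=500,   # live_migration_downtime
                     steps=10,              # live_migration_downtime_steps
                     delay_per_gb=75):      # live_migration_downtime_delay
      base = max_downtime_ms // steps
      increment = (max_downtime_ms - base) // steps
      delay = delay_per_gb * max(data_gb, 1)
      for step in range(steps + 1):
          yield (step * delay, base + increment * step)

  # e.g. a 4GB guest: allow 50ms at 0s, 95ms at 300s, ... 500ms at 3000s
  for wait, downtime in downtime_steps(4):
      print("after %4ds allow up to %dms downtime" % (wait, downtime))

If that is broadly right, the full live_migration_downtime value only
applies quite late in the migration, which seems worth bearing in mind
when tuning it.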
To be clear, I was not envisaging the controls on migration tuning being
made available to the tenant user. I was thinking we should give the
cloud administrator the facility to increase the live_migration_downtime
setting to improve the chance of a migration completing. I would expect
this to be used in consultation with the instance owner, and it seems to
me it might be a viable alternative to pausing the instance to allow the
migration to complete.
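Concretely, what I have in mind is something an operator could do
against an in-flight migration on the source compute node. A rough,
untested sketch using the libvirt python bindings (the domain name, the
200 MiB/s cap and the 2000ms downtime are made-up placeholder values):

  import time
  import libvirt

  conn = libvirt.open('qemu:///system')
  dom = conn.lookupByName('instance-00000001')   # placeholder name

  # Optionally cap migration bandwidth (MiB/s) so the transfer does not
  # swamp a shared storage/management network.
  dom.migrateSetMaxSpeed(200, 0)

  previous = None
  while True:
      try:
          stats = dom.jobStats()
      except libvirt.libvirtError:
          break                         # domain gone, migration finished
      remaining = stats.get('data_remaining')
      if remaining is None:
          break                         # no migration job in progress
      if previous is not None and remaining >= previous:
          # Not converging: permit a longer cut-over pause (milliseconds).
          dom.migrateSetMaxDowntime(2000, 0)
      previous = remaining
      time.sleep(5)

Nova would obviously need to wrap this sort of thing up properly, but it
illustrates the level at which the knob operates and why an
administrator-facing control seems feasible.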
--
Paul Carlton
Software Engineer, Cloud Services, Hewlett Packard
Email: [email protected]
