On 03/02/16 10:49, Daniel P. Berrange wrote:
On Wed, Feb 03, 2016 at 10:44:36AM +0000, Daniel P. Berrange wrote:
On Wed, Feb 03, 2016 at 10:37:24AM +0000, Koniszewski, Pawel wrote:
Hello everyone,
At yesterday's live migration meeting, concerns were raised that the interval
for writing migration progress to the database is too short.
Information about migration progress will be stored in the database and
exposed through the API (/servers/<uuid>/migrations/<id>). In the current
proposal [1], migration progress will be updated every 2 seconds. This
means that every 2 seconds an RPC call will go from compute to conductor
to write migration data to the database. With parallel live migrations,
each migration will report its progress independently.
Isn't a 2-second interval too short for updates, given that the information
is exposed through the API and each update requires an RPC call and a DB call
to save it?
Our default configuration allows only 1 concurrent live migration [2],
but this is configurable, so it may vary between deployments and use cases.
Someone might want to trigger 10 (or even more) parallel live migrations,
and each might take as long as a day to finish in the case of block
migration. Also, if the deployment is big enough, RabbitMQ might already be
fully loaded. I'm not sure whether updating each migration every 2 seconds
makes sense in this case. On the other hand, if we increase the interval it
might be hard to notice quickly enough that a migration is stuck...
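To put rough numbers on the concern above, here is a minimal back-of-the-envelope sketch. It assumes exactly one RPC-plus-DB write per migration per interval; the function names are hypothetical and are not part of Nova.

```python
def writes_per_second(parallel_migrations: int, interval_s: float) -> float:
    """Progress updates per second if every migration reports each interval."""
    return parallel_migrations / interval_s


def writes_total(parallel_migrations: int, interval_s: float,
                 duration_s: float) -> int:
    """Total progress updates over the lifetime of the migrations."""
    return int(parallel_migrations * duration_s // interval_s)


DAY_S = 24 * 60 * 60

# Default configuration from the message: 1 concurrent migration, 2 s interval.
rate_default = writes_per_second(1, 2.0)            # 0.5 writes per second
# The scenario raised in the message: 10 parallel migrations.
rate_parallel = writes_per_second(10, 2.0)          # 5.0 writes per second
# The same 10 migrations if each one runs for a full day.
total_parallel_day = writes_total(10, 2.0, DAY_S)   # 432000 writes
```

Under these assumptions, 10 parallel migrations produce 5 update calls per second in total. Whether that is significant depends on the rest of the load on the message bus and the database.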
Do we have any actual data showing that this is a real problem? I have a
pretty hard time believing that a database update of a single field every
2 seconds is going to be what pushes Nova over the edge into a performance
collapse, even if there are 20 migrations running in parallel, when you
compare it to the number of DB queries and updates done across other areas
of the code for pretty much every single API call and background job.
Also note that progress is rounded to the nearest integer. So even if the
migration runs all day, there is a maximum of 100 possible changes in value
for the progress field, so most of the updates should turn into no-ops at
the database level.
Regards,
Daniel
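The rounding argument above can be illustrated with a small sketch. It is not the Nova implementation: `ProgressReporter` and its `save` callback are hypothetical stand-ins for whatever performs the RPC and DB write, and the sketch skips unchanged values on the caller's side rather than relying on the database to treat them as no-ops.

```python
from typing import Callable, Optional


class ProgressReporter:
    """Forward a progress value only when its rounded integer changes."""

    def __init__(self, save: Callable[[int], None]) -> None:
        self._save = save                 # hypothetical RPC/DB write
        self._last: Optional[int] = None  # last value actually written

    def report(self, percent: float) -> bool:
        """Return True if a write was issued, False if it was skipped."""
        value = round(percent)
        if value == self._last:
            return False                  # unchanged: skip the write
        self._save(value)
        self._last = value
        return True


writes = []
reporter = ProgressReporter(writes.append)

# Simulate one poll every 2 seconds for an hour of steadily rising progress.
polls = 1800
for i in range(polls + 1):
    reporter.report(100.0 * i / polls)
```

In this simulation 1801 polls result in 101 writes (one per integer value from 0 to 100), provided progress only ever increases.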
I agree with Daniel: these RPC and DB access operations are a tiny percentage
of the overall load on RabbitMQ and MySQL, and when properly configured these
subsystems should have no issues with this workload.
One correction: unless I'm misreading it, the existing
_live_migration_monitor code updates the progress field of the instance
record every 5 seconds. However, this value can go down as well as up, so
an unbounded number of updates is possible?
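A short sketch of why a value that can move in both directions removes the 100-change bound. The sample sequences are invented for illustration, and the suggested cause of the oscillation is an assumption, not something taken from the code under review.

```python
def count_changes(samples):
    """Count how often the rounded progress differs from the previous sample."""
    changes = 0
    last = None
    for sample in samples:
        value = round(sample)
        if value != last:
            changes += 1
            last = value
    return changes


# Monotonic progress: 0, 1, ..., 100.
monotonic = list(range(101))

# Hypothetical oscillation: progress alternates between 40 and 60, as might
# happen if the guest dirties memory faster than it can be copied (assumed).
oscillating = [40, 60] * 500

monotonic_changes = count_changes(monotonic)      # 101
oscillating_changes = count_changes(oscillating)  # 1000
```

With a non-monotonic value, the number of real changes grows with the number of polls, so skipping unchanged values no longer caps the write count.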
However, the issue raised here is not with the existing implementation
but with the proposed change:
https://review.openstack.org/#/c/258813/5/nova/virt/libvirt/driver.py
This adds a save() operation on the migration object every 2 seconds.
Paul Carlton
Software Engineer
Cloud Services
Hewlett Packard Enterprise
BUK03:T242
Longdown Avenue
Stoke Gifford
Bristol BS34 8QZ
Mobile: +44 (0)7768 994283
Office: +44 (0)117 316 2189
Email: mailto:paul.carlt...@hpe.com
irc: paul-carlton2