On 05/23/2016 08:46 PM, John Griffith wrote:


On Mon, May 23, 2016 at 8:32 AM, Ivan Kolodyazhny <[email protected]> wrote:

    Hi developers and operators,
    I would like to get feedback from you about my idea before I start work
    on a spec.
    work on spec.

    In Nova, we've got the max_concurrent_builds option [1] to set the 'Maximum
    number of instance builds to run concurrently' per compute node. There is
    no equivalent in Cinder.
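
    For illustration, a rough shape such an option could take (the name below
    is just a placeholder I made up, mirroring the Nova option in [1]):

        from oslo_config import cfg

        # Hypothetical option, named only for illustration -- it mirrors
        # Nova's max_concurrent_builds.
        volume_opts = [
            cfg.IntOpt('max_concurrent_volume_operations',
                       default=0,
                       help='Maximum number of volume create/delete operations '
                            'to run concurrently on this cinder-volume service. '
                            '0 means unlimited.'),
        ]

        CONF = cfg.CONF
        CONF.register_opts(volume_opts)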

    Why do we need it for Cinder? IMO, it could help us address the following
    issues (a rough sketch of how such a cap could be enforced follows the list):

      * Creating N volumes at the same time drives up resource usage by the
        cinder-volume service a lot. The image caching feature [2] could help
        a bit when we create a volume from an image, but we still have to
        upload N images to the volume backend at the same time.
      * Deleting N volumes in parallel. Usually this is not a very hard task
        for Cinder, but if you have to delete 100+ volumes at once you can hit
        various issues with DB connections, CPU and memory usage. In the LVM
        case it may also run a 'dd' command to wipe the volumes.
      * It would give us some load balancing in HA mode: if a cinder-volume
        process is busy with its current operations, it will not pick up the
        message from RabbitMQ and another cinder-volume service will handle it.
      * From the user's perspective, it is better to create/delete N volumes
        a bit more slowly than to fail after X volumes have been created/deleted.
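
    A very rough sketch of how such a cap could be enforced inside the volume
    manager (all names below are made up, this is not actual Cinder code):

        import threading

        # Hypothetical cap; in practice this would come from configuration.
        MAX_CONCURRENT = 5
        _operations = threading.BoundedSemaphore(MAX_CONCURRENT)

        def create_volume(volume_id):
            # Block until a slot is free, so at most MAX_CONCURRENT creations
            # hit the backend at the same time.
            with _operations:
                _do_create(volume_id)  # placeholder for the real driver call

        def _do_create(volume_id):
            print('creating volume %s' % volume_id)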


    [1] https://github.com/openstack/nova/blob/283da2bbb74d0a131a66703516d16698a05817c7/nova/conf/compute.py#L161-L163
    [2] https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html

    Regards,
    Ivan Kolodyazhny,
    http://blog.e0ne.info/


Just curious about a couple of things: is this attempting to solve a problem in
the actual Cinder Volume Service, or is it trying to solve problems with
backends that can't keep up and deliver resources under heavy load? I get the
copy-image-to-volume case; that's a special case that certainly does impact the
Cinder services and the Cinder node itself, but there's already throttling going
on there, at least in terms of the IO allowed.
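
To be clear about the existing throttling I mean: the bandwidth cap on the raw
copy (volume_copy_bps_limit, if I remember the option name right), which as far
as I know wraps the 'dd' in a blkio cgroup. A standalone illustration of that
mechanism, with device numbers, paths and the limit all invented for the example:

    import subprocess

    cgroup = 'blkio:cinder_copy_example'
    device = '8:16'                    # major:minor of the destination disk
    limit = str(100 * 1024 * 1024)     # ~100 MB/s, made up for the example

    # Create the cgroup and set a write-bandwidth cap on the target device.
    subprocess.check_call(['cgcreate', '-g', cgroup])
    subprocess.check_call(['cgset', '-r',
                           'blkio.throttle.write_bps_device=%s %s'
                           % (device, limit),
                           'cinder_copy_example'])

    # Run the copy inside that cgroup so the kernel enforces the cap.
    subprocess.check_call(['cgexec', '-g', cgroup, 'dd',
                           'if=/path/to/image', 'of=/dev/sdb',
                           'bs=1M', 'oflag=direct'])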

Also, I'm curious... would the existing API Rate Limit configuration achieve the
same sort of thing you want to do here? Granted, it's not selective, but maybe
it's worth mentioning.
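
Just to spell out the difference: the rate limit middleware caps requests per
time window at the API layer, rather than capping how many operations are in
flight on the volume service. A toy illustration of that style of limiting (not
the actual middleware):

    import time

    class SimpleRateLimiter(object):
        """Toy per-minute request limiter, for illustration only."""

        def __init__(self, max_per_minute):
            self.max_per_minute = max_per_minute
            self.timestamps = []

        def allow(self):
            now = time.time()
            # Forget requests that fell out of the 60-second window.
            self.timestamps = [t for t in self.timestamps if now - t < 60]
            if len(self.timestamps) >= self.max_per_minute:
                return False  # an over-limit request would be rejected here
            self.timestamps.append(now)
            return True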

If we did do something like this I would like to see it implemented as a driver
config; but that wouldn't help if the problem lies in the Rabbit or RPC space.
That brings me back to wondering exactly where we want to solve problems, and
which ones. If delete is causing problems like you describe, I'd suspect we have
an issue in our DB code (too many calls to start with) and that we've got some
overhead elsewhere that should be eradicated. Delete is a super simple operation
on the Cinder side of things (and on most backends), so I'm a bit freaked out
thinking that it's taxing resources heavily.
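
By driver config I just mean the option living in each backend's section of
cinder.conf rather than in [DEFAULT]. With oslo.config that's mostly a matter
of which group the option is registered against, roughly (option and group
names invented):

    from oslo_config import cfg

    opts = [
        cfg.IntOpt('max_concurrent_operations', default=0,
                   help='Hypothetical per-backend concurrency cap; '
                        '0 means unlimited.'),
    ]

    # Registered against the backend's own config group (e.g. [lvm-1] in
    # cinder.conf) instead of [DEFAULT], so each backend sets its own limit.
    cfg.CONF.register_opts(opts, group='lvm-1')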

For what it's worth, with the LVM backend under heavy load we've run into cases where cinder-volume ends up blocked on disk I/O for over a minute.

Now, this was pretty much a worst case, with Cinder volumes on a single spinning disk. But the fact that I/O cgroups don't work with LVM (a Linux kernel limitation) means it's difficult to ensure that the cinder-volume process doesn't block indefinitely on disk I/O.
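
One partial mitigation is running the clearing 'dd' at idle I/O priority; if I
remember right the LVM driver exposes that via the volume_clear_ionice option,
which boils down to something like the following (device path and sizes
invented, and ionice only helps where the I/O scheduler honours it):

    import subprocess

    # Run the volume-clearing dd at idle I/O priority so it yields to other
    # disk traffic. Illustration only, not the actual Cinder code.
    subprocess.check_call([
        'ionice', '-c3',                     # class 3 = idle
        'dd', 'if=/dev/zero', 'of=/dev/vg0/volume-to-clear',
        'bs=1M', 'count=1024', 'oflag=direct',
    ])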

Chris

