Re: [openstack-dev] Moving task flow to conductor - concern about scale

Joshua Harlow Sat, 20 Jul 2013 19:37:39 -0700

Looking at the conductor code, it still seems to me to provide a low-level 
database API that succumbs to the same races as the old db access did. Get 
calls followed by some response, followed by some python code, followed by some 
rpc update, followed by more code are still susceptible to consistency & 
fragility issues.
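
As a rough illustration of that get-then-update race (a sketch only: FakeDB, 
update_if and the state names are hypothetical and are not nova's conductor 
API), two callers read the same row and then write back based on a stale read. 
The interleaving is spelled out by hand so the outcome is deterministic:

```python
import threading


class FakeDB:
    """Hypothetical in-memory stand-in for a row store (not nova code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}

    def create(self, key, state):
        with self._lock:
            self._rows[key] = state

    def get(self, key):
        with self._lock:
            return self._rows[key]

    def update(self, key, state):
        # Blind write: whoever calls last wins.
        with self._lock:
            self._rows[key] = state

    def update_if(self, key, expected, state):
        # Compare-and-swap: write only if the row still holds `expected`.
        with self._lock:
            if self._rows[key] != expected:
                return False
            self._rows[key] = state
            return True


db = FakeDB()
db.create("instance-1", "building")

# Two callers both read the same state before either one writes.
seen_by_a = db.get("instance-1")
seen_by_b = db.get("instance-1")

# Caller A deletes the instance; caller B, acting on its stale read,
# then marks it active. The blind update silently resurrects it.
db.update("instance-1", "deleted")
db.update("instance-1", "active")
blind_result = db.get("instance-1")

# Same interleaving with a conditional update: B's stale write is rejected.
db.create("instance-1", "building")
a_ok = db.update_if("instance-1", seen_by_a, "deleted")
b_ok = db.update_if("instance-1", seen_by_b, "active")
cas_result = db.get("instance-1")
```

Each individual call is safe on its own; the inconsistency comes from the 
python code that runs between the get and the update.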

The API provided is data oriented rather than action oriented. I would argue 
that a data-oriented API leads to lots of consistency issues with multiple 
conductors. An action/task-oriented API, if that is ever accomplished, allows 
the conductor to lock resources that are being "manipulated" so that another 
conductor cannot alter the same resource at the same time.
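
A minimal sketch of what an action/task-oriented entry point could look like 
(TaskConductor, ResourceBusy and run_task are made-up names for illustration). 
It assumes a single process, where an in-memory ownership table stands in for 
whatever shared lock service multiple conductors would really need:

```python
import threading


class ResourceBusy(Exception):
    """Raised when another task already owns the resource."""


class TaskConductor:
    """Hypothetical task-oriented conductor (illustration only)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._owners = {}  # resource id -> name of the task holding it

    def _acquire(self, resource_id, task_name):
        with self._guard:
            if resource_id in self._owners:
                raise ResourceBusy(
                    "%s is held by %s" % (resource_id, self._owners[resource_id])
                )
            self._owners[resource_id] = task_name

    def _release(self, resource_id):
        with self._guard:
            self._owners.pop(resource_id, None)

    def run_task(self, resource_id, task_name, steps):
        # The whole task runs while the resource is owned, so a competing
        # task is rejected up front instead of racing halfway through.
        self._acquire(resource_id, task_name)
        try:
            return [step(resource_id) for step in steps]
        finally:
            self._release(resource_id)


conductor = TaskConductor()
events = []


def build_step(resource_id):
    events.append("building " + resource_id)
    # Simulate a delete request arriving while the build is in flight.
    try:
        conductor.run_task(
            resource_id, "delete", [lambda r: events.append("deleting " + r)]
        )
    except ResourceBusy:
        events.append("delete rejected")
    return "built"


result = conductor.run_task("instance-1", "build", [build_step])
```

Here the delete-while-building case is decided in one auditable place (the 
ownership check) rather than in scattered exception handlers.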

Nova currently has a lot of dedicated and hard-to-follow logic for when 
resources are simultaneously manipulated (deleted while building, for example). 
Just look for *not found* exceptions being thrown in the conductor from 
*get/update function calls and check where each exception is handled (are all 
of them? are all resources cleaned up??). These seem like symptoms of an API 
that is too low level, and they wouldn't be exposed in an action/task-oriented 
API. It appears that nova is trying to handle all of these special "exists or 
does not already exist" (or similar consistency violation) cases correctly, 
which is good, but having said logic scattered sure doesn't inspire confidence, 
to me, that it is doing the right thing under all scenarios. Does that not 
worry anyone else??

Adding task logic to the conductor on top of the already hard-to-follow logic 
for these scenarios worries me. That's why I previously thought (and others 
seem to think) that task logic, correct locking, and such should be located in 
a service that can devote its code to just doing said tasks reliably. Honestly, 
said code will be much, much more complex than a database-rpc access layer 
(especially when the races and simultaneous manipulation problems are not 
hidden/scattered but are dealt with in an upfront and easily auditable manner).

But maybe this is nothing new to folks and all of this is already being thought 
about (solutions do seem to be appearing and more discussion about said ideas 
is always beneficial).

Just my thoughts...

Sent from my really tiny device...

On Jul 19, 2013, at 5:30 PM, "Peter Feiner" <pe...@gridcentric.ca> wrote:

> On Fri, Jul 19, 2013 at 4:36 PM, Joshua Harlow <harlo...@yahoo-inc.com> wrote:
>> This seems to me to be a good example where a library "problem" is leaking 
>> into the openstack architecture right? That is IMHO a bad path to go down.
>> 
>> I like to think of a world where this isn't a problem, design the correct 
>> solution there, and fix the eventlet problem instead. Other large 
>> applications don't fall back to rpc calls to get around database/eventlet 
>> scaling issues afaik.
>> 
>> Honestly I would almost just want to finally fix the eventlet problem (chris 
>> b. I think has been working on it) and design a system that doesn't try to 
>> work around a library's shortcomings. But maybe that's too much idealism, 
>> idk...
> 
> Well, there are two problems that multiple nova-conductor processes
> fix. One is the bad interaction between eventlet and native code. The
> other is allowing multiprocessing.  That is, once nova-conductor
> starts to handle enough requests, enough time will be spent holding
> the GIL to make it a bottleneck; in fact I've had to scale keystone
> using multiple processes because of GIL contention (i.e., keystone was
> steadily at 100% CPU utilization when I was hitting OpenStack with
> enough requests). So multiple processes aren't avoidable. Indeed, other
> software that strives for high concurrency, such as apache, uses
> multiple processes to avoid contention for per-process kernel
> resources like the mmap semaphore.
> 
>> This doesn't even touch on the synchronization issues that can happen when u 
>> start pumping db traffic over a mq. Ex, an update is now queued behind 
>> another update and the second one conflicts with the first; where does 
>> resolution happen when an async mq call is used? What about when you have X 
>> conductors doing Y reads and Z updates; I don't even want to think about the 
>> sync/races there (and so on...). Did u hit / check for any consistency 
>> issues in your tests? Consistency issues under high load using multiple 
>> conductors scare the bejesus out of me....
> 
> If a sequence of updates needs to be atomic, then they should be made
> in the same database transaction. Hence nova-conductor's interface
> isn't do_some_sql(query), it's a bunch of high-level nova operations
> that are implemented using transactions.
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

