On 25 February 2015 at 13:50, Eugene Nikanorov <[email protected]> wrote:
> Thanks for putting this all together, Salvatore.
>
> I just want to comment on this suggestion:
>
> 1) Move the allocation logic out of the driver, thus making IPAM an
> independent service. The API workers will then communicate with the IPAM
> service through a message bus, where IP allocation requests will be
> "naturally serialized"
>
> Right now port creation is already a distributed process involving several
> parties.
> Adding one more actor outside Neutron which can be communicated with over
> the message bus just to serialize requests makes me think of how terrible
> troubleshooting could be in case of applied load, when communication over
> mq slows down or interrupts.
> Not to say such a service would be a SPoF and a contention point.
> So, this of course could be an option, but personally I'd not like to see
> it as a default.

Basically here I'm just braindumping. I have no idea whether this could be
scalable, reliable or maintainable (see my reply to Clint's post). I wish I
could prototype code for this, but I'm terribly slow; the days when I was
able to produce thousands of working LOCs per day are long gone.

Anyway, it is true that port creation is already a fairly complex workflow.
However, IPAM will in any case be a synchronous operation within that
workflow: if the IPAM process does not complete, port wiring and securing in
the agents cannot occur. So I do not expect it to add significant
difficulties to troubleshooting. I might add that the real issue there is
not complex communication patterns, but the fact that Neutron still does not
have a decent mechanism to correlate events occurring on the server with
those in the agents, thus forcing developers and operators to read logs as
if they were hieroglyphics.
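
To make option 1) a bit less hand-wavy, this is the kind of thing I have in
mind. It is a toy sketch with made-up names (IpamWorker, allocate_async),
using an in-process queue where a real deployment would use the message bus:

    # Toy sketch only: a single consumer drains allocation requests from a
    # queue, so two requests for the same subnet can never race. In a real
    # deployment the queue would be the message bus and IpamWorker the
    # IPAM service.
    import itertools
    import queue
    import threading


    class IpamWorker(object):

        def __init__(self):
            self._requests = queue.Queue()
            self._next_host = itertools.count(2)  # stand-in for real allocation

        def allocate_async(self, subnet_id):
            reply = queue.Queue(maxsize=1)
            self._requests.put((subnet_id, reply))
            return reply                          # caller blocks on reply.get()

        def run(self):
            while True:
                subnet_id, reply = self._requests.get()
                # Only this loop touches allocation state, so no row locks
                # and no retry-on-conflict logic are needed here.
                reply.put('10.0.0.%d' % next(self._next_host))


    worker = IpamWorker()
    consumer = threading.Thread(target=worker.run)
    consumer.daemon = True
    consumer.start()
    print(worker.allocate_async('subnet-a').get())  # e.g. 10.0.0.2

The point is simply that with a single consumer per subnet (or per pool) the
allocation path needs neither locks nor retries; the open question is whether
the extra hop over the bus is worth it.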

> Thanks,
> Eugene.
>
> On Wed, Feb 25, 2015 at 4:35 AM, Robert Collins <[email protected]>
> wrote:
>
>> On 24 February 2015 at 01:07, Salvatore Orlando <[email protected]>
>> wrote:
>> > Lazy-Stacker summary:
>> ...
>> > In the medium term, there are a few things we might consider for
>> > Neutron's "built-in IPAM".
>> > 1) Move the allocation logic out of the driver, thus making IPAM an
>> > independent service. The API workers will then communicate with the
>> > IPAM service through a message bus, where IP allocation requests will
>> > be "naturally serialized"
>> > 2) Use 3rd-party software such as dogpile, zookeeper or even memcached
>> > to implement distributed coordination. I have nothing against it, and
>> > I reckon Neutron can only benefit from it (in case you're considering
>> > arguing that "it does not scale", please also provide solid arguments
>> > to support your claim!). Nevertheless, I do believe API request
>> > processing should proceed undisturbed as much as possible. If
>> > processing an API request requires distributed coordination among
>> > several components then it probably means that an asynchronous
>> > paradigm is more suitable for that API request.
>>
>> So data is great. It sounds like as long as we have an appropriate
>> retry decorator in place, write locks are better here, at least
>> for up to 30 threads. But can we trust the data?
>>
>> One thing I'm not clear on is the SQL statement count. You say 100
>> queries for A-1 with a time on Galera of 0.06*1.2=0.072 seconds per
>> allocation? So is that 2 queries over 50 allocations over 20 threads?
>>
>> I'm not clear on what the request parameter in the test json files
>> does, and AFAICT your threads each do one request each. As such I
>> suspect that you may be seeing less concurrency - and thus contention
>> - than real-world setups, where APIs are deployed to run workers in
>> separate processes and requests are coming in willy-nilly. The size of
>> each algorithm's workload is so small that it's feasible to imagine a
>> thread completing before the GIL's bytecode-count check triggers (see
>> https://docs.python.org/2/library/sys.html#sys.setcheckinterval), and
>> the GIL's lack of fairness would exacerbate that.
>>
>> If I may suggest:
>> - use multiprocessing or some other worker-pool approach rather than
>>   threads
>> - or set setcheckinterval down low (e.g. to 20 or something)
>> - do multiple units of work (in separate transactions) within each
>>   worker, aiming for e.g. 10 seconds of work or some such
>> - log with enough detail that we can report on the actual concurrency
>>   achieved, e.g. log the time in us when each transaction starts and
>>   finishes, so we can assess how many concurrent requests were actually
>>   running
>>
>> If the results are still the same - great, full steam ahead. If not,
>> well, let's revisit :)
>>
>> -Rob
>>
>> --
>> Robert Collins <[email protected]>
>> Distinguished Technologist
>> HP Converged Cloud
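
For reference, the "appropriate retry decorator" Rob mentions is roughly this
shape. It is illustrative only - in real code we would lean on oslo.db's
retry helpers rather than hand-rolling one, and DBDeadlockError here is just
a stand-in for whatever the DB layer raises on a lock conflict:

    # Illustrative only: the shape of a retry-on-deadlock decorator.
    import functools
    import random
    import time


    class DBDeadlockError(Exception):
        """Stand-in for the DB layer's deadlock/lock-wait exception."""


    def retry_on_deadlock(max_retries=5):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                for attempt in range(max_retries):
                    try:
                        return func(*args, **kwargs)
                    except DBDeadlockError:
                        if attempt == max_retries - 1:
                            raise
                        # Back off with a little jitter so the losing
                        # writers do not all hit the same row again at the
                        # same instant.
                        time.sleep(random.uniform(0, 0.1 * (attempt + 1)))
            return wrapper
        return decorator


    @retry_on_deadlock()
    def allocate_ip(session, subnet_id):
        # SELECT ... FOR UPDATE on the availability range, insert the IP
        # allocation, commit; a conflicting writer surfaces as
        # DBDeadlockError and the whole transaction is retried.
        pass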
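
And this is roughly how I would restructure the harness along the lines Rob
suggests: worker processes instead of threads, several transactions per
worker aiming for about 10 seconds of work, and microsecond start/finish
timestamps so we can compute the concurrency actually achieved.
allocate_one() is a placeholder for the real per-algorithm allocation call:

    # Sketch of a multiprocessing benchmark harness with per-transaction
    # timestamp logging; overlap of the [start, end) intervals gives the
    # concurrency actually achieved.
    import csv
    import multiprocessing
    import time


    def allocate_one(subnet_id):
        time.sleep(0.01)      # placeholder: one allocation in its own txn


    def worker(args):
        worker_id, subnet_id, duration = args
        samples = []
        deadline = time.time() + duration
        while time.time() < deadline:
            start = time.time()
            allocate_one(subnet_id)
            end = time.time()
            samples.append((worker_id, int(start * 1e6), int(end * 1e6)))
        return samples


    if __name__ == '__main__':
        n_workers = 20
        pool = multiprocessing.Pool(n_workers)
        jobs = [(i, 'subnet-a', 10.0) for i in range(n_workers)]
        with open('timings.csv', 'w') as out_file:
            out = csv.writer(out_file)
            out.writerow(['worker', 'start_us', 'end_us'])
            for samples in pool.map(worker, jobs):
                out.writerows(samples)
        pool.close()
        pool.join()

Salvatore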

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
