> This is an attempt to summarize a really useful discussion that Victor,
> Flavio and I have been having today. At the bottom are some background
> links - basically what I have open in my browser right now thinking
> through all of this.

Thanks for the detailed summary, it puts more flesh on the bones than
a brief conversation on the fringes of the Paris mid-cycle.

Just a few clarifications and suggestions inline to add into the mix.

> We're attempting to take baby-steps towards moving completely from
> eventlet to asyncio/trollius. The thinking is for Ceilometer to be the
> first victim.

First beneficiary, I hope :)

> Ceilometer's code is run in response to various I/O events like REST API
> requests, RPC calls, notifications received, etc. We eventually want the
> asyncio event loop to be what schedules Ceilometer's code in response to
> these events. Right now, it is eventlet doing that.

Yes.

And there is one other class of stimulus, also related to eventlet,
that is very important for triggering the execution of ceilometer
logic. That would be the timed tasks that drive polling of:

 * REST APIs provided by other openstack services
 * the local hypervisor running on each compute node
 * the SNMP daemons running at host-level

etc. and also trigger periodic alarm evaluation.

IIUC these tasks are all mediated via the oslo threadgroup's
usage of eventlet.greenpool[1]. Would this logic also be replaced
as part of this effort?

> Now, because we're using eventlet, the code that is run in response to
> these events looks like synchronous code that makes a bunch of
> synchronous calls. For example, the code might do some_sync_op() and
> that will cause a context switch to a different greenthread (within the
> same native thread) where we might handle another I/O event (like a REST
> API request)

Just to make the point that most of the agents in the ceilometer
zoo tend to react to just a single type of stimulus, as opposed
to a mix of dispatching from both message bus and the REST API.
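
To make the timer-driven case concrete, here's a rough sketch of what
a polling task might look like once it's written as an explicit
coroutine. It assumes Python 3's async/await syntax on the stock
asyncio loop rather than trollius, and poll_once, run_periodic and the
source names are hypothetical stand-ins, not ceilometer or oslo code.

```python
import asyncio


async def poll_once(source):
    # Hypothetical stand-in for a real pollster. The explicit await is
    # where eventlet would have switched greenthreads implicitly.
    await asyncio.sleep(0)
    return "sample from %s" % source


async def run_periodic(interval, source, iterations, results):
    # Timer-driven task: poll, record the sample, sleep until next cycle.
    for _ in range(iterations):
        results.append(await poll_once(source))
        await asyncio.sleep(interval)


async def main():
    results = []
    # One timer task per polled source, all sharing a single event loop.
    await asyncio.gather(
        run_periodic(0.01, "hypervisor", 3, results),
        run_periodic(0.01, "snmp", 3, results),
    )
    return results


if __name__ == "__main__":
    for sample in asyncio.run(main()):
        print(sample)
```

Each agent of the timer-driven kind would need only this one style of
trigger, which is part of why a piece-by-piece port seems feasible.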

So to classify, we'd have:

 * compute-agent: timer tasks for polling
 * central-agent: timer tasks for polling
 * notification-agent: dispatch of "external" notifications from
   the message bus
 * collector: dispatch of "internal" metering messages from the
   message bus
 * api-service: dispatch of REST API calls
 * alarm-evaluator: timer tasks for alarm evaluation
 * alarm-notifier: dispatch of "internal" alarm notifications

IIRC, the only case where there's a significant mix of trigger
styles is the partitioned alarm evaluator, where the assignment of
alarm subsets for evaluation is driven over RPC, whereas the
actual thresholding is triggered by a timer.

> Porting from eventlet's implicit async approach to asyncio's explicit
> async API will be seriously time consuming and we need to be able to do
> it piece-by-piece.

Yes, I agree, a step-wise approach is the key here.

So I'd love to have some sense of the time horizon for this
effort. It clearly feels like a multi-cycle effort, so the main
question in my mind right now is whether we should be targeting
the first deliverables for juno-3?

That would provide a proof-point in advance of the K* summit,
where I presume the task would be to get wider buy-in for the idea.

If it makes sense to go ahead and aim the first baby steps for
juno-3, then we'd need to have a ceilometer-spec detailing these
changes. This would need to be proposed by say EoW and then
landed before the spec acceptance deadline for juno (~July 21st).

We could use this spec proposal to dig into the perceived benefits
of this effort:

 * the obvious win around getting rid of the eventlet black-magic
 * plus possibly other benefits such as code clarity and ease of
   maintenance

and OTOH get a heads-up on the risks:

 * possible immaturity in the new framework?
 * overhead involved in contributors getting to grips with the
   new coroutine model

> The question then becomes what do we need to do in order to port a
> single oslo.messaging RPC endpoint method in Ceilometer to asyncio's
> explicit async approach?

One approach would be to select one well-defined area of ceilometer
as an initial test-bed for these ideas.

And one potential candidate for that would be the partitioned alarm
evaluator, which uses:

 1. fan-out RPC for the heartbeats underpinning master-slave
    coordination
 2. RPC calls for alarm allocations and assignments

I spoke to Cyril Roelandt at the mid-cycle, who is interested in:

 * replacing #1 with the tooz distributed co-ordination library[2]
 * and also possibly replacing #2 with taskflow[3]

The benefit of using taskflow for "sticky" task assignments isn't
100% clear, so it may actually make better sense to just use tooz
for the leadership election, and the new asyncio model for #2.

Starting there would have the advantage of being out on the side
of the main ceilometer pipeline.

However, if we do decide to go ahead with taskflow, then we could
find another good starting point for asyncio as an alternative.

> - when all of ceilometer has been ported over to asyncio coroutines,
>   we can stop monkey patching, stop using greenio and switch to the
>   asyncio event loop

... kick back and light a cigar! :)

Cheers,
Eoghan

[1] https://github.com/openstack/oslo-incubator/blob/master/openstack/common/threadgroup.py#L72
[2] https://github.com/stackforge/tooz
[3] https://wiki.openstack.org/wiki/TaskFlow

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev