On Tue, Aug 20, 2013 at 05:59:58PM +0800, Wenchao Xia wrote:
> On 2013-8-16 16:12, Wenchao Xia wrote:
> > On 2013-8-16 15:15, Wenchao Xia wrote:
> >> On 2013-8-16 0:32, Michael Roth wrote:
> >>> Quoting Michael Roth (2013-08-15 10:23:20)
> >>>> Quoting Wenchao Xia (2013-08-13 03:44:39)
> >>>>> On 2013-8-13 1:01, Michael Roth wrote:
> >>>>>> Quoting Paolo Bonzini (2013-08-12 02:30:28)
> >>>>>>>> 1) rename AioContext to AioSource.
> >>>>>>>>    This is my major purpose: it declares that this is not a
> >>>>>>>> "context" concept, and that GMainContext is the entity
> >>>>>>>> representing the thread's activity.
> >>>>>>>
> >>>>>>> Note that the nested event loops in QEMU are _very_ different from
> >>>>>>> glib nested event loops. In QEMU, nested event loops only run block
> >>>>>>> layer events. In glib, they run all events. That's why you need
> >>>>>>> AioContext.
> >>>>>>
> >>>>>> Would it be possible to use glib for our nested loops as well by
> >>>>>> just setting a higher priority for the AioContext GSource?
> >>>>>>
> >>>>>> Stefan and I were considering how we could make use of his "drop
> >>>>>> io_flush" patches to use a common mechanism to register fd events,
> >>>>>> but still allow us to distinguish between AioContext and
> >>>>>> non-AioContext events for nested loops. I was originally thinking
> >>>>>> of using prepare() functions to filter out non-AioContext events,
> >>>>>> but that requires we implement our own GSources with that in mind
> >>>>>> rather than making use of pre-baked ones like GIOChannel's, and it
> >>>>>> bakes block stuff into every event source implementation.
> >>>>>>
> >>>>>   Besides priority, g_source_set_can_recurse() can also help.
> >>>>>   Thinking about it more deeply, I found a harder problem:
> >>>>> g_main_context_acquire() and g_main_context_release(). On release,
> >>>>> pending BH/IO callbacks need to be cleared, but this action can't
> >>>>> be triggered automatically when the user calls
> >>>>> g_main_context_release().
> >>>>
> >>>> I don't understand why this is a requirement;
> >>>> g_main_context_acquire/release ensure that only one thread attempts
> >>>> to iterate the main loop at a time. This isn't currently an issue in
> >>>> qemu, and if we re-implemented qemu_aio_wait() to use the same glib
> >>>> interfaces, the tracking of in-flight requests would be moved to the
> >>>> block layer via Stefan's 'drop io_flush' patches, which moves that
> >>>> block-specific logic out of the main loop/AioContext GSource by
> >>>> design. Are there other areas where you see this as a problem?
> >>>
> >>> I think I understand better what you're referring to: you mean that
> >>> if qemu_aio_wait() were called, and were implemented to essentially
> >>> call g_main_context_iteration(), then after, say, one iteration, the
> >>> iothread/dataplane thread might acquire the main loop and dispatch
> >>> block/non-block events before qemu_aio_wait() returned? The simple
> >>> approach would be to have qemu_aio_wait() call
> >>> g_main_context_acquire/release at the start and end of the function,
> >>> which would ensure that this never happens.
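A minimal sketch of the idea discussed above: attach the AioContext's
GSource at a raised priority with recursion enabled, so a nested
iteration dispatches block events first. aio_get_g_source(),
g_source_set_priority() and g_source_set_can_recurse() exist in
QEMU/glib; PRI_BLOCK is an assumed constant, not an existing symbol:

    #include <glib.h>
    #include "block/aio.h"               /* AioContext, aio_get_g_source() */

    /* Assumed priority; numerically lower means higher priority in glib. */
    #define PRI_BLOCK G_PRIORITY_HIGH

    static void attach_block_source(AioContext *aio_ctx, GMainContext *g_ctx)
    {
        GSource *src = aio_get_g_source(aio_ctx);  /* returns a new reference */

        g_source_set_priority(src, PRI_BLOCK);     /* block events dispatch first */
        g_source_set_can_recurse(src, TRUE);       /* allow nested iteration */
        g_source_attach(src, g_ctx);
        g_source_unref(src);                       /* context keeps its own ref */
    }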
> >>>
> >>   qemu_aio_wait() is relatively simple to resolve with
> >> g_main_context_acquire(), but it also shows the additional code needed
> >> for a thread switch (guessing at the plan to implement
> >> qemu_aio_wait()):
> >>
> >> qemu_aio_wait(GMainContext *ctx)
> >> {
> >>     /* pseudocode: g_main_context_iteration() takes no priority
> >>      * filter; the intent is "iterate only block-priority sources" */
> >>     return g_main_context_iteration(ctx, PRI_BLOCK);
> >> }
> >>
> >> and at the caller:
> >>
> >> {
> >>     while (qemu_aio_wait(ctx)) {
> >>         ;
> >>     }
> >>     g_main_context_release(ctx);
> >> }
> >>
> >>   The above code shows that the *ctx switch operation involves more
> >> than glib's own event loop API: qemu_aio_wait() is also needed. This
> >> raises a question: what data structure should be used to represent
> >> the context concept and control the thread-switching behavior? It is
> >> better to have one clear layer, GMainContext or GlibQContext, instead
> >> of GMainContext plus custom functions. There are at least two kinds
> >> of callers to consider: the nested user (the block layer) and the
> >> flat user (above the block layer).
> >>   In my opinion, this problem is brought in by the GSource-based
> >> AioContext. Take the call path of bdrv_aio_readv(*bdrv_cb) on a raw
> >> Linux file as an example; the following asynchronous operations are
> >> involved for the AioContext:
> >> 1 bdrv_cb() will be executed in bdrv_co_em_bh().
> >> 2 bdrv_co_io_em_complete() will be executed in event_notifier_ready().
> >> 3 aio_worker() will be executed in worker_thread().
> >>   Operations 2 and 3 are tracked by the block layer's queue after
> >> Stefan's "drop io_flush()" series.
> >>   Now, if we stick to GMainContext to represent the context concept,
> >> then when thread B wants to acquire the GMainContext used by thread
> >> A, all work set up by A should be finished before B acquires it;
> >> otherwise B will execute functions that were supposed to run in A.
> >> "Work" here refers to the three kinds of asynchronous operations
> >> above.
> >>   We can solve this in different ways:
> >> 1 The event loop API doesn't guarantee that work set up by thread A
> >> will always be finished in A. This puts the burden of thread
> >> switching on the caller. I talked with Stefan on IRC, and he thinks
> >> this is possible for the nested user (the block layer), but I still
> >> want to avoid it for the flat user above the block layer.
> >> 2 Ask the caller to finish all pending operations:
> >>  2.1 The glib GMainContext API plus a custom API such as aio_wait().
> >> This is bad in that detail under GMainContext is exposed to the
> >> caller. Since operation 1 mentioned before is not tracked yet, to
> >> make sure bdrv_cb() is called in the thread that set it up, 1 needs
> >> to be added to the tracking queue, or the call chain of aio_wait()
> >> must check the queue plus check operation 1. Perhaps add a custom
> >> function the caller must call, such as context_work_flush()?
> >
> >   If a well-named API doing that flush work were present, using the
> > glib API plus it also seems OK, and avoids a wrapper. I guess
> > bdrv_drain_all(GMainContext *ctx, ...) could do it.
> >
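A sketch of the hand-over discipline option 2 describes, assuming the
ctx-taking bdrv_drain_all() signature proposed just above (the existing
bdrv_drain_all() takes no arguments; this is not current QEMU API):

    #include <glib.h>

    /* Thread A is done iterating: flush everything A set up, then release. */
    static void give_up_context(GMainContext *ctx)
    {
        /* Hypothetical signature from the proposal above: complete all
         * pending BHs, fd callbacks and worker-thread completions that
         * were scheduled while this thread owned ctx. */
        bdrv_drain_all(ctx);
        g_main_context_release(ctx);
    }

    /* Thread B can now take ownership safely. */
    static void take_context(GMainContext *ctx)
    {
        g_main_context_acquire(ctx);   /* may need retrying if it fails */
        /* No callback set up by thread A can fire here anymore. */
    }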
>   I haven't found a good answer in GStreamer either, but I want to
> show some ideas from my understanding.
>
>   Following is a brief picture of the current event loop in qemu;
> Alex's timers for AioContext are also drawn here:
>
>  ========================
>  || I/O thread in vl.c ||
>  ========================
>             |
>    run loop |
>             |
>  ====================  qemu_set_fd_handler2()  ===============
>  || other           ||-------------------------|| Main_loop ||
>  || (vnc, migration)||                          ===============
>  ====================                                 |
>                                          GLib event   |
>                                          loop API     |
>  ==========  qemu_set_fd_handler()         ====================
>  ||  hw  ||--------------------------------|| GMainContext   ||
>  ==========  main_loop_tlg                 ====================
>      |                          (should it be removed?)   |
>      |  qemu_bh_***()                     GSource attach  |
>      |                                                    |
>  ====|====================================================|=====
>  ||        ======    ==================    ======             ||
>  ||        | BH |    | TimerListGroup |    | Fd |             ||
>  ||        ======    ==================    ======             ||
>  ||                                                           ||
>  ||             AioContext                                    ||
>  ||             (block layer's event collection)              ||
>  ================================================================
>                               |
>                               |  qemu_bh_***(), qemu_aio_wait(),
>                               |  qemu_aio_set_fd_handler()
>                         ===========
>                         || block ||
>                         ===========
>
>   The main issue here is that the components are tightly bound
> together and there is no clear layer representing the thread and its
> event loop API. Block and hw code both interact with AioContext, so
> GMainContext and AioContext are both playing the "context" role. I
> hope to form a library for block, so I need to pick one of them to
> provide the event loop; the choices seem to be:
> 1 GMainContext.
> 2 AioContext.
> 3 An encapsulation, such as GlibQContext.
>
>   1) and 2) would not be perfect, since a non-standard glib event loop
> would be exposed; 3) would show a unified interface similar to the
> glib main loop, but needs more code adjustment. After some thinking, I
> guess AioContext is the easiest way: it represents the block layer's
> own event loop, gives up the glib event loop at this level, and just
> adds a custom API such as block_iterate(). Briefly, bdrv_read() would
> become:
>
>     int bdrv_read(AioContext *ctx, ...);
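For illustration, the block-owned loop proposed here might reduce to a
thin wrapper over the existing aio_poll(); block_iterate() and the
ctx-taking bdrv_read() are the proposed interfaces, not existing ones:

    #include <stdbool.h>
    #include <stdint.h>
    #include "block/aio.h"     /* AioContext, aio_poll() */
    #include "block/block.h"   /* BlockDriverState */

    /* Proposed: run one iteration of the block layer's own event loop.
     * aio_poll() is the existing primitive this would wrap. */
    bool block_iterate(AioContext *ctx, bool blocking)
    {
        return aio_poll(ctx, blocking);
    }

    /* Proposed signature: synchronous read bound to an explicit context. */
    int bdrv_read(AioContext *ctx, BlockDriverState *bs,
                  int64_t sector_num, uint8_t *buf, int nb_sectors);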
I don't understand why you want to add AioContext *ctx to bdrv_read().
The synchronous APIs already work fine, since no event loop integration
is necessary at all: the event loop is hidden inside the synchronous
function.

Since AioContext provides a GSource, integrating with an application's
glib event loop should also be easy. The only hard part is timers,
since we use nanosecond timers; there we should just round up to
millisecond granularity in libqblock. The nanosecond timers aren't
critical in libqblock, only for running guests.

Stefan
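A minimal sketch of the integration Stefan describes: a glib
application attaches the AioContext's GSource to its own main loop.
aio_get_g_source() is the existing QEMU API; aio_context_new() is shown
with its 2013-era no-argument signature, and error handling is omitted:

    #include <glib.h>
    #include "block/aio.h"

    int main(void)
    {
        GMainLoop *loop = g_main_loop_new(NULL, FALSE);
        AioContext *ctx = aio_context_new();   /* 2013-era signature */
        GSource *src = aio_get_g_source(ctx);

        g_source_attach(src, NULL);            /* default GMainContext */
        g_source_unref(src);

        /* ... issue asynchronous block I/O bound to ctx here ... */

        g_main_loop_run(loop);   /* completions now dispatch from the
                                    application's own event loop */
        g_main_loop_unref(loop);
        return 0;
    }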