On Fri, Apr 29, 2022 at 10:37:54AM +0200, Emanuele Giuseppe Esposito wrote:
> On 28/04/2022 at 15:45, Stefan Hajnoczi wrote:
> > On Tue, Apr 26, 2022 at 04:51:09AM -0400, Emanuele Giuseppe Esposito wrote:
> >> +static int has_writer;
> >
> > bool?
>
> Yes and no. With the latest findings and current implementation we could
> have something like:
>
> wrlock()
>     has_writer = 1
>     AIO_WAIT_WHILE(reader_count >=1) --> job_exit()
>                                              wrlock()
>
> But we are planning to get rid of AIO_WAIT_WHILE and allow wrlock to
> only run in coroutines. This requires a lot of changes, and switch a lot
> of callbacks in coroutines, but then we would avoid having such problems
> and nested event loops.

I don't understand how this answer is related to the question about
whether the type of has_writer should be bool?

> > How can rd be negative, it's uint32_t? If AioContext->reader_count can
> > be negative then please use a signed type.
>
> It's just "conceptually negative" while summing. The result is
> guaranteed to be >= 0, otherwise we have a problem.
>
> For example, we could have the following AioContext counters:
> A1: -5 A2: -4 A3: 10
>
> rd variable below could become negative while looping, but we read it
> only once we finish reading all counters, so it will always be >= 0.

AioContext->reader_count is uint32_t but can hold negative values. It
should be int32_t.

IMO even rd should be int32_t so it's clear that it will hold negative
values, even temporarily.

The return value of reader_count() should be uint32_t because it's
always a positive value.

That way the types express what is going on clearly.

> >
> >> +        aio_wait_kick();
> >> +        qemu_co_queue_wait(&exclusive_resume, &aio_context_list_lock);
> >
> > Why loop here instead of incrementing reader_count and then returning?
> > Readers cannot starve writers but writers can starve readers?
>
> Not sure what you mean here. Why returning?

It was a misconception on my part. Looping is necessary. Somehow I
thought that since we have aio_context_list_lock when we awake,
has_writer cannot be 1 but that's incorrect.

>
> >
> >> +        }
> >> +    }
> >> +}
> >> +
> >> +/* Mark bs as not reading anymore, and release pending exclusive ops. */
> >> +void coroutine_fn bdrv_graph_co_rdunlock(void)
> >> +{
> >> +    AioContext *aiocontext;
> >> +    aiocontext = qemu_get_current_aio_context();
> >> +
> >> +    qatomic_store_release(&aiocontext->reader_count,
> >> +                          aiocontext->reader_count - 1);
> >
> > This is the point where reader_count can go negative if the coroutine
> > was created in another thread. I think the type of reader_count should
> > be signed.
>
> I think as long as we don't read it as a single, there's no problem

There is no problem with the program's behavior: two's complement means
unsigned integer operations produce the same result as signed integer
operations. The issue is clarity: types should communicate the nature of
the values held in a variable. If someone takes a look at the struct
definition they will not know that ->reader_count is used to hold
negative values. That can lead to misunderstandings and bugs in the
future.

Stefan
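
As an illustration of the typing discussed above, here is a minimal
sketch using the A1/A2/A3 example from the thread. DemoContext and
demo_reader_count() are hypothetical names made up for this sketch, not
QEMU code, and the atomics and locking used by the real patch are
deliberately left out. It only shows signed per-context counters, a
signed running sum, and an unsigned return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-AioContext counter (not QEMU's struct). */
typedef struct {
    int32_t reader_count; /* signed: a single context's count may be negative */
} DemoContext;

/*
 * Sum the per-context counters. Individual counters and partial sums may
 * be negative; only the final total is meaningful and must be >= 0.
 */
static uint32_t demo_reader_count(const DemoContext *ctxs, size_t n)
{
    int32_t rd = 0; /* signed: partial sums may be temporarily negative */

    for (size_t i = 0; i < n; i++) {
        rd += ctxs[i].reader_count;
    }

    /* A negative total would indicate unbalanced lock/unlock calls. */
    assert(rd >= 0);
    return (uint32_t)rd;
}
```

With counters -5, -4 and 10 the running sum passes through -5 and -9
before ending at 1, so only the return value is non-negative.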