On 29/11/2016 14:24, Fam Zheng wrote:
> On Tue, 11/29 12:17, Paolo Bonzini wrote:
>>
>>
>> On 29/11/2016 11:32, Fam Zheng wrote:
>>>
>>> The kernel change will be a new prctl operation (should it be a different
>>> syscall to extend?) to register a new type of eventfd called "idle eventfd":
>>>
>>>     prctl(PR_ADD_IDLE_EVENTFD, int eventfd);
>>>     prctl(PR_DEL_IDLE_EVENTFD, int eventfd);
>>>
>>> It will be notified by kernel each time when the thread's local core has no
>>> runnable threads (i.e., entering idle state).
>>>
>>> QEMU can then add this eventfd to its event loop when it has events to
>>> poll, and watch virtqueue/linux-aio memory from userspace in the fd
>>> handlers. Effectively, if a ppoll() would have blocked because there are
>>> no new events, it could now return immediately because of idle_eventfd
>>> events, and do the idle polling.
>>
>> This has two issues:
>>
>> * it only reports the leading edge of single_task_running().  Is it also
>> useful to stop polling on the trailing edge?
>
> QEMU can clear the eventfd right after event firing so I don't think it is
> necessary.

Yes, but how would QEMU know that the eventfd has fired?  It would be
very expensive to read the eventfd on each iteration of polling.

Paolo

>> * it still needs a system call before polling is entered.  Ideally, QEMU
>> could run without any system call while in polling mode.
>>
>> Another possibility is to add a system call for single_task_running().
>> It should be simple enough that you can implement it in the vDSO and
>> avoid a context switch.  There are convenient hooking points in
>> add_nr_running and sub_nr_running.
>
> That sounds good!
>
> Fam
>
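
To make the cost concern above concrete, here is a rough sketch in C. It
uses an ordinary non-blocking eventfd as a stand-in for the proposed
"idle eventfd" (which does not exist in any kernel), and a plain int flag
as a stand-in for virtqueue/linux-aio memory. The helper names
eventfd_check_and_clear() and poll_with_eventfd() are made up for
illustration and are not QEMU functions. The point is only that finding
out whether an eventfd has fired takes one read(2) per check, so a
polling loop that consults it every iteration makes one system call per
iteration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the eventfd had fired (reading it
 * also resets the counter to zero), 0 if it had not, -1 on error.
 * The fd must have been created with EFD_NONBLOCK.
 * Every call is one read(2) system call, even when nothing fired.
 */
int eventfd_check_and_clear(int fd)
{
    uint64_t count;
    ssize_t n;

    assert(fd >= 0);
    n = read(fd, &count, sizeof(count));
    if (n == (ssize_t)sizeof(count)) {
        return 1;
    }
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return 0;
    }
    return -1;
}

/*
 * Hypothetical polling loop: spin for at most max_iters iterations.
 * Returns 1 if *work_ready became nonzero, 0 if polling stopped
 * (the eventfd fired or the budget ran out), -1 on error.
 * *syscalls counts how many read(2) calls the loop issued.
 */
int poll_with_eventfd(const volatile int *work_ready, int stop_fd,
                      long max_iters, long *syscalls)
{
    long i;

    *syscalls = 0;
    for (i = 0; i < max_iters; i++) {
        int r;

        if (*work_ready) {
            return 1;           /* pure memory check, no syscall */
        }
        (*syscalls)++;          /* the eventfd check below is a syscall */
        r = eventfd_check_and_clear(stop_fd);
        if (r != 0) {
            return r < 0 ? -1 : 0;
        }
    }
    return 0;
}
```

With an fd from eventfd(0, EFD_NONBLOCK) and no writer, a call such as
poll_with_eventfd(&flag, fd, 100, &calls) leaves calls at 100: one
read(2) per iteration. A single_task_running() check served from the
vDSO would presumably avoid that kernel entry, which is the motivation
for the alternative suggested in the quoted text.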