Hello All, This is gentle reminder about this RFC.
Sadly, Andrii Anisov has left our team. But I'm commited to continue his work on time accounting and real time scheduling. I do realize, that proposed patches have become moldy. I can rebase them onto current master, if it would help. On Wed, 2019-11-06 at 13:24 +0200, Andrii Anisov wrote: > Hello Julien, > > On 28.10.19 16:28, Julien Grall wrote: > > It would be good to get a review from the scheduler maintainers (Dario, > > George) to make sure they are happy with the suggested states here. > > I would not say I'm completely happy with this set of states. I'd like to > have a discussion on this topic with scheduler maintainers. Also because they > could have a different view from x86 world. I would love to hear any inputs on this topc from general scheduling approach standpoint and from x86 view. > > > Introduce per-pcpu time accounting what includes the following states: > > > > I think we need a very detailed description of each states. Otherwise it > > will be hard to know how to categorize it. > > I agree that we need a very detailed description of each states. Ask > questions if something is not clear or doubtful. I guess we could have > something better after Q&A process. > > > > TACC_HYP - the pcpu executes hypervisor code like softirq processing > > > (including scheduling), tasklets and context switches > > > > IHMO, "like" is too weak here. What do you exactly plan to introduce? > > I think this should be what hypervisor does except hypercall and IO emulation > (what is TACC_GSYNC). > > > For instance, on Arm, you consider that leave_hypervisor_tail() is part of > > TACC_HYP. This function will include some handling for synchronous trap. > > I guess you are saying about `p2m_flush_vm`. I doubt here, and open for > suggestions. > > > > > TACC_GUEST - the pcpu executes guests code > > > > Looking at the arm64 code, you are executing some hypervisor code here. I > > agree this is impossible to not run any hypervisor code with TACC_GUEST, > > but I think this should be clarified in the documentation. > > Do you mean adding few words about still having some hypervisor code near the > actual context switch from/to guest (entry/return_from_trap)? > > > > TACC_IDLE - the low-power state of the pcpu > > > > Did you intend to mean "idle vCPU" is in use? > > No. I did mean what is written. > Currently, the idle vcpu does hypervisor work (e.g. tasklets) along with the > low-power mode. IMO we have to separate them. > > > > TACC_IRQ - the pcpu performs interrupts processing, without separation to > > > guest or hypervisor interrupts > > > TACC_GSYNC - the pcpu executes hypervisor code to process synchronous trap > > > from the guest. E.g. hypercall processing or io emulation. > > > > > > Currently, the only reenterant state is TACC_IRQ. It is assumed, no > > > changes > > > to state other than TACC_IRQ could happen until we return from nested > > > interrupts. IRQ time is accounted in a distinct way comparing to other > > > states. > > > > s/comparing/compare/ > > OK. > > > > It is acumulated between other states transition moments, and is > > > substracted > > > > s/acumulated/accumulated/ s/substracted/subtracted/ > > OK. > > > > from the old state on states transion calculation. > [1] > > s/transion/transition/ > > OK. > > > > Signed-off-by: Andrii Anisov <andrii_ani...@epam.com> > > > --- > > > xen/common/schedule.c | 81 > > > +++++++++++++++++++++++++++++++++++++++++++++++++ > > > xen/include/xen/sched.h | 27 +++++++++++++++++ > > > 2 files changed, 108 insertions(+) > > > > > > diff --git a/xen/common/schedule.c b/xen/common/schedule.c > > > index 7b71581..6dd6603 100644 > > > --- a/xen/common/schedule.c > > > +++ b/xen/common/schedule.c > > > @@ -1539,6 +1539,87 @@ static void schedule(void) > > > context_switch(prev, next); > > > } > > > +DEFINE_PER_CPU(struct tacc, tacc); > > > + > > > +static void tacc_state_change(enum TACC_STATES new_state) > > > > This should never be called with the TACC_IRQ, right? > > Yes. Actually, tacc->state should never be TACC_IRQ. > Because of TACC_IRQ reenterability it is handled through the tacc->irq_cnt > and tacc->irq_enter_time. > > > > +{ > > > + s_time_t now, delta; > > > + struct tacc* tacc = &this_cpu(tacc); > > > + unsigned long flags; > > > + > > > + local_irq_save(flags); > > > + > > > + now = NOW(); > > > + delta = now - tacc->state_entry_time; > > > + > > > + /* We do not expect states reenterability (at least through this > > > function)*/ > > > + ASSERT(new_state != tacc->state); > > > + > > > + tacc->state_time[tacc->state] += delta - tacc->irq_time; > > > + tacc->state_time[TACC_IRQ] += tacc->irq_time; > > > + tacc->irq_time = 0; > > > + tacc->state = new_state; > > > + tacc->state_entry_time = now; > > > + > > > + local_irq_restore(flags); > > > +} > > > + > > > +void tacc_hyp(int place) > > > > Place is never used except for your commented printk. So what's the goal > > for it? > > Place is just a piece of code used for debugging, as well as printk. I keept > it here because this series is very RFC, yet it could be removed. > > > Also, is it really necessary to provide helper for each state? Couldn't we > > just introduce one functions doing all the state? > > I'd like calling that stuff from assembler without parameters. But have no > strong opinion here. > > > > +{ > > > +// printk("\ttacc_hyp %u, place %d\n", smp_processor_id(), place); > > > + tacc_state_change(TACC_HYP); > > > +} > > > + > > > +void tacc_guest(int place) > > > +{ > > > +// printk("\ttacc_guest %u, place %d\n", smp_processor_id(), place); > > > + tacc_state_change(TACC_GUEST); > > > +} > > > + > > > +void tacc_idle(int place) > > > +{ > > > +// printk("\tidle cpu %u, place %d\n", smp_processor_id(), place); > > > + tacc_state_change(TACC_IDLE); > > > +} > > > + > > > +void tacc_gsync(int place) > > > +{ > > > +// printk("\ttacc_gsync %u, place %d\n", smp_processor_id(), place); > > > + tacc_state_change(TACC_GSYNC); > > > +} > > > + > > > +void tacc_irq_enter(int place) > > > +{ > > > + struct tacc* tacc = &this_cpu(tacc); > > > + > > > +// printk("\ttacc_irq_enter %u, place %d, cnt %d\n", > > > smp_processor_id(), place, this_cpu(tacc).irq_cnt); > > > + ASSERT(!local_irq_is_enabled()); > > > + ASSERT(tacc->irq_cnt >= 0); > > > + > > > + if ( tacc->irq_cnt == 0 ) > > > + { > > > + tacc->irq_enter_time = NOW(); > > > + } > > > + > > > + tacc->irq_cnt++; > > > +} > > > + > > > +void tacc_irq_exit(int place) > > > +{ > > > + struct tacc* tacc = &this_cpu(tacc); > > > + > > > +// printk("\ttacc_irq_exit %u, place %d, cnt %d\n", > > > smp_processor_id(), place, tacc->irq_cnt); > > > + ASSERT(!local_irq_is_enabled()); > > > + ASSERT(tacc->irq_cnt > 0); > > > + if ( tacc->irq_cnt == 1 ) > > > + { > > > + tacc->irq_time = NOW() - tacc->irq_enter_time; > > > > If I understand correctly, you will use irq_time to update TACC_IRQ in > > tacc_state_change(). It may be possible to receive another interrupt before > > the state is changed (e.g. HYP -> GUEST). This means only the time for the > > last IRQ received would be accounted. > > I do lock IRQs for state change. Shouldn't that protect it? > > > > + tacc->irq_enter_time = 0; > > > + } > > > + > > > + tacc->irq_cnt--; > > > +} > > > + > > > void context_saved(struct vcpu *prev) > > > { > > > /* Clear running flag /after/ writing context to memory. */ > > > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h > > > index e3601c1..04a8724 100644 > > > --- a/xen/include/xen/sched.h > > > +++ b/xen/include/xen/sched.h > > > @@ -1002,6 +1002,33 @@ extern void dump_runq(unsigned char key); > > > void arch_do_physinfo(struct xen_sysctl_physinfo *pi); > > > +enum TACC_STATES { > > > > We don't tend to use all uppercases for enum name. > > OK. > > > > + TACC_HYP = 0, > > > > enum begins at 0 and increment by one every time. So there is no need to > > hardcode a number. > > > > Also, looking at the code, I think you rely on the first state to be > > TACC_HYP. Am I correct? > > TACC_HYP is expected to be the initial state of the PCPU. > > > > + TACC_GUEST = 1, > > > + TACC_IDLE = 2, > > > + TACC_IRQ = 3, > > > + TACC_GSYNC = 4, > > > + TACC_STATES_MAX > > > +}; > > > It would be good to document all the states in the header as well. > > OK. > > > > + > > > +struct tacc > > > > Please document the structure. > > OK. > > > > +{ > > > + s_time_t state_time[TACC_STATES_MAX]; > > > + s_time_t state_entry_time; > > > + int state; > > > > This should be the enum you used above here. > > Yep. > > > > + > > > + s_time_t guest_time; > > > > This is not used. > > Yep, will drop it. > > > > + > > > + s_time_t irq_enter_time; > > > + s_time_t irq_time; > > > + int irq_cnt; > > Why do you need this to be signed? > > For assertion. > > > > +}; > > > + > > > +DECLARE_PER_CPU(struct tacc, tacc); > > > + > > > +void tacc_hyp(int place); > > > +void tacc_idle(int place); > > > + > > > #endif /* __SCHED_H__ */ > > > /* > > > > > > > Cheers, > >