Hello All,

This is gentle reminder about this RFC. 

Sadly, Andrii Anisov has left our team. But I'm commited to continue
his work on time accounting and real time scheduling.

I do realize, that proposed patches have become moldy. I can rebase
them onto current master, if it would help. 

On Wed, 2019-11-06 at 13:24 +0200, Andrii Anisov wrote:
> Hello Julien,
> 
> On 28.10.19 16:28, Julien Grall wrote:
> > It would be good to get a review from the scheduler maintainers (Dario, 
> > George) to make sure they are happy with the suggested states here.
> 
> I would not say I'm completely happy with this set of states. I'd like to 
> have a discussion on this topic with scheduler maintainers. Also because they 
> could have a different view from x86 world.

I would love to hear any inputs on this topc from general scheduling
approach standpoint and from x86 view.  

> > > Introduce per-pcpu time accounting what includes the following states:
> > 
> > I think we need a very detailed description of each states. Otherwise it 
> > will be hard to know how to categorize it.
> 
> I agree that we need a very detailed description of each states. Ask 
> questions if something is not clear or doubtful. I guess we could have 
> something better after Q&A process.
> 
> > > TACC_HYP - the pcpu executes hypervisor code like softirq processing
> > >             (including scheduling), tasklets and context switches
> > 
> > IHMO, "like" is too weak here. What do you exactly plan to introduce?
> 
> I think this should be what hypervisor does except hypercall and IO emulation 
> (what is TACC_GSYNC).
> 
> > For instance, on Arm, you consider that leave_hypervisor_tail() is part of 
> > TACC_HYP. This function will include some handling for synchronous trap.
> 
> I guess you are saying about `p2m_flush_vm`. I doubt here, and open for 
> suggestions.
> 
> 
> > > TACC_GUEST - the pcpu executes guests code
> > 
> > Looking at the arm64 code, you are executing some hypervisor code here. I 
> > agree this is impossible to not run any hypervisor code with TACC_GUEST, 
> > but I think this should be clarified in the documentation.
> 
> Do you mean adding few words about still having some hypervisor code near the 
> actual context switch from/to guest (entry/return_from_trap)?
> 
> > > TACC_IDLE - the low-power state of the pcpu
> > 
> > Did you intend to mean "idle vCPU" is in use?
> 
> No. I did mean what is written.
> Currently, the idle vcpu does hypervisor work (e.g. tasklets) along with the 
> low-power mode. IMO we have to separate them.
> 
> > > TACC_IRQ - the pcpu performs interrupts processing, without separation to
> > >             guest or hypervisor interrupts
> > > TACC_GSYNC - the pcpu executes hypervisor code to process synchronous trap
> > >               from the guest. E.g. hypercall processing or io emulation.
> > > 
> > > Currently, the only reenterant state is TACC_IRQ. It is assumed, no 
> > > changes
> > > to state other than TACC_IRQ could happen until we return from nested
> > > interrupts. IRQ time is accounted in a distinct way comparing to other 
> > > states.
> > 
> > s/comparing/compare/
> 
> OK.
> 
> > > It is acumulated between other states transition moments, and is 
> > > substracted
> > 
> > s/acumulated/accumulated/ s/substracted/subtracted/
> 
> OK.
> 
> > > from the old state on states transion calculation.
> [1]
> > s/transion/transition/
> 
> OK.
> 
> > > Signed-off-by: Andrii Anisov <andrii_ani...@epam.com>
> > > ---
> > >   xen/common/schedule.c   | 81 
> > > +++++++++++++++++++++++++++++++++++++++++++++++++
> > >   xen/include/xen/sched.h | 27 +++++++++++++++++
> > >   2 files changed, 108 insertions(+)
> > > 
> > > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> > > index 7b71581..6dd6603 100644
> > > --- a/xen/common/schedule.c
> > > +++ b/xen/common/schedule.c
> > > @@ -1539,6 +1539,87 @@ static void schedule(void)
> > >       context_switch(prev, next);
> > >   }
> > > +DEFINE_PER_CPU(struct tacc, tacc);
> > > +
> > > +static void tacc_state_change(enum TACC_STATES new_state)
> > 
> > This should never be called with the TACC_IRQ, right?
> 
> Yes. Actually, tacc->state should never be TACC_IRQ.
> Because of TACC_IRQ reenterability it is handled through the tacc->irq_cnt 
> and tacc->irq_enter_time.
> 
> > > +{
> > > +    s_time_t now, delta;
> > > +    struct tacc* tacc = &this_cpu(tacc);
> > > +    unsigned long flags;
> > > +
> > > +    local_irq_save(flags);
> > > +
> > > +    now = NOW();
> > > +    delta = now - tacc->state_entry_time;
> > > +
> > > +    /* We do not expect states reenterability (at least through this 
> > > function)*/
> > > +    ASSERT(new_state != tacc->state);
> > > +
> > > +    tacc->state_time[tacc->state] += delta - tacc->irq_time;
> > > +    tacc->state_time[TACC_IRQ] += tacc->irq_time;
> > > +    tacc->irq_time = 0;
> > > +    tacc->state = new_state;
> > > +    tacc->state_entry_time = now;
> > > +
> > > +    local_irq_restore(flags);
> > > +}
> > > +
> > > +void tacc_hyp(int place)
> > 
> > Place is never used except for your commented printk. So what's the goal 
> > for it?
> 
> Place is just a piece of code used for debugging, as well as printk. I keept 
> it here because this series is very RFC, yet it could be removed.
> 
> > Also, is it really necessary to provide helper for each state? Couldn't we 
> > just introduce one functions doing all the state?
> 
> I'd like calling that stuff from assembler without parameters. But have no 
> strong opinion here.
>   
> > > +{
> > > +//    printk("\ttacc_hyp %u, place %d\n", smp_processor_id(), place);
> > > +    tacc_state_change(TACC_HYP);
> > > +}
> > > +
> > > +void tacc_guest(int place)
> > > +{
> > > +//    printk("\ttacc_guest %u, place %d\n", smp_processor_id(), place);
> > > +    tacc_state_change(TACC_GUEST);
> > > +}
> > > +
> > > +void tacc_idle(int place)
> > > +{
> > > +//    printk("\tidle cpu %u, place %d\n", smp_processor_id(), place);
> > > +    tacc_state_change(TACC_IDLE);
> > > +}
> > > +
> > > +void tacc_gsync(int place)
> > > +{
> > > +//    printk("\ttacc_gsync %u, place %d\n", smp_processor_id(), place);
> > > +    tacc_state_change(TACC_GSYNC);
> > > +}
> > > +
> > > +void tacc_irq_enter(int place)
> > > +{
> > > +    struct tacc* tacc = &this_cpu(tacc);
> > > +
> > > +//    printk("\ttacc_irq_enter %u, place %d, cnt %d\n", 
> > > smp_processor_id(), place, this_cpu(tacc).irq_cnt);
> > > +    ASSERT(!local_irq_is_enabled());
> > > +    ASSERT(tacc->irq_cnt >= 0);
> > > +
> > > +    if ( tacc->irq_cnt == 0 )
> > > +    {
> > > +        tacc->irq_enter_time = NOW();
> > > +    }
> > > +
> > > +    tacc->irq_cnt++;
> > > +}
> > > +
> > > +void tacc_irq_exit(int place)
> > > +{
> > > +    struct tacc* tacc = &this_cpu(tacc);
> > > +
> > > +//    printk("\ttacc_irq_exit %u, place %d, cnt %d\n", 
> > > smp_processor_id(), place, tacc->irq_cnt);
> > > +    ASSERT(!local_irq_is_enabled());
> > > +    ASSERT(tacc->irq_cnt > 0);
> > > +    if ( tacc->irq_cnt == 1 )
> > > +    {
> > > +        tacc->irq_time = NOW() - tacc->irq_enter_time;
> > 
> > If I understand correctly, you will use irq_time to update TACC_IRQ in 
> > tacc_state_change(). It may be possible to receive another interrupt before 
> > the state is changed (e.g. HYP -> GUEST). This means only the time for the 
> > last IRQ received would be accounted.
> 
> I do lock IRQs for state change. Shouldn't that protect it?
> 
> > > +        tacc->irq_enter_time = 0;
> > > +    }
> > > +
> > > +    tacc->irq_cnt--;
> > > +}
> > > +
> > >   void context_saved(struct vcpu *prev)
> > >   {
> > >       /* Clear running flag /after/ writing context to memory. */
> > > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> > > index e3601c1..04a8724 100644
> > > --- a/xen/include/xen/sched.h
> > > +++ b/xen/include/xen/sched.h
> > > @@ -1002,6 +1002,33 @@ extern void dump_runq(unsigned char key);
> > >   void arch_do_physinfo(struct xen_sysctl_physinfo *pi);
> > > +enum TACC_STATES {
> > 
> > We don't tend to use all uppercases for enum name.
> 
> OK.
> 
> > > +    TACC_HYP = 0,
> > 
> > enum begins at 0 and increment by one every time. So there is no need to 
> > hardcode a number.
> > 
> > Also, looking at the code, I think you rely on the first state to be 
> > TACC_HYP. Am I correct?
> 
> TACC_HYP is expected to be the initial state of the PCPU.
> 
> > > +    TACC_GUEST = 1,
> > > +    TACC_IDLE = 2,
> > > +    TACC_IRQ = 3,
> > > +    TACC_GSYNC = 4,
> > > +    TACC_STATES_MAX
> > > +};
> > > It would be good to document all the states in the header as well.
> 
> OK.
> 
> > > +
> > > +struct tacc
> > 
> > Please document the structure.
> 
> OK.
> 
> > > +{
> > > +    s_time_t state_time[TACC_STATES_MAX];
> > > +    s_time_t state_entry_time;
> > > +    int state;
> > 
> > This should be the enum you used above here.
> 
> Yep.
> 
> > > +
> > > +    s_time_t guest_time;
> > 
> > This is not used.
> 
> Yep, will drop it.
> 
> > > +
> > > +    s_time_t irq_enter_time;
> > > +    s_time_t irq_time;
> > > +    int irq_cnt;
> > Why do you need this to be signed?
> 
> For assertion.
>   
> > > +};
> > > +
> > > +DECLARE_PER_CPU(struct tacc, tacc);
> > > +
> > > +void tacc_hyp(int place);
> > > +void tacc_idle(int place);
> > > +
> > >   #endif /* __SCHED_H__ */
> > >   /*
> > > 
> > 
> > Cheers,
> > 

Reply via email to