On Sun, Oct 25, 2015 at 6:23 AM, Dmitry Osipenko <dig...@gmail.com> wrote:
> 25.10.2015 02:55, Peter Crosthwaite wrote:
>> On Sat, Oct 24, 2015 at 3:22 PM, Dmitry Osipenko <dig...@gmail.com> wrote:
>>> 24.10.2015 22:45, Peter Crosthwaite wrote:
>>>> This looks like a give-up without trying to get the correct value. If the calculated value (using the normal-path logic below) is sane, you should just use it. If it comes out bad then you should clamp to 1.
>>>>
>>>> I am wondering whether this clamping policy (as in the original code as well) is correct at all though. The value of a free-running short-interval periodic timer (poor man's random number generator) without any actual interrupt generation will be affected by QEMU's asynchronous handling of timer events. So if I set the limit to 100, then sample this timer on every user keyboard stroke, I should get a uniform distribution on [0, 100]. Instead I am going to get lots of 1s. This
>>>
>>> Right, that's a good example. What about scaling the ptimer period to match the adjusted timer_mod interval?
>>
>> That's just as incorrect as changing the limit IMO. The guest could get confused by a timer running at the wrong frequency.
>>
>>>> is more broken in the original code (as you state), as I will get > 100, but I think we have traded broken for slightly less broken. I think the correct semantic is to completely ignore rate limiting except for the scheduling of event callbacks. That is, the timer
>>>
>>> I'm missing you here. What event callbacks?
>>
>> When timer_mod() hits, and in turn triggers some device-specific event (usually an interrupt).
>>
>> There are two basic interactions for any QEMU timer. There are synchronous events, i.e. the guest reading (polling) the counter, which is what this patch tries to fix. The second is the common case of periodic interrupt generation. My proposal is that rate limiting does not affect synchronous operation, only asynchronous (so my keystroke RNG case works). In the current code, if ptimer_get_count() is called when the event has passed, it returns 0 under the assumption that the timer_mod callback is about to happen. With rate limiting that may be well in the future.
>
> ptimer_tick() would happen on the next QEMU loop cycle, so it might be more reasonable to return counter = 1 here, wouldn't it?
>
>>>> interval is not rate limited, instead the timer_mod interval (next_event - last_event) just has a 10us clamp.
>>>>
>>>> This means the original code semantic of returning counter = 0 for an already triggered timer is wrong. It should handle in-the-past wrap-arounds as wrap-arounds, by triggering the timer and redoing the math with the new interval values. So instead the logic would be something like:
>>>>
>>>> timer_val = -1;
>>>>
>>>> for (;;) {
>>>>     if (!enabled) {
>>>>         return delta;
>>>>     }
>>>>
>>>>     timer_val = (next_event - now) * scaling();
>>>>     if (timer_val >= 0) {
>>>>         return timer_val;
>>>>     }
>>>>     /* Timer has actually expired but we missed it, reload it and try again */
>>>>     ptimer_tick();
>>>> }
>>>
>>> Why do you think it is bad and wrong that ptimer_get_count() == 0 for a running periodic timer that expired while QEMU was "on the way" to the ptimer code?
>>
>> Because you may have gone past the time when it was actually zero, and reloaded and started counting again. It should return the real (reloaded and resumed) value. This is made worse by rate limiting, as the timer will spend a long time at the clamp value waiting for the rate-limited tick to fix it.
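To put rough numbers on that (values made up for illustration): with a limit of 100 and a 1us period, a poll 30us after a missed wrap-around should read 100 - 30 = 70, but the current code keeps reporting 0 until the rate-limited tick finally comes around and reloads the QEMU timer.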
>> Following on from before, we don't want the async stuff to affect sync. As the async callbacks are heavily affected by rate limiting, we don't want the polled timer having to rely on the callbacks (async path) at all for correct operation. That's what the current implementation of ptimer_get_count() assumes with that 0-clamp.
>
> Alright, that makes sense now. Thanks for clarifying.
>
>>> From the guest point of view it's okay (no?), do we really need to overengineer that corner case?
>>
>> Depends on your use case. Your bug report is correct in that the timer should never be outside the bounds of the limit. But you are fixing that very specifically with a minimal change rather than correcting the larger ptimer_get_count() logic, which I think is more broken than you suggest it is.
>>
>>>> ptimer_reload() then needs to be patched to make sure it always timer_mod()s in the future, otherwise this loop could iterate a large number of times.
>>>>
>>>> This means that when the qemu_timer() actually ticks, a large number of cycles may have occurred, but we can justify that in that callback event latency (usually interrupts) is undefined anyway and coalescing of multiples may have happened as part of it. This usually matches the expectations of real guests, where interrupt latency is ultimately undefined.
>>>
>>> ptimer_tick() re-arms the qemu_timer() in the case of periodic mode. Hope I haven't missed your point here.
>>
>> Yes. But it is also capable of doing the heavy lifting for our already-expired case. The basic idea is: if the timer is in a bad state (should have hit) but hasn't, do the hit to put the timer into a good state (by calling ptimer_tick) and then just do the right thing. That's what the loop does. It should also work for an in-the-past one-shot.
>
> Summarizing what we have now:
>
> There are two issues with ptimer:
>
> 1) ptimer_get_count() returns wrong values with an adjusted .limit
>
> Patch V7 doesn't solve that issue, but makes it slightly better by clamping the returned value to [0, 1]. That's not what we want; we need to return the counter value in its valid range [0, limit].
>
> You are rejecting the variant of scaling the ptimer period, saying that it might affect software behavior inside the guest. But by adjusting the timer, we might already be causing the same misbehavior in the case of a blazing fast host machine.

It is a different misbehaviour. We are modelling the polled timer perfectly but limiting the frequency of callbacks (interrupts). I think this is the lesser of two evils.
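As a rough illustration of that split (a sketch only -- the field and helper names here are assumptions loosely based on hw/core/ptimer.c, not the actual code), the rate limit would live purely on the timer_mod() side of ptimer_reload():

static void ptimer_reload(ptimer_state *s)
{
    int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);

    /* Sync path: keep the exact deadline, so a ptimer_get_count() derived
     * from next_event stays correct when the guest polls.
     */
    s->last_event = s->next_event;
    s->next_event = s->last_event + s->delta * s->period;

    /* Async path only: never schedule the tick callback in the past, and
     * never sooner than the ~10us rate limit, so interrupt generation is
     * throttled without distorting the modelled counter.
     */
    timer_mod(s->timer, MAX(s->next_event, now + 10000));
}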
> I'll scratch my head a bit more on it. If you have any better idea, please share.
>
> 2) ptimer_get_count() returns a fake 0 value when the qemu_timer() has expired but ptimer_tick() hasn't been triggered yet
>
> You're suggesting to solve it by running ptimer_tick(). So if the emulated device uses the ptimer tick event (a scheduled qemu bh) to raise an interrupt, it would do it one QEMU loop cycle earlier.

Yes, this is ok, as even in a rate-limited scenario there is no reason to absolutely force the rate limit. If a poll happens it should just flush the waiting interrupt.

> My question here: is it always legal for the guest software to be able to get counter = 0 on a poll while the CPU interrupt for the timer expiry hasn't happened yet (it would happen after one QEMU cycle)?

Yes. And I am going a step further by saying it is ok for the guest software to see the timer value wrapped around before the expiry too.

> I guess it might cause software misbehavior if it assumes that the real hardware has the CPU and timer running in the same clock domain, i.e. that such a situation is not possible.

Assumptions about the CPU clocking only make sense in icount mode, where the rate limiter is disabled anyway.

> So I'm suggesting to return counter = 1 and allow ptimer_tick() to happen on its own.

My alternate suggestion is: if you detect that the tick should have already happened, just make it happen. I don't see the need to rate limit a polled timer.

Regards,
Peter

> --
> Dmitry
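For reference, a minimal sketch of that tick-on-poll idea in ptimer_get_count(), following the loop outlined earlier in the thread (again, the field names and the ptimer_tick() call signature are assumptions, not the actual QEMU implementation):

static uint64_t ptimer_get_count(ptimer_state *s)
{
    for (;;) {
        int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);

        if (!s->enabled) {
            return s->delta;
        }

        if (now < s->next_event) {
            /* Normal path: derive the count from the real deadline. */
            return (s->next_event - now) / s->period;
        }

        /* The deadline was missed (e.g. the rate limiter pushed the QEMU
         * timer into the future).  Fire the tick now so the pending
         * bh/interrupt is flushed and the timer is reloaded (periodic) or
         * stopped (one-shot), then redo the calculation.
         */
        ptimer_tick(s);
    }
}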