Hi Bill, On Fri, May 27, 2011 at 1:54 PM, J. William Campbell <jwilliamcampb...@comcast.net> wrote: > On 5/26/2011 6:51 PM, Graeme Russ wrote: >> >> Hi Bill, >> >> On Fri, May 27, 2011 at 11:26 AM, J. William Campbell >> <jwilliamcampb...@comcast.net> wrote: >>>
[snip] >>> >>> Yes, that is the problem. I have come to the view that two 32 bit words >>> are >>> the best approach. Note that the lsb may actually not fill the full 32 >>> bits. >> >> Urghhh > > Hi All, > Consider the case where we have a 16 bit counter clocked by 1.0 MHz (a > nice number) interrupting on overflow. We could return (overflow counter << > 16) | hardware counter from the get_ticks routine, which maintains the > "binaryness" of the "number". > OTOH, suppose the counter is a down counter clocked at 1.19318 MHz with a 1 > ms period, like the PC counter is set up at present. What do we return > then? Whatever we do, it depends on the clock rate, which may change, and > involves some math, which may not work for all clock rates. In effect, you > have a dual radix number. Yes, conversion is possible but it is messy and > would be present in different forms in all the various get_tick routines. get_ticks() does not care about the clock rate - It simply looks at the current value of the hardware tick counter and the value of the hardware tick counter the last time get_ticks() was called, calculates the difference and adds that to the 64-bit software tick-counter I don't see how it being a down counter makes that any more difficult > This is neither simple nor maintainable. Further, it is un-necessary, as the > sync_timer routine is just going to convert the number from whatever radix > we converted it to into millisec. If we leave the two numbers as split, all > that complexity is removed from get_ticks and sent upward to the common > routine that converts the answer into ms anyway. This makes the system more > maintainable by placing the minimum requirement on get_ticks. The "tick" > should be opaque to anybody but sync_timebase anyway. But how is the common routine going to know if it is a split timer, up timer, down timer, little endian, big endian, etc, etc. get_ticks() abstracts the hardware implementation of the hardware timer from sync_timer() >>> >>> The top 32 bits are the rollover count and the bottom 32 bits are the >>> current counter. If the counter is a full 32 bits, so much the better. >> >> Ahhhhh - Lets keep it that way >> >>> Again, one could put them together inside the interrupt routine , but it >>> is >>> easier to check for a changed value if you don't do this. Otherwise, you >>> have to check both words. It also makes the isr faster. It is just an >> >> As I said before - Simple First, Fast Later > > I am in favor of simple. That is why I want get_ticks to do as little as > possible. It should just read the hardware register and the overflow counter > if it is separate. Make sure the overflow didn't change while we were > reading. This is redundant if we are not using interrupts but we can leave > the code in. It just won't do anything. We can also move the rollover > detection to sync_timebase. It will be redundant if we are using interrupts, > because time will never "back up". But we can do it this way. This > centralizes the overflow detection, which is a good thing. That does not sound simple to me. This, however, does: u64 get_ticks(void) { static u64 last_hw_tick_count; static u64 last_sw_tick_count; /* Now for the platform specific stuff - Read hardware tick counter */ u64 current_hw_tick_count = /* Read hw registers etc */ /* Deal with hardware weirdness - errata, stupid hw engineers etc */ u64 elapsed_ticks = current_hw_tick_count - last_hw_tick_count; last_sw_tick_count += elapsed_ticks; return last_sw_tick_count; } The '/* Read hw registers etc */' bit will always be the same, no matter what way you do it The '/* Deal with hardware weirdness - errata, stupid hw engineers etc */' bit is where we are truly abstracting the hardware away - This is the bit you propose to leave mangled and deal with in sync_time? You make a lot of assumptions about the consistency of this highly variable logic across all current (and future) U-Boot platforms What if the hardware engineer figures he can save a squeeze some extra mileage out of a FPGA by implementing a 24-bit counter + 8-bit status in one 32-bit register - get_ticks() strips all that out You want to do some in get_ticks(), some in the ISR and some in sync_timer() - Put all the weird stuff on one place - The HAL >>> >>> increment of the overflow counter, like the PPC is now. I happen to think >>> it >>> is easier to convert the two 32 bit words to milliseconds one at a time, >>> but >>> if you feel you must use 64 bit words, that is fine too. Just remember, >>> the >>> counter does not always fill the entire bottom word. >> >> Urghhh >> >>> In cases where there are no interrupts, get_ticks has to detect that the >>> timer has "backed up" and increment the overflow counter itself, unless >>> the >>> counter is 64 bits to begin with and overflow is impossible anyway. >>> get_ticks should NOT try to detect overflows if interrupts are available. >>> If >>> it got both words before an interrupt happened, the answer is correct. If >>> it >>> got an interrupt in between fetching the words, the event will be >>> detected >>> and the value re-fetched. All sync_timebase would do now is convert the >>> returned value to milliseconds. >>> >>> So, if you have a 64 bit hardware counter, get_ticks reads and returns >>> it. >>> Else if you have interrupts, get_ticks reads the overflow counter into >>> the >>> msb. Next, it reads the hardware timer into the lsb. If the counter is a >>> down counter, the lsb is = to the counter max - the lsb. The msb is then >>> checked to make sure it hasn't changed, if it has, repeat the process. >>> All >>> the interrupt routine does is increase the overflow count. >>> If you don't have interrupts get_ticks reads the hardware counter into >>> the >>> lsb. If the counter is a down counter, the lsb is = to the counter max - >>> the >>> lsb. If the lsb is less than it was in the previous call to get ticks, >>> the >>> overflow counter is increased. get_ticks then loads the overflow counter >>> into the msb. >>> >>> sync_timebase converts the msb and lsb into millisec. It may do this by a >>> 64 >>> bit divide, or some shifting to align the lsb with then msb and the a 64 >>> bit >>> divide, or a bunch of 32 bit fractional multiplies, or any such approach >>> that works. >>> >>> How does that sound? >> >> The fact that you have described three different implementations of >> get_ticks() with two of these differentiated between whether you have >> interrupts or not immediately suggests this solution is inherently more >> complex and less maintainable. > > Yes, It is less maintainable because there are three cases. As shown above, > we can reduce the number of cases to one at the expense of some redundant > code. There are problems with our original approach when the interrupt rate > is == to the counter rollover period. This problem is detailed below. You > placed the constraint that the interrupt rate was faster than the rollover > period on the previous implementation. However, that constraint is not met > in essentially all interrupt driven cases. Also, note that these get_ticks > routines, even while different, are pretty simple. OTOH, some of the 64 bit > counters can't be read atomically either, so having the "redundant"check in > the code is not so bad. The again, we are talking about a total of probably > 5 lines of code or so anyway. It is a good idea to move the rollover > detection in the non-interrupt case to sync_timebase as well. Sync_timebase > can also invert the down-counting counters, removing that from get_ticks. > The wrap detection code can be #ifdef out if one is using interrupts and Urghh - Why are you adding unnecessary ugliness - #ifdef in the middle of code is usually a sign you are doing something wrong > offended by it's presence. Thanks for pointing this out and compelling me to > reduce the number of cases! Making get_ticks more lightweight is a good idea > in my opinion. >> >> Lets say you have a platform with a 32-bit tick counter running at a >> reasonably long rollover time so you decide not to use interrupts. Then >> you create a new platform with the same tick counter, but it runs much >> faster and you realise you need to implement the interrupt routine to >> make get_timer() work for long enough periods - Fine, you add an ISR >> for the new platform that calls sync_timebase - No other changes are >> required. >> >> The last thing we want is for the 64-bit tick counter to be conceptually >> different across platforms >> >> I just realised - the ISR _does not need to call the sync_timebase at all_ >> The ISR only needs to call get_ticks(), so it will be fast anyway > > The problem is that the way we previously detected wrapping does not work if > the interrupt rate is == to the counter wrap time, which it essentially > always is. If get_ticks is trying to update the wrap count when an interrupt Is it, always, on every platform? > comes in, it will do it wrong. If the interrupt rate was faster, it would > work, because get_ticks would always know the correct answer. Consider the > following. Get ticks is called by the counter overflowing (which resets it > to 0) and executes with the counter value at 0. Later, at the next rollover, > with no intervening calls to get ticks, the interrupt routine calls get > ticks. Assuming negligible interrupt latency, get_ticks just sees the > counter is still 0, so it will not detect a wrap condition. So now you loose But wait a minute, don't you KNOW that the interrupt gets triggered on a rollover so therefore there MUST have been a rollover, therefore there has been 2^32 (or 2^16 or 2^whatever) ticks which you can just subtract from the last known tick count (which would be zero if there had been no get_timer calls in between) > a period. I thought by passing in a flag we could force get_ticks to do the > right thing in this case, but since we must disable interrupts when we call > get ticks from the get_timer routine, the outer call to get ticks may have > already detected the rollover and the flag will force an additional rollover > to be detected. It is a classic race condition. If we detect rollover only Aha! - In nearly all situations, a race is better avoided than run! > in the interrupt routine, we do not need to protect the rollover variable > from access by the main program, so we can be sure the results of get_ticks > is coherent. If we try do do it in both places, we will have trouble. If the > interrupt occurred at a faster rate, like say twice the rollover, we > wouldn't have a problem. But it doesn't. In most cases it happens at the > rollover rate, or just a bit more sometimes due to interrupt latency not Again does it really - for all arches? > being a constant. It may appear to happen at somewhat less than a rollover > as well, if the interrupt latency was long at time n-1 and short at time n. > I think this problem can be overcome in the old design by keeping track of > whether the last update was by a non-interrupt or interrupt call to > get_ticks , but I think the new design is better anyway. I disagree - The whole point of a HAL is to Abstract the Hardware Regards, Graeme _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot