> In article <[EMAIL PROTECTED]>,
> Dominic Marks <[EMAIL PROTECTED]> wrote:
> > On Mon, Feb 04, 2002 at 01:21:25PM -0800, John Polstra wrote:
> > > I'm trying to understand the timecounter code, and in particular the
> > > reason for the "microuptime went backwards" messages which I see on
> > > just about every machine I have, whether running -stable or -current.
> >
> > I see them everywhere with -CURRENT, but not at all with -STABLE. This is
> > with two seperate machines. Perhaps that may add clues.
>
> I'm looking for something less empirical than that. When somebody
> says this problem is caused by too much interrupt latency, I assume
> they have a mental model of what is going wrong when this excessive
> latency occurs.

It's not necessarily caused by interrupt latency. Here's the assumption
that's being made.

There is a ring of timecounter structures, of some size. In testing,
I've used sizes of a thousand or more, but still seen this problem.
There is a pointer to the "current" timecounter structure.

When the "current" time is updated, the following procedure is followed:

 - Find the "next" timecounter in the ring.
 - Update its contents with the new current time.
 - Move the "current" pointer.

When one wishes to read the current time, one proceeds as follows:

 - Get the "current" pointer and save it locally.
 - Read the timecounter structure via the local "current" pointer.

Since the operations on the "current" pointer are atomic, there is no
need to lock the structure.

There are a couple of possible problems with this mechanism.

One is that the ring "catches up" with your saved copy of the "current"
pointer, i.e. in between fetching the pointer and reading the
timecounter contents, the "next" pointer passes over you again, so the
structure you are reading is rewritten underneath you and you get
garbage out of it.

Another is that there is a race between multiple updaters of the
timecounter; if two parties are both updating the "next" timecounter
while another party is trying to get the "current" time, this could
cause corruption.

All that interrupt latency will do is make the updates late; I can't
actually see how it could cause corruption. Corruption has to be caused
by mishandling of the timecounter ring in some fashion.

Note that you can probably eliminate the ring loop theory by allocating
a very large number of entries in the ring, by setting NTIMECOUNTER
(kern/kern_tc.c) higher. The structures are small; try 100,000 or so.

If you can still reproduce the problem under these circumstances, try
adding some checks to make sure the "current" timecounter pointer is
behaving monotonically; just save the last timecounter pointer in
microtime() et al. and compare it against the one you fetch next time.

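To make the mechanism above concrete, here is a minimal user-space sketch
in C. It is not the kern_tc.c code: the names tc_slot, tc_ring, RING_SIZE
and the tc_ring_* functions are hypothetical, the slot holds a single
microsecond value instead of a real timecounter, C11 atomics stand in for
whatever the kernel uses, and a single updater is assumed. The "advanced
sanely" check is one possible reading of "behaving monotonically": it
treats a forward step of half the ring or more as suspicious, which is an
arbitrary threshold chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8                 /* hypothetical stand-in for NTIMECOUNTER */

struct tc_slot {                    /* hypothetical, far simpler than the real struct */
    uint64_t uptime_us;
};

struct tc_ring {
    struct tc_slot slots[RING_SIZE];
    _Atomic(struct tc_slot *) current;   /* the "current" pointer */
};

static void tc_ring_init(struct tc_ring *r)
{
    for (size_t i = 0; i < RING_SIZE; i++)
        r->slots[i].uptime_us = 0;
    atomic_store(&r->current, &r->slots[0]);
}

/* Updater (assumed to be the only one): find "next", fill it, move "current". */
static void tc_ring_update(struct tc_ring *r, uint64_t now_us)
{
    struct tc_slot *cur = atomic_load(&r->current);
    size_t next_idx = ((size_t)(cur - r->slots) + 1) % RING_SIZE;
    struct tc_slot *next = &r->slots[next_idx];

    next->uptime_us = now_us;            /* readers still see the old slot here */
    atomic_store(&r->current, next);     /* publish the new slot */
}

/* Reader: save "current" locally, then read through the saved pointer.
 * If "used" is non-NULL, the slot that was read is reported back so the
 * caller can remember it for the sanity check below. */
static uint64_t tc_ring_read(struct tc_ring *r, struct tc_slot **used)
{
    struct tc_slot *tc = atomic_load(&r->current);

    assert(tc >= r->slots && tc < r->slots + RING_SIZE);
    if (used != NULL)
        *used = tc;
    return tc->uptime_us;                /* unsafe if the ring has lapped us */
}

/* Sanity check on two successively observed "current" pointers: the
 * forward distance around the ring should be small. A backward step of
 * one slot shows up as a forward distance of RING_SIZE - 1. */
static bool tc_ring_advanced_sanely(const struct tc_ring *r,
                                    const struct tc_slot *last,
                                    const struct tc_slot *now)
{
    size_t last_idx = (size_t)(last - r->slots);
    size_t now_idx = (size_t)(now - r->slots);
    size_t dist = (now_idx + RING_SIZE - last_idx) % RING_SIZE;

    return dist < RING_SIZE / 2;
}
```

If a reader saves a slot pointer and the updater then runs RING_SIZE more
times before the reader dereferences it, the saved pointer refers to a
slot that has been rewritten. That is the "ring catches up" case, and it
is why a much larger ring makes that particular failure far less likely
without making it impossible.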
Another test worth performing is to look at the tco_delta function for
the timecounter and make sure that it returns a sane value, and one that
doesn't get out of sync with the interrupt handler that updates the
timecounter proper. If you save the delta value in the timecounter and
zero it when it's updated, you can catch this.

You can rule this out by using getmicrouptime() rather than
microuptime(), since it returns the time stored at the last update
without applying a delta; it may return the same value twice, which
isn't desirable, but that would be better than nothing.

Hope this helps a bit.

Regards,
Mike