the code i typed in out of haste turns out to be exactly the same
as the code that had this problem, modulo names.

> isilock is a variable set by the lock to taint it as ilock instead of lock.
> Having isilock=1 only happens after the lock has been acquired by someone.
> The lock is checked with a tas() which I assume works because everything
> is based on it.

isilock is tested in the branch not holding the ilock.  therefore a winning
cpu can be executing the acquire branch and any number of losing cpus can be
executing the body of the if statement concurrently.  where is the timing
guarantee that if this happens, the winning cpu executes l->isilock = 1
and the cacheline holding l->isilock has been flushed to the losing cpu
before the !l->isilock test has been run?
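
a minimal sketch of that window, replayed by hand in a single thread so
the ordering is explicit.  this is not the kernel code: the cut-down Lock,
the non-atomic tas() stand-in and racewindow() are all hypothetical names
made up for illustration.

```c
#include <stdio.h>

/* hypothetical cut-down lock: only the two fields the argument needs */
typedef struct Lock Lock;
struct Lock {
	int key;	/* 0 = free, nonzero = held */
	int isilock;	/* stored by the holder only after it has won tas() */
};

/* single-threaded stand-in for test-and-set: set key, return the old value */
static int
tas(int *key)
{
	int old = *key;

	*key = 1;
	return old;
}

/*
 * replay one legal interleaving of two cpus, step by step.
 * returns 1 if the losing cpu passes the !l->isilock test
 * even though the lock is being taken as an ilock.
 */
int
racewindow(Lock *l)
{
	int winner, loser, sawstale;

	winner = tas(&l->key);		/* cpu0: wins, sees 0 */
	loser = tas(&l->key);		/* cpu1: loses, sees 1 */
	sawstale = 0;
	if(winner == 0 && loser != 0 && !l->isilock)
		sawstale = 1;		/* cpu1: test runs before cpu0's store */
	l->isilock = 1;			/* cpu0: acquire branch runs last */
	return sawstale;
}
```

starting from a zeroed Lock, racewindow() returns 1: the loser passed the
!l->isilock test although the lock ends up marked as an ilock.  nothing in
the sketch orders the winner's store before the loser's load, and that
ordering is exactly what the question above asks about.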

i don't think that one needs to invoke any of the spectacularly odd things
that happen on modern pcs to explain this.  (for example, separate cores
running at different frequencies.  or, my personal favorite, smm interrupting
things for a couple hundred ms.)

> My guess is that you have the lock uninitialized (key is not what it should
> be), so key has a bogus value and that is where your problems start.
> Zeroing the lock before using it should do the trick.

the lock is in the bss.  it has been zeroed by the linker.  in any event,
i don't think an uninitialized lock explains the behavior.  all we
need is good old-fashioned concurrency.

- erik

