On Thu, Mar 15, 2012 at 7:05 AM, Alan M. Carroll < a...@network-geographics.com> wrote:
> Wednesday, March 14, 2012, 9:29:43 PM, John Plevyak wrote:
>
> > My view is that this is only one of many failure modes, albeit the most
> > common one.
>
> I disagree because only in the close case is the lock itself de-allocated.
> In all other cases the locks continue to be valid. So while all the other
> modes can be synchronized via locks, closing the NetVC cannot be. Before I
> started making fixes the crashes were almost all at the point of accessing
> the lock, not the NetVC itself.
>

The lock is reference counted. It cannot be de-allocated while it is still in use. It is only de-allocated after the close(), by which time all references to that NetVC should have been dropped by the client.

> > If the locking was working, then the client would clear all
> > pointers to the netvc
>
> I am still failing to see how, in my example timeline, the client in
> thread A can cause the client in thread B to drop its NetVC pointer, or
> even detect the fact that there is a pointer in thread B. Even if the
> operations are completely temporally disjoint (the point of locking) it
> will crash when thread B accesses the invalid lock or NetVC. No
> simultaneous access is required in the example scenario.
>
> The entire point of the reference counting in the patch is to provide that
> detection mechanism, so that thread A can in fact wait for the thread B
> client to drop its pointers. I don't see how that can be done with only
> locks if the locks themselves can become dangling pointers.
>

Each transaction should have one (1) lock. When holding that lock all pointers held by that transaction should be accessible. Only the transaction has the power to close() the NetVC. Before doing so the transaction must drop all references to the NetVC.

The lock does need to be reference counted, because every thread that might call the transaction holds a pointer to the lock and to an Action (lock + cancel boolean). Such a thread takes the lock and then checks the "cancel" flag before calling the transaction. If the transaction holds the lock and cancels all outstanding operations, it is then free to release the lock and drop its own reference to the lock, safe in the knowledge that it can't be activated by a stray thread. This is the procedure that made ATS rock solid from 1997 until some bug was introduced. Reference counting the NetVC is a band-aid for an undiscovered bug.
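
To make the procedure concrete, here is a minimal sketch of the "take the lock, then check the cancel flag" pattern. It is not the ATS code: `Action`, `deliver` and `run_example` are hypothetical names, and `std::shared_ptr<std::mutex>` is only a stand-in for a reference-counted lock. The calls to `deliver` are made sequentially here; in a real server they would come from other threads.

```cpp
#include <functional>
#include <memory>
#include <mutex>

// Hypothetical stand-in for an Action: a reference-counted lock plus a
// cancel flag that is only read or written while that lock is held.
struct Action {
  std::shared_ptr<std::mutex> lock;
  bool cancelled = false;
};

// What a calling thread does before activating the transaction:
// take the lock, check the flag, and only then invoke the callback.
// Returns true if the callback actually ran.
bool deliver(const std::shared_ptr<Action> &action,
             const std::function<void()> &callback) {
  std::lock_guard<std::mutex> guard(*action->lock);
  if (action->cancelled) {
    return false; // cancelled under the lock: never touch the transaction
  }
  callback();
  return true;
}

// Walks through the timeline: one normal callback, then the transaction
// cancels and drops its reference, then a late ("stray") delivery arrives.
// Returns how many times the callback ran.
int run_example() {
  auto lock = std::make_shared<std::mutex>(); // the transaction's one lock
  auto action = std::make_shared<Action>();
  action->lock = lock; // the Action holds its own reference to the lock

  int calls = 0;
  deliver(action, [&] { ++calls; }); // delivered normally

  {
    // The transaction cancels its outstanding operation while holding the lock.
    std::lock_guard<std::mutex> guard(*lock);
    action->cancelled = true;
  }
  lock.reset(); // transaction drops its reference; the mutex stays alive
                // because the Action still refers to it

  deliver(action, [&] { ++calls; }); // stray delivery: sees the flag, skips
  return calls;
}
```

In this sketch the late caller can still lock the mutex safely, because its Action keeps the mutex alive, and it never reaches the transaction, because it sees the cancel flag first.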