I have been thinking about this a lot and I think Alan is right. The locking design of ATS is not appropriate for an Apache project. The design worked when it was a few full-time people sitting within feet of each other all day (and chunks of the night), but the world has moved on and we should move to a locking and threading model which is more robust and fits the expectations of a larger fraction of the community.
I want to apologize for wrapping you all into my long-running frustration with trying to keep this brittle system stable, and to open up discussion on how we can make it more stable, more robust and easier to develop in. It is abundantly clear that if anyone has to go to the lengths that Alan has been forced to in order to make this system work under load, then it is the system's fault.

So, here is my proposal.

The old locking system was based on TryLocks which could not be taken forcibly. Moreover, it depended on very subtle knowledge of which bits of the various data structures were protected by which locks. This is clearly not sustainable, nor is it necessary any longer. Modern threading systems work well with larger numbers of threads and fine-grained locking.

So, let's change to the more conventional model: fine-grained locks protecting data structures which are clearly labeled with the lock that protects them and which have external APIs that enforce that protection. These locks will simply be taken in the standard manner, and we will have to ensure that the data structures are sliced so as to minimize lock contention, again in the standard manner.

Let's also have a "Transaction" object (essentially our current Mutex with additional tracking and bookkeeping) and an explicit mechanism for associating resources owned by Processors (e.g. NetVC) with a Transaction, for passing a resource (e.g. a NetVC) from one Transaction to another, and for returning the resource to the Processor when it is no longer required (close/free/release). We can also use proxy smart pointers and encapsulation in debug mode to test that the ownership rules are being obeyed correctly and that "stale" pointers are not being accessed (i.e. after the resource has been released).

I believe that the problems we are currently seeing turn on an even more subtle issue when the ownership of a NetVC is passed from one transaction to another via the session manager.
Getting that code to work stably required many a careful negotiation and resulted in something which is clearly very brittle and not maintainable.

I hope that we can work through this and end up with a system which is substantially more maintainable and easier to develop in.

Thanx

john

On Tue, Dec 13, 2011 at 10:30 PM, Alan M. Carroll <a...@network-geographics.com> wrote:

> Tuesday, December 13, 2011, 7:00:42 PM, you wrote:
>
> >> > No other thread can call vc->do_io_close if they don't have the pointer
> >> > to it.
> >> Turns out that at least one other thread does have a pointer.
> > Then they should not.
>
> Great. Now explain it to the compiler. Let me know when you've done that,
> I'll be going back to work on IPv6.
>
> > If you are really having a problem with this I am going to have to go back
> > through your checkins and see what changes might have been motivated by
> > such a fundamental lack of understanding of parallel programming. This is
> > very worrying.
>
> I think if you are worried, you should definitely go back and check.
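
To make the proposal above a little more concrete, here is a rough C++ sketch of the two ideas: a data structure that carries its own lock behind an API that enforces it, and an ownership handle that checks Transaction ownership and catches stale access. This is only an illustration under assumptions, not a design. Every name in it (SessionTable, Connection as a stand-in for NetVC, Owned, and Transaction reduced to a bare struct) is hypothetical and does not exist in ATS. Where a real debug build would probably assert, the sketch returns nullptr or false so the check is easy to see.

```cpp
// Hypothetical sketch only: none of these types exist in ATS.
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// A data structure that owns the lock protecting it. The map is private,
// so callers can only reach it through methods that take the lock first.
class SessionTable {
public:
  void insert(int id, std::string host) {
    std::lock_guard<std::mutex> guard(lock_); // blocking lock, not a try-lock
    sessions_[id] = std::move(host);
  }

  // Returns true and copies the entry into `out` if `id` is present.
  bool lookup(int id, std::string &out) const {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = sessions_.find(id);
    if (it == sessions_.end()) return false;
    out = it->second;
    return true;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(lock_);
    return sessions_.size();
  }

private:
  mutable std::mutex lock_;                       // protects sessions_
  std::unordered_map<int, std::string> sessions_; // guarded by lock_
};

// Stand-in for the proposed Transaction ("Mutex plus bookkeeping").
struct Transaction { int id; };

// Stand-in for a Processor-owned resource such as a NetVC.
struct Connection { int fd; };

// Proxy handle that records which Transaction currently owns a resource.
// The handle itself is not thread-safe; the assumption is that the owning
// Transaction's lock is held while it is used.
template <typename T>
class Owned {
public:
  Owned(T *resource, const Transaction *owner)
      : res_(resource), owner_(owner) {}

  // True only for the current owner, and only before release.
  bool valid_for(const Transaction &caller) const {
    return res_ != nullptr && owner_ == &caller;
  }

  // Access check: non-owners and stale handles get nullptr.
  T *get(const Transaction &caller) const {
    return valid_for(caller) ? res_ : nullptr;
  }

  // Explicit handoff from one Transaction to another.
  bool transfer(const Transaction &from, const Transaction &to) {
    if (!valid_for(from)) return false;
    owner_ = &to;
    return true;
  }

  // Hand the resource back to its Processor; the handle becomes stale.
  T *release(const Transaction &caller) {
    if (!valid_for(caller)) return nullptr;
    T *r = res_;
    res_ = nullptr;
    owner_ = nullptr;
    return r;
  }

private:
  T *res_;
  const Transaction *owner_;
};
```

With something like this, a handoff through the session manager would be an explicit transfer() call, and any thread still holding the old Transaction would get nullptr from get() instead of a live pointer it should not have.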