I have been thinking about this a lot and I think Alan is right. The locking
design of ATS is not appropriate for an Apache project. The design worked when
it was a few full-time people sitting within feet of each other all day (and
chunks of the night), but the world has moved on, and we should move to a
locking and threading model which is more robust and fits the expectations of
a larger fraction of the community.

I want to apologize for dragging you all into my long-running frustration with
trying to keep this brittle system stable, and I want to open up a discussion
on how we can make it more stable, more robust and easier to develop in.

It is abundantly clear that if anyone has to go to the lengths Alan has been
forced to go to in order to make this system work under load, it is the
system's fault.

So, here is my proposal.

The old locking system was based on TryLocks, which could not be taken
forcibly. Moreover, it depended on very subtle knowledge of which bits of the
various data structures were protected by which locks. This is clearly not
sustainable, nor is it necessary any longer: modern threading systems work
well with larger numbers of threads and fine-grained locking.

So, let's change to the more conventional model: fine-grained locks protecting
data structures that are clearly labeled with the lock that protects them,
behind external APIs that enforce that protection. These locks will simply be
taken in the standard (blocking) manner, and we will have to slice the data
structures so as to minimize lock contention.
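
A minimal C++ sketch of what such a structure could look like, assuming a
simple string-keyed table. ShardedTable and its methods are hypothetical names,
not existing ATS code. The table is sliced into shards, each shard's data is
private and labeled with the mutex that guards it, and the only way in is
through methods that take that mutex with an ordinary blocking lock.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical example: a string-keyed table sliced into NumShards shards.
// Each shard's map is private and labeled with the mutex that guards it; the
// public methods are the only way in, and each one takes that mutex.
template <typename V, std::size_t NumShards = 16>
class ShardedTable {
public:
  void put(const std::string &key, V value) {
    Shard &s = shard_for(key);
    std::lock_guard<std::mutex> g(s.lock); // ordinary blocking acquire
    s.entries[key] = std::move(value);
  }

  std::optional<V> get(const std::string &key) const {
    const Shard &s = shard_for(key);
    std::lock_guard<std::mutex> g(s.lock);
    auto it = s.entries.find(key);
    if (it == s.entries.end()) return std::nullopt;
    return it->second; // copy out while still holding the lock
  }

  bool erase(const std::string &key) {
    Shard &s = shard_for(key);
    std::lock_guard<std::mutex> g(s.lock);
    return s.entries.erase(key) > 0;
  }

private:
  struct Shard {
    mutable std::mutex lock;                    // protects `entries`
    std::unordered_map<std::string, V> entries; // guarded by `lock`
  };

  // Threads only contend when their keys hash to the same shard.
  std::size_t index(const std::string &key) const {
    return std::hash<std::string>{}(key) % NumShards;
  }
  Shard &shard_for(const std::string &key) { return shards_[index(key)]; }
  const Shard &shard_for(const std::string &key) const { return shards_[index(key)]; }

  std::array<Shard, NumShards> shards_;
};
```

The shard count of 16 is an arbitrary illustrative choice. If we want the
compiler to check the "guarded by" labels rather than relying on comments,
Clang's thread safety analysis (-Wthread-safety with guarded_by attributes) is
one existing option.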

Let's also have a "Transaction" object (essentially our current Mutex with
additional tracking and bookkeeping) and an explicit mechanism for associating
resources owned by Processors (e.g. a NetVC) with a Transaction, for passing
resources from one Transaction to another, and for returning a resource to its
Processor when it is no longer required (close/free/release). In debug mode we
can also use proxy smart pointers and encapsulation to test that the ownership
rules are being obeyed and that "stale" pointers (i.e. pointers to a resource
after it has been released) are not being accessed.
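
A rough C++ sketch of that bookkeeping, assuming hypothetical Resource,
Processor, Transaction and Handle types. None of these are existing ATS
classes; Resource merely stands in for something like a NetVC. Each
Transaction holds a mutex plus the set of resources it owns. transfer() locks
both Transactions with std::scoped_lock and moves the resource across, and
close() hands it back to the Processor. Handle plays the role of the debug-mode
proxy pointer: it records the owner and a generation counter when issued and
throws if used after the resource has been transferred or released. For
brevity the check is always compiled in here rather than only in debug builds.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <unordered_set>
#include <vector>

class Transaction;
// Stand-in for a Processor-owned resource such as a NetVC (hypothetical).
struct Resource {
  std::atomic<unsigned> generation{0};             // bumped on every release
  std::atomic<const Transaction *> owner{nullptr}; // current owner, if any
};

// Proxy pointer: remembers the owner and generation it was issued for and
// refuses access once the resource has been released or handed elsewhere.
class Handle {
public:
  Handle(Resource *r, const Transaction *owner)
      : r_(r), gen_(r->generation.load()), owner_(owner) {}
  Resource *get() const {
    if (r_->generation.load() != gen_ || r_->owner.load() != owner_)
      throw std::logic_error("stale or foreign resource access");
    return r_;
  }
private:
  Resource *r_;
  unsigned gen_;
  const Transaction *owner_;
};

// Hands out resources and takes them back (hypothetical interface).
class Processor {
public:
  Resource *acquire(const Transaction *owner) {
    std::lock_guard<std::mutex> g(lock_); // protects pool_ and free_
    if (free_.empty()) { // grow the pool on demand
      pool_.push_back(std::make_unique<Resource>());
      free_.push_back(pool_.back().get());
    }
    Resource *r = free_.back();
    free_.pop_back();
    r->owner.store(owner);
    return r;
  }
  void release(Resource *r) {
    std::lock_guard<std::mutex> g(lock_);
    r->generation.fetch_add(1); // invalidates every outstanding Handle
    r->owner.store(nullptr);
    free_.push_back(r);
  }
private:
  std::mutex lock_;
  std::vector<std::unique_ptr<Resource>> pool_;
  std::vector<Resource *> free_;
};

// A Transaction: a mutex plus bookkeeping of the resources it owns.
class Transaction {
public:
  explicit Transaction(Processor &p) : proc_(p) {}
  ~Transaction() { for (Resource *r : owned_) proc_.release(r); } // leftovers back
  Handle open() {
    std::lock_guard<std::mutex> g(lock_); // protects owned_
    Resource *r = proc_.acquire(this);
    owned_.insert(r);
    return Handle(r, this);
  }
  // Pass a resource to another Transaction; the old Handle becomes stale.
  Handle transfer(const Handle &h, Transaction &to) {
    if (&to == this) return h;           // nothing to hand over
    std::scoped_lock g(lock_, to.lock_); // takes both locks deadlock-free
    Resource *r = owned(h);
    owned_.erase(r);
    to.owned_.insert(r);
    r->owner.store(&to);
    return Handle(r, &to);
  }
  // close/free/release: give the resource back to its Processor.
  void close(const Handle &h) {
    std::lock_guard<std::mutex> g(lock_);
    Resource *r = owned(h);
    owned_.erase(r);
    proc_.release(r);
  }
private:
  // Caller holds lock_. Throws unless h is live and owned by this Transaction.
  Resource *owned(const Handle &h) const {
    Resource *r = h.get();
    if (r->owner.load() != this)
      throw std::logic_error("resource not owned by this transaction");
    return r;
  }
  Processor &proc_;
  std::mutex lock_;
  std::unordered_set<Resource *> owned_;
};
```

The lock order is always Transaction before Processor, and transfer() takes
both Transaction locks at once, so the sketch avoids the lock-order inversions
that make hand-offs like the session manager's hard to reason about.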

I believe that the problems we are currently seeing turn on an even more
subtle issue: what happens when ownership of a NetVC is passed from one
transaction to another via the session manager. Getting that code to work
stably required many a careful negotiation and resulted in something which is
clearly very brittle and not maintainable.

I hope that we can work through this and end up with a system which is
substantially more maintainable and easier to develop in.

Thanx
john


On Tue, Dec 13, 2011 at 10:30 PM, Alan M. Carroll <a...@network-geographics.com> wrote:

> Tuesday, December 13, 2011, 7:00:42 PM, you wrote:
>
> >> > No other thread can call vc->do_io_close if they don't have the pointer
> >> > to it.
> >> Turns out that at least one other thread does have a pointer.
> > Then they should not.
>
> Great. Now explain it to the compiler. Let me know when you've done that,
> I'll be going back to work on IPv6.
>
> > If you are really having a problem with this I am going to have to go back
> > through your checkins and see what changes might have been motivated by
> > such a fundamental lack of understanding of parallel programming.  This is
> > very worrying.
>
> I think if you are worried, you should definitely go back and check.