Re: TS-857 and finer grained locking

John Plevyak Sat, 17 Mar 2012 20:33:53 -0700

On Thu, Mar 15, 2012 at 11:01 AM, Alan M. Carroll <
a...@network-geographics.com> wrote:

> Thursday, March 15, 2012, 9:26:33 AM, you wrote:
>
> > [The lock] is only de-allocated after the close() by which time all
> > references to that NetVC should have been dropped by the client.
>
> Yes, I understand. What I do not understand is what actual, specific,
> implementation mechanism I can use to make sure "all references to that
> NetVC should have been dropped by the client". Re-iterating that point over
> and over does not provide the mechanism needed to implement it.
>

We must be having some fundamental communication issue.  The way I was
thinking was:

VConnection *temp_vc = m_net_vc;
*m_net_vc = NULL;
temp_vc->do_io_close();

In other words, this is precisely like anything heap allocated.  You need
to drop all references to the memory when you free it.  do_io_close() is
just like free.  It means that you can't use that thing anymore, it is
dead.  This is C++, not java, so this should be a very common situation and
one that every one of us has dealt with countless times.

>
> > Each transaction should have one (1) lock.  When holding that lock all
> > pointers held by that transaction should be accessible.
>
> As I have pointed out, that is not the case in the current implementation
> with regard to HttpServerSession being split across threads. How can I
> change the code to implement that design goal?
>

It absolutely is the case.  The transaction is controlled by a single
mutex.  When that transaction is activated by any thread, that thread has
taken the mutex at which point the transaction has complete control over
all the memory for that transaction, including ALL copies of the net_vc
pointer outside of the net processor which spawned it.   That thread must
take that same mutex and once it has it, it checks the "closed" flag which
is set by do_io_close BEFORE calling the transaction.  Therefore 1) when
you hold the mutex you have access to ALL copies of the NetVC pointer that
need to be cleared and 2) once you call do_io_close while holding the mutex
it is IMPOSSIBLE for that thread to call you.  This is how it has worked
since 1996.   Now, there is probably a bug somewhere.

In fact, I just found one when there is a timeout on the NetVC.   Someone
(after my time) added a "InactivityCop" which calls into the NetVC for
timeouts.  Now, the normal callbacks check that the mutex pointer hasn't
changed after they take the mutex.  This is required because the
HttpSessionManager can change that mutex, and if you don't check, it is
possible that the mutex you are holding is not the mutex that is currently
protecting the NetVC.  Unfortunately, that check is not done on the path
through the InactivityCop.

Please find attached a patch for this case.  I don't know that it will
solve the problems you are seeing.

>
> In my example timeline, derived from the current codebase, when thread A
> holds the lock it can't access the HttpServerSession pointer and when
> thread B holds the lock it can't access the pointer in the NetHandler. What
> is your proposal to change this to be compliance with your design
> specification?
>

HttpServerSession gets the transactions lock, therefore thread A can
absolutely access the entirety of it and it must.   HttpServerSession is
also not "running" on a thread.  The NetVC it represents is running on a
thread and that NetVC has the transactions lock.   Furthermore, the
do_io_close does not clear the mutex.  It sets a flag which is checked by
the thread controlling the NetVC.  When that flag is true, the close
process is done on that thread. So your timeline doesn't make sense.  If
thread A is the one that owns the NetVC, then thread B can't call through
it to the continuation.  If thread B owns the NetVC then thread A can't
clear the mutex in the NetVC.

> Is the transaction lock distinct from the NetVC lock? Could you, with
> reference to the stack trace diagram I sent earlier, point out the
> transaction lock?
>

When the accept callback happens, a mutex is created.  This mutex
represents the transaction.  Everything in that transaction is protected by
that mutex, including both the client and server NetVC.  So the NetVC lock
is the same as the transaction lock.  The HttpSessionManager replaces the
NetVC mutex with the new transactions mutex when it hands a server session
to a transaction.  It replaces that mutex with one of its own when it is
given a server session which is no longer being used by a transaction.

>
> > Only the
> > transaction has the power to close() the NetVC.  Before doing so the
> > transaction must drop all references to the NetVC.
>
> For my example timeline / scenario, what should thread A and thread B do
> to implement this?
>

Only one of thread A or B can hold the transaction mutex.  Whichever one
that is clears the references and closes() the NetVC.

>
> > The  lock does need to
> > be reference counted, because all threads which might call the
> transaction
> > hold a pointer to the lock and an Action (lock + cancel boolean) and take
> > the lock and then check the "cancel" flag before calling the transaction.
>
> For HttpServerSession, where is this transaction lock? I also don't see
> where it keeps an Action. Is there an access mechanism or member I have
> missed?
>

HttpServerSession has the same lock as the NetVC it controls (see
HttpServerSession::new_connection
and HttpClientSession::attach_server_session()

>
> > Reference counting the NetVC is a bandaid for an undiscovered bug.
>
> What should I tell my client who is experiencing these crashes on deployed
> ATS servers? No bandaid for you because ATS has been rock solid since 1997?
> What can I tell him about your plan for making progress on this issue?
>
>
I actually agreed that a bandaid is better than nothing.  What I am
suggesting is that there is some deeper problem that may come back and bite
later.

john

Re: TS-857 and finer grained locking

Reply via email to