On Thu, Mar 15, 2012 at 11:01 AM, Alan M. Carroll < a...@network-geographics.com> wrote:
> Thursday, March 15, 2012, 9:26:33 AM, you wrote: > > > [The lock] is only de-allocated after the close() by which time all > > references to that NetVC should have been dropped by the client. > > Yes, I understand. What I do not understand is what actual, specific, > implementation mechanism I can use to make sure "all references to that > NetVC should have been dropped by the client". Re-iterating that point over > and over does not provide the mechanism needed to implement it. > We must be having some fundamental communication issue. The way I was thinking was: VConnection *temp_vc = m_net_vc; *m_net_vc = NULL; temp_vc->do_io_close(); In other words, this is precisely like anything heap allocated. You need to drop all references to the memory when you free it. do_io_close() is just like free. It means that you can't use that thing anymore, it is dead. This is C++, not java, so this should be a very common situation and one that every one of us has dealt with countless times. > > > Each transaction should have one (1) lock. When holding that lock all > > pointers held by that transaction should be accessible. > > As I have pointed out, that is not the case in the current implementation > with regard to HttpServerSession being split across threads. How can I > change the code to implement that design goal? > It absolutely is the case. The transaction is controlled by a single mutex. When that transaction is activated by any thread, that thread has taken the mutex at which point the transaction has complete control over all the memory for that transaction, including ALL copies of the net_vc pointer outside of the net processor which spawned it. That thread must take that same mutex and once it has it, it checks the "closed" flag which is set by do_io_close BEFORE calling the transaction. Therefore 1) when you hold the mutex you have access to ALL copies of the NetVC pointer that need to be cleared and 2) once you call do_io_close while holding the mutex it is IMPOSSIBLE for that thread to call you. This is how it has worked since 1996. Now, there is probably a bug somewhere. In fact, I just found one when there is a timeout on the NetVC. Someone (after my time) added a "InactivityCop" which calls into the NetVC for timeouts. Now, the normal callbacks check that the mutex pointer hasn't changed after they take the mutex. This is required because the HttpSessionManager can change that mutex, and if you don't check, it is possible that the mutex you are holding is not the mutex that is currently protecting the NetVC. Unfortunately, that check is not done on the path through the InactivityCop. Please find attached a patch for this case. I don't know that it will solve the problems you are seeing. > > In my example timeline, derived from the current codebase, when thread A > holds the lock it can't access the HttpServerSession pointer and when > thread B holds the lock it can't access the pointer in the NetHandler. What > is your proposal to change this to be compliance with your design > specification? > HttpServerSession gets the transactions lock, therefore thread A can absolutely access the entirety of it and it must. HttpServerSession is also not "running" on a thread. The NetVC it represents is running on a thread and that NetVC has the transactions lock. Furthermore, the do_io_close does not clear the mutex. It sets a flag which is checked by the thread controlling the NetVC. When that flag is true, the close process is done on that thread. So your timeline doesn't make sense. If thread A is the one that owns the NetVC, then thread B can't call through it to the continuation. If thread B owns the NetVC then thread A can't clear the mutex in the NetVC. > Is the transaction lock distinct from the NetVC lock? Could you, with > reference to the stack trace diagram I sent earlier, point out the > transaction lock? > When the accept callback happens, a mutex is created. This mutex represents the transaction. Everything in that transaction is protected by that mutex, including both the client and server NetVC. So the NetVC lock is the same as the transaction lock. The HttpSessionManager replaces the NetVC mutex with the new transactions mutex when it hands a server session to a transaction. It replaces that mutex with one of its own when it is given a server session which is no longer being used by a transaction. > > > Only the > > transaction has the power to close() the NetVC. Before doing so the > > transaction must drop all references to the NetVC. > > For my example timeline / scenario, what should thread A and thread B do > to implement this? > Only one of thread A or B can hold the transaction mutex. Whichever one that is clears the references and closes() the NetVC. > > > The lock does need to > > be reference counted, because all threads which might call the > transaction > > hold a pointer to the lock and an Action (lock + cancel boolean) and take > > the lock and then check the "cancel" flag before calling the transaction. > > For HttpServerSession, where is this transaction lock? I also don't see > where it keeps an Action. Is there an access mechanism or member I have > missed? > HttpServerSession has the same lock as the NetVC it controls (see HttpServerSession::new_connection and HttpClientSession::attach_server_session() > > > Reference counting the NetVC is a bandaid for an undiscovered bug. > > What should I tell my client who is experiencing these crashes on deployed > ATS servers? No bandaid for you because ATS has been rock solid since 1997? > What can I tell him about your plan for making progress on this issue? > > I actually agreed that a bandaid is better than nothing. What I am suggesting is that there is some deeper problem that may come back and bite later. john