On Wed, Mar 14, 2012 at 8:22 AM, Alan M. Carroll <a...@network-geographics.com> wrote:
> > There is, however, one situation where this simple and safe order of events
> > is not followed. That is connection sharing to origin servers. Here the
> > situations starts the same, but when the client is done with the connection
> > it does not issue a do_io_close(), and this is where the problems can begin.
>
> That's not my interpretation of the crashes. We tried various settings for
> connection sharing to no observed effect on crashing type or frequency. In
> fact all of the configurations I use for testing have connection sharing
> disabled.

"As far as I can tell the problem arises when the VCs in a HttpServerSession are split across two threads"

This can only occur when there is some connection sharing, or if someone has introduced a thread switch in some other processor which triggers the OS (origin server) connection. AFAIK the OS connection is initiated on the thread which has the client connection and thus, without connection sharing, they should be on the same thread.

> One might ask, why is HttpServerSession split across threads like that? I
> have no idea. But it seems to happen much more with forward proxy (note: I
> have only indirect evidence for that).

A question I have as well. This should not be the case and is going to cause performance problems. That said, it should not result in a crash.

> John Plevyak writes:
> > So, this patch. What does it do? It uses smart points to prevent either
> > of the two threads from making one particular change to the shared NetVC
> > that they are currently scribbling all over: that of deleting it while the
> > other is still running. It doesn't prevent any of the other horrors, or
> > all other manner of crashes, race conditions and unexpected behavior, just
> > the one, deallocating. It is a serious one, but not the only one.
>
> My current view is that this is the only problem, because in all other
> cases the locking is working.
>

My view is that this is only one of many failure modes, albeit the most common one. If the locking were working, then the client would clear all pointers to the netvc and then call close() while holding the last pointer in local storage, and the crashes you are seeing would be impossible, as the netvc would be freed by the owning thread and all pointers would already have been cleared. The only way there can be a crash is if two threads are holding the pointer in volatile memory, and the only way that can be happening is if the pointers are not cleared or if the locks are not working.

john
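
The release order described above (clear every shared pointer under the lock, keep the last copy in local storage, then close through that local copy) could be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchNetVC`, `SketchSession`, `release_vc` and `demo_release_twice` are hypothetical names invented for the example and are not the Traffic Server classes or API, and `delete` stands in for whatever close() actually does.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a NetVC; not the real Traffic Server class.
struct SketchNetVC {
  bool closed = false;
};

// Hypothetical session object that holds the shared pointer to the VC.
struct SketchSession {
  std::mutex lock;                  // guards server_vc
  SketchNetVC *server_vc = nullptr; // the only shared pointer in this sketch
};

// Clear the shared pointer while holding the lock, keep the last copy in a
// local variable, then close and free through that local copy.
// Returns true if this caller was the one that freed the VC.
bool release_vc(SketchSession &s) {
  SketchNetVC *local = nullptr;
  {
    std::lock_guard<std::mutex> guard(s.lock);
    local = s.server_vc;   // last pointer now lives in local storage
    s.server_vc = nullptr; // no shared copy remains after this point
  }
  if (local == nullptr) {
    return false;          // pointer was already cleared by an earlier release
  }
  local->closed = true;    // stands in for close()
  delete local;
  return true;
}

// Usage sketch: a second release finds the pointer cleared and does nothing.
int demo_release_twice() {
  SketchSession s;
  s.server_vc = new SketchNetVC();
  int frees = 0;
  if (release_vc(s)) ++frees;
  assert(s.server_vc == nullptr);
  if (release_vc(s)) ++frees;
  return frees;
}
```

In this sketch a double free is impossible as long as every holder goes through the lock and the shared pointer is cleared before the object is freed. If a second copy of the pointer exists outside the lock, or the lock is not actually taken, the guarantee no longer holds, which is the failure condition argued above.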