The core problem is that VC locking does not work. VCs are deallocated while still in use, which leads to crashes. In fact, the general handling of VCs in NetHandler::mainNetEvent doesn't lock the VCs at all, so the cross-thread access can't be safe. Moreover, because all the VC references are stored as raw pointers, there's no good way to safely access them across threads or even detect deallocation. I tried putting in generation numbers but that was insufficient (you can still get crashes because the virtual function pointer has been swizzled by the deallocator before you even get to the VC). All the crashes in the listed bugs stem from VC locking failures leading to cross-thread corruption.
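
To illustrate the dangling-reference point, here is a minimal sketch of how a weak reference lets an accessor detect that an object is gone, which a raw pointer cannot do. It assumes standard C++ std::shared_ptr / std::weak_ptr; FakeVC and fd_if_alive are hypothetical names for illustration, not Traffic Server classes or functions.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a VC; not the real Traffic Server class.
struct FakeVC {
  int fd;
  explicit FakeVC(int f) : fd(f) { assert(f >= 0); }
};

// An accessor on another thread holds a weak reference instead of a raw
// pointer. Returns the fd if the VC is still alive, or -1 if it is gone.
int fd_if_alive(const std::weak_ptr<FakeVC> &ref)
{
  // lock() either pins the object with a strong reference or yields an
  // empty pointer; it never hands back a pointer to freed memory.
  if (std::shared_ptr<FakeVC> vc = ref.lock()) {
    return vc->fd; // safe: the object cannot be freed inside this scope
  }
  return -1; // the owner released it; nothing is dereferenced
}
```

With a raw pointer the second case is undetectable: the accessor would dereference freed memory, which is the failure that generation numbers stored inside the object cannot prevent either.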
I've been at quite a loss on how to fix this. The two root causes, IMHO, are

1) VCs are ephemeral and stored purely by pointer, which means the accessor cannot have thread-safe access. You can use pointers if your objects don't evaporate asynchronously. The use of raw pointers also means there's no way to detect dangling references.

2) Closing and deallocation are conflated, so that when a VC needs to be closed, it is also deallocated. There are good reasons to get the close done ASAP but the deallocation is not nearly as time critical.

I am looking at using smart pointers to alleviate these two issues. However, that creates problems because the current queue mechanisms require the use of raw pointers. AFAICT the point of these is to minimize allocation on the fast path. The circular buffer is an attempt to have as little allocation impact as possible, otherwise I would just use a standard allocate-per-element style.

Experimentally, the fix I did for TS-934 had a strong effect on reducing the problem, although it was not a complete solution. My attempt to expand it in TS-1031 did not end well, due to the lack of locks from NetHandler. At that point I was looking at a significant change, so I decided that I should try to do the best thing, rather than a minimalist hack which would not really be that much smaller a change.

Friday, December 9, 2011, 11:38:47 AM, you wrote:

> Could you be more specific about the problem (e.g. big numbers)?
> VConnection locking works fine in all cases that I know of. There are
> some issues with closing and deallocation with threading and the higher
> layers, but these could be solved in a number of ways. Smart pointers is
> one (and probably not a bad one) although it isn't a panacea, just a
> mechanism. Similarly, a circular buffer queue is just a different way of
> viewing the data structure. Sure, it has some advantages, particularly in
> exposure to deallocation and locking, but it isn't a silver bullet either.
> I'd like to understand the core problem.
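
A minimal sketch of what separating close from deallocation could look like, assuming reference-counted ownership via std::shared_ptr. FakeVC and FakeHandler are hypothetical names, not Traffic Server classes, and the std::vector queue ignores the fast-path allocation concern entirely; it only shows the ownership idea.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a VC; not the real Traffic Server class.
class FakeVC {
public:
  explicit FakeVC(int fd) : fd_(fd) { assert(fd >= 0); }

  // Close promptly: give up the descriptor right away. The object itself
  // stays allocated until the last reference goes away.
  // Returns true only for the caller that actually performed the close.
  bool close()
  {
    int old = fd_.exchange(-1);
    // A real implementation would release the OS descriptor here.
    return old >= 0;
  }

  bool is_closed() const { return fd_.load() < 0; }

private:
  std::atomic<int> fd_;
};

// Hypothetical handler queue that holds strong references, so a VC closed
// by another thread is still valid memory when the handler reaches it.
struct FakeHandler {
  std::vector<std::shared_ptr<FakeVC>> pending;

  // Drop closed VCs from the queue and return how many open ones remain.
  // Deallocation happens here (or later), off the time-critical close path.
  std::size_t sweep()
  {
    pending.erase(std::remove_if(pending.begin(), pending.end(),
                                 [](const std::shared_ptr<FakeVC> &vc) {
                                   return vc->is_closed();
                                 }),
                  pending.end());
    return pending.size();
  }
};
```

The point of the design is that close() is cheap and immediate, while the memory is reclaimed only when the last holder lets go, so a handler that still has the VC queued sees a closed object rather than freed memory.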