Thanks Michael. That's the response I've been waiting for. This whole situation is really unfortunate, since it's not even my code that is missing the required locking, and the developers of the faulting code have pretty decent justification for refusing to add it. I'll try to push back on them a little more to add an extension, since the Xerces DOM is really the default. I am not the only one affected by this: anyone using the dom package in that library without swapping the default implementation will run into it. It's just so rare, and needs such a lucky situation, that I'm probably the first to notice it.
There's really nothing I can do besides some sort of wrapper or proxy solution, a massive document pool, or a larger re-architecting effort. Maybe I can come up with something clean and quick, but without a thread dump at the instant this situation occurs, I can't get it under test and can't come up with the fix.

I guess my question, if there's a simple answer to it, is: which specific methods of your library can cause volatility? Is it just NodeList.getLength() and NodeList.item(), specific others, or ALL of them? Those two are the ones I've already run into, and I have syncs around them wherever my code uses them; it wasn't too hard to get some NPEs without the syncs. But without them I never noticed corrupt documents.

Thanks again, I'm glad this has turned into a healthy discussion.

From: Michael Glavassevich [mailto:mrgla...@ca.ibm.com]
Sent: Thursday, June 09, 2011 12:14 PM
To: j-users@xerces.apache.org
Subject: RE: DOM thread safety issues & disapearring children

"Newman, John W" <newma...@d3onc.com> wrote on 06/08/2011 01:24:38 PM:

> I've thought about document pooling but I don't think that's very
> scalable. These elements themselves are about 500k, the document
> has about 20 of these elements. And I have ~16 organizations each
> with their own document. We're already using enough ram to run
> this, I'd rather not have to setup a pool and keep growing the pool
> size as we add users, not very efficient to have copies of
> everything. If the documents were small snippets pooling would be
> fine, but I think they're a little too big to scale like that. I'll
> keep that in my pocket as a last resort kind of thing.
>
> So do you think the children disappearing is caused by unsafe state?
>
> Have you ever seen this before and know how to reproduce it in a
> test? Or is there no way two threads would drop the children and
> I've got something else going on?
I could imagine that might occur with the deferred DOM implementation. I believe the tables which store the data for the deferred nodes have a reference count on them. If multiple threads are hitting it, perhaps the ref count is hitting 0 and portions of those tables are being thrown away before the nodes which would be created from them are actually instantiated.

I have never observed that myself, but all sorts of wacky things can happen if you haven't synchronized your code. The caches I've mentioned that get used internally when you traverse a NodeList have a tendency to jump around to other parent nodes in the DOM tree. So when you're "lucky" enough that the unsynchronized access isn't generating a NullPointerException you may find you're getting random data returned from other parts of the DOM (i.e. NodeLists which point to the children of some other node in the DOM).

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org
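
For illustration, the kind of synchronization discussed in this thread could look roughly like the sketch below. It is only a sketch under stated assumptions: the class and method names (SynchronizedDomRead, countRootElementChildren, countConcurrently) are hypothetical, and using the Document object itself as the lock is an application-level convention, not something the parser does internally. It only helps if every piece of code that touches the document, including third-party code, takes the same lock, which is exactly the difficulty described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SynchronizedDomRead {

    // Hypothetical helper: parse an XML string into a DOM Document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Counts the element children of the root while holding a lock on the
    // Document. Assumption: all readers and writers agree to lock on the same
    // Document instance. Even "read-only" calls such as getLength() and
    // item() are kept inside the lock, since they may update internal state.
    public static int countRootElementChildren(Document doc) {
        synchronized (doc) {
            NodeList children = doc.getDocumentElement().getChildNodes();
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                    count++;
                }
            }
            return count;
        }
    }

    // Runs the same traversal from several threads against one shared Document.
    public static List<Integer> countConcurrently(Document doc, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                Callable<Integer> task = () -> countRootElementChildren(doc);
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><a/><b/><c/></root>");
        System.out.println(countConcurrently(doc, 8));
    }
}
```

The lock is deliberately coarse (one lock per document) because the internal caches described above are shared across the tree, so locking individual nodes would probably not be enough. The cost is that readers of the same document are serialized.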