Thanks Michael.  That's the response I've been waiting for.  This whole 
situation is really unfortunate, since it's not even my code that is missing 
the required locking, and the developers of the faulting code have a pretty 
decent justification for refusing to add it.  I'll try to push back on them a 
little more to add an extension, since the Xerces DOM really is the 
default.  I am not the only one affected by this; anyone using the dom package 
in that library without swapping the default implementation will run into it. 
It's just so rare and such a lucky situation that I'm probably the first to 
notice it.
There's really nothing I can do besides some sort of wrapper or proxy solution, 
a massive document pool, or a larger re-architecting effort.  Maybe I can 
come up with something clean and quick, but without a thread dump at the 
instant this situation occurs, I can't get it under test, and so can't come 
up with the fix...
I guess my question is, if there's a simple answer to this: which specific 
methods of your library can cause volatility?  Is it just NodeList.getLength() 
and NodeList.item(), any specific others, or ALL of them?  Those two are 
the ones I've already run into and have syncs around wherever my code uses 
them; it wasn't too hard to get some NPEs without the syncs.  But without them 
I never noticed corrupt documents.
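
For anyone following along, here is a minimal sketch of the kind of guard I 
mean around NodeList access (in org.w3c.dom the length accessor is 
getLength()).  The class and method names (SafeChildren, snapshot, 
buildSample) are made up for illustration and are not part of Xerces, and 
locking on the owner Document is only an assumption about what a sensible 
shared lock would be.  It copies the children out while holding the lock, so 
the later iteration does not touch the NodeList again.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class SafeChildren {

    /**
     * Hypothetical helper: copies the children of 'parent' into a private
     * list while holding the owner document's monitor, so getLength() and
     * item() are never called concurrently by threads using this helper.
     */
    public static List<Node> snapshot(Node parent) {
        Document owner = (parent instanceof Document)
                ? (Document) parent
                : parent.getOwnerDocument();
        synchronized (owner) {
            NodeList kids = parent.getChildNodes();
            int n = kids.getLength();
            List<Node> copy = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                copy.add(kids.item(i));
            }
            return copy;
        }
    }

    /** Builds a tiny in-memory document: <root><a/><b/></root>. */
    public static Element buildSample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            root.appendChild(doc.createElement("a"));
            root.appendChild(doc.createElement("b"));
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (Node child : snapshot(buildSample())) {
            System.out.println(child.getNodeName());
        }
    }
}
```

This only helps if every thread that touches the document goes through the 
same lock, which is exactly what I can't guarantee for the third-party code.  
The nodes in the returned list are still the live, shared nodes, so reading 
them afterwards without the lock is not made safe by this.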

Thanks again, I'm glad this has turned into a healthy discussion.



From: Michael Glavassevich [mailto:mrgla...@ca.ibm.com]
Sent: Thursday, June 09, 2011 12:14 PM
To: j-users@xerces.apache.org
Subject: RE: DOM thread safety issues & disapearring children


"Newman, John W" <newma...@d3onc.com> wrote on 06/08/2011 01:24:38 PM:

> I've thought about document pooling but I don't think that's very
> scalable.  These elements themselves are about 500k, the document
> has about 20 of these elements.  And I have ~16 organizations each
> with their own document.   We're already using enough ram to run
> this, I'd rather not have to setup a pool and keep growing the pool
> size as we add users, not very efficient to have copies of
> everything. If the documents were small snippets pooling would be
> fine, but I think they're a little too big to scale like that.  I'll
> keep that in my pocket as  a last resort kind of thing.
>
> So do you think the children disappearing is caused by unsafe state?
>
> Have you ever seen this before and know how to reproduce it in a
> test?  Or is there no way two threads would drop the children and
> I've got something else going on?

I could imagine that might occur with the deferred DOM implementation. I 
believe the tables which store the data for the deferred nodes have a reference 
count on them. If multiple threads are hitting it, perhaps the ref count is 
hitting 0 and portions of those tables are being thrown away before the nodes 
which would be created from them are actually instantiated. I have never 
observed that myself, but all sorts of wacky things can happen if you haven't 
synchronized your code.
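
If the deferred implementation does turn out to be involved, one experiment is 
to parse with deferred node expansion switched off, so that all nodes are 
instantiated up front. The sketch below is only that, an experiment under 
assumptions: the class and method names (EagerDomParse, parseEagerly) are 
hypothetical, it assumes the JAXP factory in use is Xerces-based and accepts 
the Xerces defer-node-expansion feature, and it does not make the DOM 
thread-safe. The NodeList caches described below are a separate matter.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EagerDomParse {

    // Xerces feature id that controls deferred node expansion.
    static final String DEFER_NODE_EXPANSION =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    /** Hypothetical helper: parses 'xml' with deferred expansion disabled. */
    public static Document parseEagerly(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Throws ParserConfigurationException if the underlying parser
            // does not recognize the feature (i.e. it is not Xerces-based).
            factory.setFeature(DEFER_NODE_EXPANSION, false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse eagerly", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseEagerly("<r><a/><b/></r>");
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

The trade-off is that building every node at parse time costs more time and 
memory up front, which matters for documents of the size described above.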

The caches I've mentioned that get used internally when you traverse a NodeList 
have a tendency to jump around to other parent nodes in the DOM tree. So when 
you're "lucky" enough that the unsynchronized access isn't generating a 
NullPointerException you may find you're getting random data returned from 
other parts of the DOM (i.e. NodeLists which point to the children of some 
other node in the DOM).

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org
