All,

My team has built a web application with a few components that rely heavily
on the Xerces DOM implementation.  Unfortunately, a lot of this was developed
before we learned that the node implementations are deliberately not
thread safe. =)  I've added a few sync blocks where appropriate (I think), and
everything is functioning ok except for two remaining issues.


1)      Under very high load, an element will somehow lose nearly all of its 
children.

<root>
  <ch0 />
  <ch0 />
  <ch0 />
  <ch1 />
  <ch1 />
  <ch2><ch2.1 /></ch2>
  <ch3><ch3.1><ch3.2 /></ch3.1></ch3>
  .... Rather large document, many levels of nesting
</root>



That will sit there and work fine for a few days, until something (?) happens
and most of the children disappear.  I cannot isolate this problem at all; it
has been very difficult to track anything down, so I'm asking for help.
There are no exceptions in the log, or anything else, to indicate that
something bad is happening.  One day I saw


<root>
  <ch0 />
  <ch0 />
</root>



And then a few days later


<root>
  <ch0 />
  <ch0 />
  <ch0 />
  <ch1 />
</root>



The fact that there doesn't seem to be any pattern to which children stay vs.
which disappear, and that it happens only under higher load, has me suspecting
thread safety.  In general it seems like the smaller elements at the top are
more likely to hang around, but again there's no real pattern.  We are not
modifying these nodes at all; in fact I want to make them read only.  I'm
debating casting the org.w3c.dom.Node to org.apache.xerces.dom.NodeImpl and
calling setReadOnly(true, true) on it to freeze it - but the javadoc says I
probably shouldn't need that method?  If I did that, I'd at least get a stack
trace when whatever it is decides to modify it.  Does that sound like a good
approach?  Is there anything obvious that would cause this problem, e.g. has
anyone run into this before?  Am I missing a sync?  I'm about stumped.
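
Separately from the setReadOnly idea, one way to at least notice the moment
children go missing is to record an element count right after parsing and
re-check it periodically. The following is only a minimal sketch using the
standard org.w3c.dom API; the class and method names (DomIntegrityCheck,
countElements, verify) are hypothetical and not part of Xerces. Note that the
check is itself a read of the tree, so it is assumed to run under the same
lock as every other reader.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: detects lost children by comparing element counts.
final class DomIntegrityCheck {

    /** Parses an XML string, wrapping checked exceptions for brevity. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Counts element nodes in the subtree, including the node itself. */
    static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        // Walk sibling links directly instead of indexing through a NodeList.
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }

    /** Throws if the subtree no longer holds the expected number of elements. */
    static void verify(Node root, int expected) {
        int actual = countElements(root);
        if (actual != expected) {
            // The stack trace shows when the loss was noticed, not who caused it.
            throw new IllegalStateException(
                    "element count changed: expected " + expected + ", found " + actual);
        }
    }
}
```

This would not say who removed the children, only narrow down when it
happened, which might make it easier to correlate with load or with a
particular request.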



2)       Also under high load, I occasionally get this stack trace (this is not 
the cause of or symptom of item 1, it is a separate issue occurring at separate 
times)


java.lang.NullPointerException:
(no message)
at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
at org.apache.xerces.dom.ParentNode.item(Unknown Source)
at freemarker.ext.dom.NodeListModel.<init>(NodeListModel.java:89)
at freemarker.ext.dom.NodeModel.getChildNodes(NodeModel.java:302)
at freemarker.ext.dom.ElementModel.get(ElementModel.java:124)
at freemarker.core.Dot._getAsTemplateModel(Dot.java:76)



Again I'm suspecting thread safety and a missing sync.  Just refreshing the
page works ok.  I raised the issue with FreeMarker since it's only their stack
frames calling the DOM, so I figured the burden falls on them to sync.  But
they passed the buck back to me and said 'we do not guarantee thread safety if
your data model is not thread safe to begin with.'  They're not going to go and
add sync blocks all over their code due to an implementation artifact of your
library, and I would agree with that.  Really, the lack of thread safety even
for reads is a pretty poor fit for a web application...  How do I fix this
problem?  In general, is there a way to make this library more thread safe?
The best suggestion I have for that stack trace so far is to use CGLib to proxy
the element and inject sync blocks where they should be.  Ugh...
https://issues.apache.org/jira/browse/XERCESJ-727 is relevant here
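
For reference, a minimal sketch of what serializing reads could look like in
code I control, assuming every reader agrees to lock on the owning Document.
It copies the children into a plain list while holding the lock, so callers
iterate over the copy rather than over the live NodeList. The names
LockedDomReads and childSnapshot are hypothetical, and only the standard
org.w3c.dom API is used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: all DOM reads go through one monitor per document.
final class LockedDomReads {

    /** Parses an XML string, wrapping checked exceptions for brevity. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Returns an unmodifiable copy of the node's children, taken under the lock. */
    static List<Node> childSnapshot(Node parent) {
        // getOwnerDocument() is null for the Document node itself.
        Document owner = parent.getNodeType() == Node.DOCUMENT_NODE
                ? (Document) parent
                : parent.getOwnerDocument();
        synchronized (owner) {
            NodeList children = parent.getChildNodes();
            List<Node> copy = new ArrayList<>(children.getLength());
            for (int i = 0; i < children.getLength(); i++) {
                copy.add(children.item(i));
            }
            return Collections.unmodifiableList(copy);
        }
    }
}
```

This only protects callers that go through the helper. It does nothing for
calls a template engine makes on its own, unless the whole render is run
inside a synchronized block on the same Document.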



What about calling
documentBuilderFactory.setFeature("http://apache.org/xml/features/dom/defer-node-expansion", false);
to turn off lazy parsing?  Does that guarantee thread safety, since
everything is already parsed into RAM and it's just read only?
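
Spelled out, the call in question would sit in the parser setup roughly as
follows. This is a sketch under the assumption that the JAXP factory is backed
by a Xerces-based parser that recognizes the feature; the class and method
names (EagerDomLoader, parseEagerly) are hypothetical. It only changes when
nodes are materialized and does not by itself establish that concurrent reads
are safe.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: builds a DOM with deferred node expansion disabled.
final class EagerDomLoader {

    // Xerces feature ID controlling lazy (deferred) node expansion.
    static final String DEFER_NODE_EXPANSION =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    /** Parses the XML string with deferred node expansion switched off. */
    static Document parseEagerly(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Throws ParserConfigurationException if the underlying parser
            // does not recognize the feature.
            factory.setFeature(DEFER_NODE_EXPANSION, false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML eagerly", e);
        }
    }
}
```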





Any input is very much appreciated, these issues are affecting production.  :|



Thanks,

John
