Yes, a thread-safe API for reads is all one would reasonably expect. For writes to the DOM, sure, I would provide locking around the places where I am doing the writes - I wouldn't expect the library to want to handle that. But reads...

"transaction locking for a group of related changes." I'm not making any changes, just presenting a view over XML. Can't you provide an extension that locks the few methods that are read-only with respect to the DOM yet still write to internal state? Yes?

From: gzun...@googlemail.com [mailto:gzun...@googlemail.com] On Behalf Of Alasdair Thomson
Sent: Tuesday, July 19, 2011 12:46 PM
To: j-users@xerces.apache.org
Subject: RE: DOM thread safety issues & disapearring children

I think the relevant complaint is that the DOM isn't thread safe for read-only operations, which is counter-intuitive unless you have knowledge of the underlying implementation. I don't think anyone expects it to be thread safe for updating.

I've tended to use JAXB to transform the XML to Java objects, which I can then make sure are thread safe for read-only operations.

On Jul 19, 2011 5:26 PM, <kesh...@us.ibm.com> wrote:
> In 99% of the use cases, locking the individual DOM objects/operations
> would be the wrong level of granularity -- what you really need to prevent
> unexpected results is transaction locking for a group of related changes.
> That really does have to be done at the application level.
>
> Locking every individual operation also can have significant performance
> impact, in these days of multikernel/multiprocessor machines, due to the
> need to flush cache in order to make sure all the processors know the
> lock's state has changed. The days of "synchronize is free" really are
> over.
>
> Also, frankly, I would be reluctant to encourage people to rely on a
> protected DOM since if/when they change platforms their code will break
> unexpectedly.
>
> If you really want locks on every operation, you're free to build a
> "threadsafe DOM manipulation" library which provides threadsafety -- eg
> static Threadsafe.appendChild(Node parent, Node newChild).
> The code won't look exactly like a simple DOM call, but in most JVMs this
> kind of simple "tail call" is pretty efficient, it makes what you're doing
> explicit, and it's portable to any DOM you care to throw at it.
>
> ______________________________________
> "You build world of steel and stone
> I build worlds of words alone
> Skilled tradespeople, long years taught:
> You shape matter; I shape thought."
> (http://www.songworm.com/lyrics/songworm-parody/ShapesofShadow.html)
>
> From: "Newman, John W" <newma...@d3onc.com>
> To: "j-users@xerces.apache.org" <j-users@xerces.apache.org>
> Date: 07/19/2011 11:49 AM
> Subject: RE: DOM thread safety issues & disapearring children
>
> I just wanted to follow up and say that switching off deferred parsing did
> not add any stability. Still the same issues, same steps to reproduce
> every time. And yes, the setting did take effect: my elements changed from
> DeferredElementImpl to ElementImpl. So no dice there.
>
> Also, setting the node to read-only didn't work either, but I didn't really
> expect it to, since that guards against modifying the DOM and not against
> clobbering the underlying unsafe state.
>
> Let me ask you this then: since clearly there is an industry need for a
> thread safe document (like the one person said, this comes up ALL the
> time), and you are pretty clear that the default implementation will
> always remain unsafe - "it's up to the caller to add the locks" - why
> can't you provide a simple extension of the current implementation that
> properly takes care of the syncing at a higher level but in between the
> caller and the unsafe impl? Effectively do what I'm scrambling to do
> correctly and offload this burden...
>
> class ThreadSafeDeferredElementImpl extends DeferredElementImpl {
>     @Override
>     public Object iAmNotTotallySureWhatMethodsNeedSynced() {
>         synchronized (this.something) {
>             return this.something.whatever();
>         }
>     }
> }
>
> Isn't that easy to do and win-win? You can maintain your "it's not thread
> safe" stance, and users that don't require thread safety will still have
> the good performance, but users like us that essentially need to ditch
> your library can have the syncing properly taken care of and out of our
> code. Why not just do that?
>
> Thanks,
> John
>
> From: Michael Glavassevich [mailto:mrgla...@ca.ibm.com]
> Sent: Wednesday, June 08, 2011 12:06 AM
> To: j-users@xerces.apache.org
> Subject: Re: DOM thread safety issues & disapearring children
>
> Hi John,
>
> None of Xerces' DOM implementations are thread-safe, even the non-deferred
> ones. This is true even for read operations. In particular, the
> implementations of the NodeList methods (i.e. item() and getLength()) are
> not thread-safe. These methods do some internal writes to a cache which
> are necessary for good performance. There's a longer explanation in the
> JIRA issue you found.
>
> Thanks.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: mrgla...@ca.ibm.com
> E-mail: mrgla...@apache.org
>
> "Newman, John W" <newma...@d3onc.com> wrote on 06/07/2011 04:17:22 PM:
>
>> All,
>>
>> My team has built a web application that has a few components that
>> rely heavily on the Xerces DOM implementation. Unfortunately a lot
>> of this was developed before we even learned that the node
>> implementations are deliberately not thread safe. =) I've added a
>> few sync blocks where appropriate (I think), and everything is
>> functioning ok except for two remaining issues.
>>
>> 1) Under very high load, an element will somehow lose nearly
>> all of its children.
>> <root>
>>   <ch0 />
>>   <ch0 />
>>   <ch0 />
>>   <ch1 />
>>   <ch1 />
>>   <ch2><ch2.1 /></ch2>
>>   <ch3><ch3.1><ch3.2 /></ch3.1></ch3>
>>   .... Rather large document, many levels of nesting
>> </root>
>>
>> That will sit there and work fine for a few days, until something
>> (?) happens and most of the children will disappear. I cannot
>> isolate this problem at all, it has been very difficult to track
>> anything down so I'm asking for help. There are no exceptions in
>> the log or anything otherwise to indicate that something bad is
>> happening. One day I saw
>>
>> <root>
>>   <ch0 />
>>   <ch0 />
>> </root>
>>
>> And then a few days later
>>
>> <root>
>>   <ch0 />
>>   <ch0 />
>>   <ch0 />
>>   <ch1 />
>> </root>
>>
>> The fact that there doesn't seem to be any pattern to which children
>> stay vs. which disappear, and only under higher load, has me
>> suspecting thread safety. In general it seems like the smaller
>> elements at the top are more likely to hang around, but again
>> there's no real pattern. We are not doing any modification on these
>> nodes, in fact I want to make them read only. I'm debating on
>> casting the org.w3c.dom.Node to org.apache.xerces.dom.NodeImpl and
>> calling setReadOnly(true, true) on it to freeze it - but the
>> javadoc says I probably shouldn't need that method? If I did that,
>> I'd at least get a stack trace when whatever it is decides to modify
>> it. Does that sound like a good approach? Is there anything
>> obvious that would cause this problem, e.g. has anyone run into this
>> before? Am I missing a sync? I'm about stumped.
>>
>> 2) Also under high load, I occasionally get this stack trace
>> (this is not the cause of or symptom of item 1, it is a separate
>> issue occurring at separate times)
>>
>> java.lang.NullPointerException:
>> (no message)
>>     at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
>>     at org.apache.xerces.dom.ParentNode.item(Unknown Source)
>>     at freemarker.ext.dom.NodeListModel.<init>(NodeListModel.java:89)
>>     at freemarker.ext.dom.NodeModel.getChildNodes(NodeModel.java:302)
>>     at freemarker.ext.dom.ElementModel.get(ElementModel.java:124)
>>     at freemarker.core.Dot._getAsTemplateModel(Dot.java:76)
>>
>> Again I'm suspecting thread safety and a missing sync. Just
>> refreshing the page works ok. I raised the issue with FreeMarker
>> since it's only their stack frames calling the DOM, so I figured the
>> burden falls on them to sync. But they passed the buck back to me
>> and said 'we do not guarantee thread safety if your data model is
>> not thread safe to begin with.' They're not going to go and add
>> sync blocks all over their code due to an implementation artifact of
>> your library, and I would agree with that. Really the lack of
>> thread safety even for reads is a pretty poor fit for a web
>> application... How do I fix this problem, in general is there a way
>> to make this library more thread safe? The best suggestion I have
>> for that stack trace so far is to use CGLib to proxy the element and
>> inject sync blocks where they should be. Ugh...
>> https://issues.apache.org/jira/browse/XERCESJ-727 is relevant here
>>
>> What about calling
>> documentBuilderFactory.setFeature("http://apache.org/xml/features/dom/defer-node-expansion", false);
>> to turn off lazy parsing? Does that guarantee thread safety since
>> everything is already parsed into RAM and it's just read only?
>>
>> Any input is very much appreciated; these issues are affecting
>> production.
>>
>> Thanks,
>> John
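
A minimal sketch of the static helper idea suggested in this thread (a "threadsafe DOM manipulation" library along the lines of Threadsafe.appendChild(Node parent, Node newChild)), using only the standard org.w3c.dom interfaces. The class name ThreadsafeDom, the childrenOf and parse helpers, and the choice of the owner Document as the lock object are all assumptions made for illustration; none of them are part of Xerces. Reads copy the NodeList into a plain list while holding the lock, because item() and getLength() are the calls reported above to write to an internal cache.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not a Xerces API: every DOM access goes through
// static methods that hold one lock per document.
final class ThreadsafeDom {
    private ThreadsafeDom() {}

    // Assumption: one lock per tree. getOwnerDocument() returns null for a
    // Document node itself, so fall back to the node in that case.
    private static Object lockFor(Node node) {
        Document owner = node.getOwnerDocument();
        return owner != null ? owner : node;
    }

    // Read path: walk the NodeList under the lock and hand back a snapshot,
    // so callers never call item()/getLength() concurrently.
    static List<Node> childrenOf(Node parent) {
        synchronized (lockFor(parent)) {
            NodeList children = parent.getChildNodes();
            int count = children.getLength();
            List<Node> snapshot = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                snapshot.add(children.item(i));
            }
            return Collections.unmodifiableList(snapshot);
        }
    }

    // Write path: same lock, so reads and writes on one tree never interleave.
    static Node appendChild(Node parent, Node newChild) {
        synchronized (lockFor(parent)) {
            return parent.appendChild(newChild);
        }
    }

    // Small convenience for trying the helper out; uses whatever JAXP
    // parser is on the classpath.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }
}
```

This only protects callers that go through the helper. Code that touches the DOM directly, such as the FreeMarker frames in the stack trace above, bypasses the lock entirely, which is presumably why converting the XML to plain Java objects was suggested as an alternative.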