Hi,

If you use XNI (or SAX, for that matter), you're not really writing a 
parser. Your DOM builder would be an event handler that builds the DOM in 
response to callbacks from the parser.
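
To make the event-handler idea concrete, here is a minimal sketch using 
SAX and the JDK's built-in JAXP classes rather than XNI. The class name 
SaxDomBuilder and its build() helper are made up for illustration and are 
not part of Xerces; namespaces, comments and processing instructions are 
ignored to keep it short.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: the parser drives it through callbacks and it
// assembles a DOM tree as the events arrive.
class SaxDomBuilder extends DefaultHandler {
    private final Document doc;
    // Stack of currently open nodes; the top is the parent for new children.
    private final Deque<Node> open = new ArrayDeque<>();

    SaxDomBuilder() throws Exception {
        doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        open.push(doc);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        open.peek().appendChild(e);
        open.push(e);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        Node parent = open.peek();
        if (parent != doc) { // text is only legal inside an element
            parent.appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    // Convenience entry point: parse a string and return the DOM that was built.
    static Document build(String xml) throws Exception {
        SaxDomBuilder handler = new SaxDomBuilder();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.doc;
    }
}
```

A lazy or unloadable DOM would replace the createElement/appendChild calls 
with whatever storage scheme you choose; the callback structure stays the 
same.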

The Xerces DOM builder isn't lazy. It reads the whole XML document into 
memory before returning control to the user. What is lazy is the 
materialization of nodes in the deferred DOM implementation. Instead of 
building the DOM using the standard API methods, Xerces calls methods 
specific to the deferred DOM to build table-like structures (stored within 
the Document node) which are more compact than actual DOM node objects. As 
the user walks the DOM tree, these tables are read to fill in the node 
objects in the tree. This does reduce memory usage when only a fraction of 
the tree is accessed, but it's nowhere near as memory-efficient as an 
implementation which lazily loads data from the parser or is able to 
unload nodes, as you suggest.
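
If you want to compare the two behaviours yourself, deferred node 
expansion is controlled by the Xerces feature 
http://apache.org/xml/features/dom/defer-node-expansion. The sketch below 
toggles it through JAXP; the class and method names are made up for 
illustration, and it assumes the DocumentBuilderFactory in use is Xerces 
or derived from it. Other parsers may not recognize the feature, in which 
case the sketch just falls back to the parser's default.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for comparing deferred and non-deferred DOM builds.
class DeferredDomSketch {
    // Xerces feature identifier for deferred node expansion.
    static final String DEFER =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    static Document parse(String xml, boolean deferred) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            factory.setFeature(DEFER, deferred);
        } catch (ParserConfigurationException e) {
            // Assumption: a non-Xerces parser may reject this feature;
            // continue with that parser's default behaviour.
        }
        // Either way the whole document is read before parse() returns;
        // the feature only affects when node objects are materialized.
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

Both settings produce the same tree through the DOM API; any difference 
shows up only in memory use and timing.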

If you want your DOM implementation to pull data from the parser lazily, 
you could try using an XMLPullParserConfiguration [1] or a standard API 
like StAX (not supported by Xerces), either of which can be used to parse 
the document incrementally.
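
As a rough illustration of incremental parsing, here is a small StAX 
sketch. It uses the javax.xml.stream implementation that ships with the 
JDK, not Xerces, and the class and method names are made up. The caller 
asks for events one at a time and can stop as soon as it has what it 
needs, which is the property a lazily loading DOM would build on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper showing pull-style (caller-driven) parsing.
class StaxPullSketch {
    // Collects the names of the first `limit` start tags, then stops
    // pulling events, leaving the rest of the document unparsed.
    static List<String> firstElements(String xml, int limit) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (names.size() < limit && reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }
}
```

For example, firstElements("<a><b/><c/><d/></a>", 2) returns the names a 
and b without ever pulling the events for c or d.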

Thanks.

[1] http://xerces.apache.org/xerces2-j/javadocs/xni/org/apache/xerces/xni/parser/XMLPullParserConfiguration.html

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

Dominik Rauch <e0825...@student.tuwien.ac.at> wrote on 04/10/2012 03:13:38 PM:

> Hi Michael!
> 
> XNI sounds neat, but do you really think we need to write our own parser
> as well? We want to parse XML just as everybody else does, we just want
> it to do that lazily - like the lazy Xerces parser. It would just be
> important to be able to reload some parts later on in case we've
> dismissed them from memory in order to lower RAM usage.
> 
> Is that not possible with the current Xerces parser? How does the
> current lazy implementation work then?
> 
> Best regards,
> D.R.
> 
> PS: Now posting from OldNabble instead of Outlook, hope this helps.
> 
> 
> 
> Michael Glavassevich-3 wrote:
> > 
> > Hi,
> > 
> > Have you had a read through the XNI manual [1]? This is the framework
> > on which all parsers in Xerces are built.
> > 
> > You should be able to reuse much of that infrastructure, in particular
> > the existing XMLParserConfigurations [2] which are the real guts of a
> > parser in Xerces.
> > 
> > Thanks.
> > 
> > [1] http://xerces.apache.org/xerces2-j/xni.html
> > [2] http://xerces.apache.org/xerces2-j/faq-xni.html#faq-3
> > 
> > Michael Glavassevich
> > XML Technologies and WAS Development
> > IBM Toronto Lab
> > E-mail: mrgla...@ca.ibm.com
> > E-mail: mrgla...@apache.org
> > 
> > "Dominik Rauch" <e0825...@student.tuwien.ac.at> wrote on 27/09/2012 
> > 04:25:09 PM:
> > 
> >> Hello Xerces-List!
> >> 
> >> We’re currently thinking about writing an advanced lazy DOM 
> >> implementation compliant with the W3C DOM specification.
> >> We know that there is already a Xerces lazy-loading-solution, 
> >> however, it is never unloading nodes, which becomes a problem for 
> >> very big DOM trees which do not fit into memory.
> >> 
> >> There are some ideas and/or commercial products (like xDB), however,
> >> no open-source solution yet.
> >> 
> >> We want to know if it is possible to replace the Xerces DOM parser 
> >> with our own lazy implementation and reuse all the XPath/etc. 
> >> features from Xerces or if we need to write everything from scratch.
> >> 
> >> Hopefully you can give us a positive answer and maybe show us the 
> >> main extension points where we would have to fit in our 
> >> implementation (e.g. classes/packages we would have to re-implement 
> >> / derive / etc.)
> >> 
> >> 
> >> Best regards,
> >> D.R.
> >> Technical University of Vienna
> > 
> > 
> 
> -- 
> View this message in context: http://old.nabble.com/Lazy-DOM-with-regards-to-memory-usage-tp34488915p34513281.html
> Sent from the Xerces - J - Users mailing list archive at Nabble.com.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
> For additional commands, e-mail: j-users-h...@xerces.apache.org
