RE: Lazy DOM with regards to memory usage

Dominik Rauch Thu, 27 Sep 2012 17:48:20 -0700

Thank you for your answer. To be honest, we're not sure whether that would be sufficient.

Could you give us a short introduction to how Xerces loads XML documents?
The DOMParser uses a specific XMLParserConfiguration to fill an empty
Document until it has completely parsed a file, and then returns the
Document. How, then, does the lazy implementation work? We imagine the
Document would need some kind of reference back to the parser to tell it
"load more child items at node xyz".
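
One way to picture that back-reference is a node that holds a reference to a loader and asks it for its children the first time they are requested. The `ChildLoader` interface and `LazyNode` class below are hypothetical, not Xerces API; in a real lazy DOM the loader would seek into or resume the underlying parser instead of returning ready-made ids, and the node would implement org.w3c.dom methods such as getFirstChild().

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Xerces API): a node that keeps a reference to a
 * loader and materializes its children only when they are first requested.
 */
class LazyNodeDemo {

    /** Hypothetical stand-in for "the parser": returns the ids of a node's children. */
    interface ChildLoader {
        List<String> loadChildren(String nodeId);
    }

    static final class LazyNode {
        private final String id;
        private final ChildLoader loader; // back-reference used to load more on demand
        private List<LazyNode> children;  // null means "not loaded yet"

        LazyNode(String id, ChildLoader loader) {
            this.id = id;
            this.loader = loader;
        }

        String id() {
            return id;
        }

        boolean isExpanded() {
            return children != null;
        }

        /** Asks the loader for the children on first access, then serves them from memory. */
        List<LazyNode> children() {
            if (children == null) {
                List<LazyNode> loaded = new ArrayList<>();
                for (String childId : loader.loadChildren(id)) {
                    loaded.add(new LazyNode(childId, loader)); // grandchildren stay unloaded
                }
                children = loaded;
            }
            return children;
        }
    }

    public static void main(String[] args) {
        int[] loaderCalls = {0};
        ChildLoader loader = nodeId -> {
            loaderCalls[0]++;
            // Toy tree: "root" has two children, every other node is a leaf.
            return nodeId.equals("root") ? List.of("a", "b") : List.<String>of();
        };
        LazyNode root = new LazyNode("root", loader);
        System.out.println(root.isExpanded());      // false
        System.out.println(root.children().size()); // 2
        root.children();                            // served from memory
        System.out.println(loaderCalls[0]);         // 1
    }
}
```

The design choice here is that expansion is one level at a time: loading "root" creates nodes for "a" and "b" but does not touch their children.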

Of course it would be neat if we could simply extend the existing
DocumentImpl (or CoreDocumentImpl) class, override the required methods,
and have all other parts of Xerces (XPath, etc.) work with it afterwards.
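
The hope that XPath would keep working rests on the fact that the JAXP XPath API takes a plain org.w3c.dom Node as its context item, not a Xerces class. The sketch below uses only the JDK's built-in JAXP implementations; whether a given XPath engine behaves equally well on a third-party DOM implementation, especially one that loads and unloads nodes behind the scenes, is an assumption that would need testing.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XPathOverDomDemo {

    /** Parses XML with the default JAXP parser, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Evaluates an XPath expression to a string; only the org.w3c.dom.Document is passed in. */
    static String evaluate(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (String) xpath.evaluate(expression, doc, XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<library><book title='A'/><book title='B'/></library>");
        System.out.println(evaluate(doc, "count(/library/book)"));    // 2
        System.out.println(evaluate(doc, "/library/book[2]/@title")); // B
    }
}
```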

If the Xerces parser builds up the Document using the create* methods on
Document, we could even do the required memory swapping within the
Document class by opening temporary files or a database connection (we're
not sure about the actual implementation yet, as this is ongoing research).
However, we're not sure how to additionally use lazy loading to boost
performance when the user reads only very small parts of a big XML file.
We hope you can help us a little with that part.
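
Combining lazy loading with unloading could mean keeping only a bounded number of expanded nodes in memory and re-reading evicted ones from the swap store on their next access. Below is a hypothetical sketch of that policy using a LinkedHashMap in access order as an LRU cache; string ids stand in for real DOM nodes, and a Function stands in for the temporary file or database connection. It is not a description of how Xerces builds its tree.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: at most `capacity` expanded nodes stay in memory; the
 * least recently used one is dropped when the limit is exceeded and reloaded
 * from the backing store when it is accessed again.
 */
class BoundedNodeCache {
    private final Function<String, List<String>> backingStore; // e.g. temp file or DB lookup
    private final Map<String, List<String>> expanded;
    private int loads = 0;

    BoundedNodeCache(int capacity, Function<String, List<String>> backingStore) {
        this.backingStore = backingStore;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.expanded = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > capacity; // unload the least recently used node
            }
        };
    }

    /** Returns the children of nodeId, reloading them if they were unloaded. */
    List<String> children(String nodeId) {
        List<String> kids = expanded.get(nodeId); // get() also refreshes recency
        if (kids == null) {
            loads++;
            kids = backingStore.apply(nodeId);
            expanded.put(nodeId, kids);
        }
        return kids;
    }

    boolean isInMemory(String nodeId) {
        return expanded.containsKey(nodeId); // does not change access order
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        BoundedNodeCache cache = new BoundedNodeCache(2, id -> List.of(id + "/child"));
        cache.children("a");
        cache.children("b");
        cache.children("a"); // touch "a" so "b" becomes least recently used
        cache.children("c"); // exceeds capacity: "b" is unloaded
        System.out.println(cache.isInMemory("b")); // false
        System.out.println(cache.loadCount());     // 3
    }
}
```

Only nodes the user actually visits are ever loaded, which covers the "reads very small parts of a big file" case; the capacity bound covers the "does not fit into memory" case.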

All the best,
D.R.

(We hope you don't mind our use of MS Outlook... people on mailing lists
are normally not so fond of it.)

-----Original Message-----
From: Jeff Greif [mailto:jeff.gr...@gmail.com] 
Sent: Thursday, 27 September 2012 22:47
To: j-users@xerces.apache.org; Dominik Rauch
Subject: Re: Lazy DOM with regards to memory usage

Would it be sufficient to specify your DOM's Document implementation class
to the JAXP DocumentBuilderFactory using the factory's setAttribute method?
See the first example here:

http://xerces.apache.org/xerces2-j/properties.html
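
Jeff's suggestion can be sketched with plain JAXP. The property URI is the Xerces `dom/document-class-name` property from the properties page linked above; the class name passed in would be your own Document subclass, and `org.apache.xerces.dom.DocumentImpl` is used only as a placeholder. The fallback is an assumption of this sketch: JAXP specifies that setAttribute throws IllegalArgumentException for attributes the implementation does not recognize, and a class that cannot be loaded may only fail at parse time, so both cases drop back to the default Document.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DocumentClassDemo {
    // Xerces-J property documented at http://xerces.apache.org/xerces2-j/properties.html
    static final String DOCUMENT_CLASS_NAME =
            "http://apache.org/xml/properties/dom/document-class-name";

    /**
     * Parses xml, asking the factory to instantiate documentClassName as the
     * Document. If the active JAXP implementation rejects the attribute or
     * cannot create the class, falls back to its default Document class.
     */
    static Document parse(String xml, String documentClassName) {
        try {
            try {
                DocumentBuilderFactory custom = DocumentBuilderFactory.newInstance();
                custom.setAttribute(DOCUMENT_CLASS_NAME, documentClassName);
                return build(custom, xml);
            } catch (RuntimeException unsupported) {
                // Attribute not recognized, or the document class was unusable.
                return build(DocumentBuilderFactory.newInstance(), xml);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    private static Document build(DocumentBuilderFactory factory, String xml)
            throws ParserConfigurationException, SAXException, IOException {
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        Document doc = parse("<doc><item/></doc>", "org.apache.xerces.dom.DocumentImpl");
        // Which class this prints depends on the JAXP implementation on the classpath.
        System.out.println(doc.getClass().getName());
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```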

On 9/27/2012 1:25 PM, Dominik Rauch wrote:
> Hello Xerces-List!
>
> We're currently thinking about writing an advanced lazy DOM
> implementation compliant with the W3C DOM specification.
>
> We know that there is already a lazy-loading solution in Xerces; however,
> it never unloads nodes, which becomes a problem for very big DOM
> trees that do not fit into memory.
>
> There are some ideas and some commercial products (like xDB), but
> no open-source solution yet.
>
> We want to know whether it is possible to replace the Xerces DOM parser
> with our own lazy implementation and reuse all the XPath/etc. features
> from Xerces, or whether we need to write everything from scratch.
>
> Hopefully you can give us a positive answer and maybe show us the main
> extension points where our implementation would have to fit in (e.g.
> classes/packages we would have to re-implement, derive from, etc.).
>
> Best regards,
>
> D.R.
>
> Technical University of Vienna

