Yes, that is something worth thinking about. Thanks for bringing this up.
----- Original Message ----- 
From: "Michael Wechner" <michael.wech...@wyona.com> 
To: java-user@lucene.apache.org 
Sent: Friday, May 22, 2009 11:41:51 AM GMT -08:00 US/Canada Pacific 
Subject: Re: Parsing large xml files 

crack...@comcast.net wrote:
> Once people get comfortable with vtd-xml, few ever go back to DOM
> and SAX...
>   

Maybe you would consider contributing a vtd-xml-based parsing
implementation to Lucene ;-)

Thanks 

Michael 
> ----- Original Message ----- 
> From: "Sithu D. Sudarsan" <sithu.sudar...@fda.hhs.gov> 
> To: java-user@lucene.apache.org 
> Sent: Friday, May 22, 2009 6:39:33 AM GMT -08:00 US/Canada Pacific 
> Subject: RE: Parsing large xml files 
> 
> Thanks everyone for your useful suggestions/links. 
> 
> Lucene uses DOM and we tried with SAX. 
> 
> XML Pull & vtd-xml as well as Piccolo seem good. 
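For readers weighing the pull-parsing option: below is a minimal sketch using StAX (javax.xml.stream), the pull parser that ships with the JDK. It is not the XmlPull API, vtd-xml, or Piccolo, and it is not taken from the poster's code. The class name StaxRecordReader, the record element name "doc", and the sink callback are all hypothetical, and the sketch assumes each record element contains text only (getElementText fails on nested child elements).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: pulls text-only records out of an XML stream one at a time. */
final class StaxRecordReader {

    /**
     * Reads the stream forward-only and hands the text of every element named
     * recordElement to sink. No document tree is built.
     */
    static void forEachRecord(InputStream xml, String recordElement, Consumer<String> sink)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                // The caller drives the parser: each next() advances by one event.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && recordElement.equals(reader.getLocalName())) {
                    // Assumption: the record holds text only, with no child elements.
                    sink.accept(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] sample = "<docs><doc>a</doc><doc>b c</doc></docs>"
                .getBytes(StandardCharsets.UTF_8);
        // In an indexing job the sink would build and add one document per record.
        forEachRecord(new ByteArrayInputStream(sample), "doc", System.out::println);
    }
}
```

For a real 50 MB file, the ByteArrayInputStream would be replaced by a buffered FileInputStream. Memory use should then depend mainly on the largest single record rather than on the file size.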
> 
> However, for now, we've broken the file into smaller chunks and are
> parsing those.
> 
> When we get some time, we'd like to refactor using one of the suggested parsers.
> 
> Erick: We do use Eclipse, but running from the CLI gives the same error!
> Maybe there is a way to address the memory issues, but the current approach
> of breaking the file into smaller chunks has worked for now...
> 
> 
> Sincerely, 
> Sithu D Sudarsan 
> 
> -----Original Message----- 
> From: Michael Wechner [mailto:michael.wech...@wyona.com] 
> Sent: Friday, May 22, 2009 4:48 AM 
> To: java-user@lucene.apache.org 
> Subject: Re: Parsing large xml files 
> 
> crack...@comcast.net wrote:
>   
>> http://vtd-xml.sf.net 
>> 
>> 
>> ----- Original Message ----- 
>> From: "Sithu D. Sudarsan" <sithu.sudar...@fda.hhs.gov> 
>> To: java-user@lucene.apache.org 
>> Sent: Thursday, May 21, 2009 7:42:59 AM GMT -08:00 US/Canada Pacific 
>> Subject: Parsing large xml files 
>> 
>> 
>> Hi, 
>> 
>> While trying to parse XML documents of about 50 MB, we run into
>> OutOfMemoryError due to Java heap space. Increasing the JVM heap to
>> close to 2 GB (that is the max) does not help. Is there any API that
>> could be used to handle such large single XML files?
>>
> 
> I am not familiar with that particular code of Lucene, but is it 
> possible that Lucene is using DOM for this parsing? 
> If so, one could try to replace it by SAX, and hence get rid of the 
> OutOfMemory issue. 
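To illustrate the DOM-to-SAX suggestion: a minimal sketch using the JDK's built-in SAX parser, which reports events as it reads instead of building a tree in memory. This is not Lucene code and not the poster's code. The class name SaxRecordStreamer, the record element name "doc", and the sink callback are hypothetical, and the sketch assumes record elements are not nested inside each other.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: streams an XML file and emits the text of one record at a time. */
final class SaxRecordStreamer {

    /** Passes the character content of every element named recordElement to sink. */
    static void stream(InputStream xml, String recordElement, Consumer<String> sink)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(xml, new DefaultHandler() {
            // Non-null only while the parser is inside a record element.
            private StringBuilder text;

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                if (recordElement.equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one text node in several calls, so append each piece.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (text != null && recordElement.equals(qName)) {
                    sink.accept(text.toString());
                    text = null; // drop the buffer so only one record is held at a time
                }
            }
        });
    }

    public static void main(String[] args) throws Exception {
        byte[] sample = "<docs><doc>a</doc><doc>b c</doc></docs>"
                .getBytes(StandardCharsets.UTF_8);
        // In an indexing job the sink would build and add one document per record.
        stream(new ByteArrayInputStream(sample), "doc", System.out::println);
    }
}
```

With a file stream in place of the in-memory sample, the heap should only need to hold the current record's text, which is why a switch from DOM to SAX is a plausible fix for the OutOfMemoryError described in this thread.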
> 
> Cheers 
> 
> Michael 
>   
>> If Lucene is not the right place, please let me know of alternate
>> places to look.
>> 
>> Thanks in advance, 
>> Sithu D Sudarsan 
>> sithu.sudar...@fda.hhs.gov 
>> sdsudar...@ualr.edu 
>> 
>> 
>> 
>> 
>>   
>>     
> 
> 
> --------------------------------------------------------------------- 
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org 
> For additional commands, e-mail: java-user-h...@lucene.apache.org 
> 
> 
> 
>   


