Yes, that is something worth thinking about. Thanks for bringing this up.
----- Original Message ----- 
From: "Michael Wechner" <[email protected]> 
To: [email protected] 
Sent: Friday, May 22, 2009 11:41:51 AM GMT -08:00 US/Canada Pacific 
Subject: Re: Parsing large xml files 

[email protected] wrote: 
> Once you get comfortable with vtd-xml, few people ever go back to DOM
> and SAX.
>   

Maybe you want to consider contributing a vtd-xml-based parsing
implementation to Lucene ;-)

Thanks 

Michael 
> ----- Original Message ----- 
> From: "Sithu D. Sudarsan" <[email protected]> 
> To: [email protected] 
> Sent: Friday, May 22, 2009 6:39:33 AM GMT -08:00 US/Canada Pacific 
> Subject: RE: Parsing large xml files 
> 
> Thanks everyone for your useful suggestions/links. 
> 
> Lucene uses DOM, and we tried SAX.
> 
> XML Pull, vtd-xml, and Piccolo all seem good.
> 
> However, for now, we've broken the file into smaller chunks and are
> parsing those.
> 
> When we get some time, we'd like to refactor using the suggested libraries.
> 
> Erick: We do use Eclipse, but running from the command line gives the
> same error. Maybe there is a way to address the memory issues, but
> breaking the file into smaller chunks has worked for now.
> 
> 
> Sincerely, 
> Sithu D Sudarsan 
> 
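As an alternative to physically splitting the file into chunks, a pull parser lets the caller take one record at a time from a single stream. The sketch below uses the JDK's StAX API (javax.xml.stream), which follows the same pull model as XML Pull but is not the XmlPull API itself. It assumes the file is a sequence of repeated record elements; the record element name and the class and method names (StaxRecordStreamer, forEachRecord) are hypothetical, and each record's text is simply handed to a callback.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams one record at a time with StAX.
final class StaxRecordStreamer {

    // Pulls parse events one at a time and passes the text content of every
    // <recordName> element to sink; only the current record is held in memory.
    static void forEachRecord(InputStream in, String recordName, Consumer<String> sink)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && recordName.equals(reader.getLocalName())) {
                    sink.accept(readRecordText(reader));
                }
            }
        } finally {
            reader.close(); // frees parser resources; does not close 'in'
        }
    }

    // Expects the reader on a record's START_ELEMENT. Consumes events up to the
    // matching END_ELEMENT and returns all text inside, including nested elements.
    private static String readRecordText(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            } else if (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA) {
                text.append(reader.getText());
            }
        }
        return text.toString();
    }
}
```

Because the caller drives the loop, stopping early or batching records is ordinary control flow, and no intermediate chunk files are needed.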
> -----Original Message----- 
> From: Michael Wechner [mailto:[email protected]] 
> Sent: Friday, May 22, 2009 4:48 AM 
> To: [email protected] 
> Subject: Re: Parsing large xml files 
> 
> [email protected] wrote: 
>   
>> http://vtd-xml.sf.net 
>> 
>> 
>> ----- Original Message ----- 
>> From: "Sithu D. Sudarsan" <[email protected]> 
>> To: [email protected] 
>> Sent: Thursday, May 21, 2009 7:42:59 AM GMT -08:00 US/Canada Pacific 
>> Subject: Parsing large xml files 
>> 
>> 
>> Hi, 
>> 
>> While trying to parse XML documents of about 50MB in size, we run into
>> an OutOfMemoryError due to Java heap space. Increasing the JVM heap to
>> close to 2GB (that is the max) does not help. Is there any API that
>> could be used to handle such large single XML files?
> 
> I am not familiar with that particular part of Lucene's code, but is it
> possible that Lucene is using DOM for this parsing?
> If so, one could try to replace it with SAX and thereby avoid the
> OutOfMemoryError.
> 
> Cheers 
> 
> Michael 
>   
>> If Lucene is not the right place, please let me know of alternative
>> places to look.
>> 
>> Thanks in advance, 
>> Sithu D Sudarsan 
>> [email protected] 
>> [email protected] 
>> 
> 
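Michael's suggestion of replacing DOM with SAX can be sketched with the SAX API that ships with the JDK (javax.xml.parsers). This is a minimal sketch, not Lucene's code: it assumes the large file is a flat sequence of repeated, non-nested record elements, and the record element name and the class and method names (SaxRecordStreamer, forEachRecord) are hypothetical. Each record's text goes to a callback, where one might, for example, build a Lucene Document. DOM typically needs several times the file size in heap for the full tree, whereas here only the current record is buffered.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams one record at a time with SAX.
final class SaxRecordStreamer {

    // Pushes the input through a SAX parser and passes the text content of
    // every <recordName> element to sink. Only the current record's text is
    // buffered, so memory use does not grow with the file size.
    // Assumption: record elements are not nested inside one another.
    static void forEachRecord(InputStream in, String recordName, Consumer<String> sink)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed for localName to be filled in
        SAXParser parser = factory.newSAXParser();

        parser.parse(in, new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inRecord = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (recordName.equals(localName)) {
                    inRecord = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one text node in several pieces; append them all.
                if (inRecord) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (recordName.equals(localName)) {
                    sink.accept(text.toString());
                    inRecord = false;
                }
            }
        });
    }
}
```

For a real file one would wrap a FileInputStream in a BufferedInputStream and pass it in with the record element name; both are placeholders here.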
> 
> --------------------------------------------------------------------- 
> To unsubscribe, e-mail: [email protected] 
> For additional commands, e-mail: [email protected] 
> 

