yes, that is something worth thinking about... thanks for bringing this up...

----- Original Message -----
From: "Michael Wechner" <[email protected]>
To: [email protected]
Sent: Friday, May 22, 2009 11:41:51 AM GMT -08:00 US/Canada Pacific
Subject: Re: Parsing large xml files

[email protected] wrote:
> once you get comfortable with vtd-xml, few people will ever go back to DOM
> and SAX...
>

maybe you want to consider contributing a vtd-xml based parsing implementation to Lucene ;-)

Thanks

Michael

> ----- Original Message -----
> From: "Sithu D. Sudarsan" <[email protected]>
> To: [email protected]
> Sent: Friday, May 22, 2009 6:39:33 AM GMT -08:00 US/Canada Pacific
> Subject: RE: Parsing large xml files
>
> Thanks everyone for your useful suggestions/links.
>
> Lucene uses DOM and we tried with SAX.
>
> XML Pull & vtd-xml as well as Piccolo seem good.
>
> However, for now, we've broken the file into smaller chunks and are
> parsing those.
>
> When we get some time, we'd like to refactor with the suggested ones.
>
> Erick: We do use Eclipse. But running from CLI gives the same error! Maybe
> there is a way to address the memory issues, but the current idea of
> breaking into smaller chunks has worked for now...
>
>
> Sincerely,
> Sithu D Sudarsan
>
> -----Original Message-----
> From: Michael Wechner [mailto:[email protected]]
> Sent: Friday, May 22, 2009 4:48 AM
> To: [email protected]
> Subject: Re: Parsing large xml files
>
> [email protected] wrote:
>
>> http://vtd-xml.sf.net
>>
>>
>> ----- Original Message -----
>> From: "Sithu D. Sudarsan" <[email protected]>
>> To: [email protected]
>> Sent: Thursday, May 21, 2009 7:42:59 AM GMT -08:00 US/Canada Pacific
>> Subject: Parsing large xml files
>>
>>
>> Hi,
>>
>> While trying to parse xml documents of about 50MB size, we run into
>> OutOfMemoryError due to java heap space. Increasing the JVM heap to close
>> to 2GB (that is the max) does not help. Is there any API that could be
>> used to handle such large single xml files?
>>
>
> I am not familiar with that particular code of Lucene, but is it
> possible that Lucene is using DOM for this parsing?
> If so, one could try to replace it by SAX, and hence get rid of the
> OutOfMemory issue.
>
> Cheers
>
> Michael
>
>> If Lucene is not the right place, please let me know alternate places to
>> look for,
>>
>> Thanks in advance,
>> Sithu D Sudarsan
>> [email protected]
>> [email protected]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
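
For anyone finding this thread later: the suggestion above to replace DOM with SAX could look roughly like the sketch below. It is only an illustration under stated assumptions. `SaxElementCounter` and `countElements` are made-up names, not part of Lucene, and counting elements stands in for whatever per-element work the real application does. It uses only the JDK's built-in `javax.xml.parsers` SAX API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not part of Lucene.
final class SaxElementCounter {

    // Streams the document through SAX and counts start tags named elementName.
    // No document tree is built, so only the current parse event is held in memory.
    static long countElements(InputStream in, final String elementName)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        final long[] count = {0};
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // The factory is not namespace-aware by default,
                // so the qualified name is the one to compare.
                if (elementName.equals(qName)) {
                    count[0]++;
                }
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            // Usage: java SaxElementCounter <file.xml> <elementName>
            try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
                System.out.println(countElements(in, args[1]));
            }
        } else {
            // Tiny in-memory document so the sketch runs without arguments.
            InputStream in = new ByteArrayInputStream(
                    "<docs><doc/><doc/></docs>".getBytes(StandardCharsets.UTF_8));
            System.out.println(countElements(in, "doc"));
        }
    }
}
```

With a handler like this, memory use should depend on what the handler itself retains rather than on the size of the file, which is why SAX (or a pull parser) is the usual first thing to try when a DOM parse of a large file runs out of heap.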
