Re: Parsing large xml files

2009-05-24 Thread crackeur
Yes, that is something worth thinking about. Thanks for bringing this up... - Original Message - From: "Michael Wechner" To: java-user@lucene.apache.org Sent: Friday, May 22, 2009 11:41:51 AM GMT -08:00 US/Canada Pacific Subject: Re: Parsing large xml files crack...@c

Re: Parsing large xml files

2009-05-22 Thread Michael Wechner
To: java-user@lucene.apache.org Sent: Friday, May 22, 2009 6:39:33 AM GMT -08:00 US/Canada Pacific Subject: RE: Parsing large xml files Thanks everyone for your useful suggestions/links. Lucene uses DOM and we tried with SAX. XML Pull & vtd-xml as well as Piccolo seem good. Ho

Re: Parsing large xml files

2009-05-22 Thread crackeur
Once people get comfortable with vtd-xml, few ever go back to DOM and SAX... - Original Message - From: "Sithu D. Sudarsan" To: java-user@lucene.apache.org Sent: Friday, May 22, 2009 6:39:33 AM GMT -08:00 US/Canada Pacific Subject: RE: Parsing large xml file

Re: Parsing large xml files

2009-05-22 Thread prasanna pradhan
We had a similar problem where we had to parse 1 GB XML files. Better to transform them to an array-like format such as JSON and write a custom search API using Lucene. On Thu, May 21, 2009 at 8:12 PM, Sudarsan, Sithu D. < sithu.sudar...@fda.hhs.gov> wrote: > > Hi, > > While trying to parse xml documents of about 50MB siz

Re: Parsing large xml files

2009-05-22 Thread Matthew Hall
ble to use 4GB though! If there is any setting that will let us use 4GB do let me know. Thanks, Sithu D Sudarsan -Original Message- From: Matthew Hall [mailto:mh...@informatics.jax.org] Sent: Friday, May 22, 2009 8:59 AM To: java-user@lucene.apache.org Subject: Re: Parsing large xml files

RE: Parsing large xml files

2009-05-22 Thread Sudarsan, Sithu D.
arsan -Original Message- From: Matthew Hall [mailto:mh...@informatics.jax.org] Sent: Friday, May 22, 2009 8:59 AM To: java-user@lucene.apache.org Subject: Re: Parsing large xml files 2GB... should not be a maximum for any JVM that I know of. Assuming you are running a 32-bit JVM, you are actually

RE: Parsing large xml files

2009-05-22 Thread Sudarsan, Sithu D.
o:michael.wech...@wyona.com] Sent: Friday, May 22, 2009 4:48 AM To: java-user@lucene.apache.org Subject: Re: Parsing large xml files crack...@comcast.net schrieb: > http://vtd-xml.sf.net > > > - Original Message - > From: "Sithu D. Sudarsan" > To: java-user@l

Re: Parsing large xml files

2009-05-22 Thread Matthew Hall
7:42:59 AM GMT -08:00 US/Canada Pacific Subject: Parsing large xml files Hi, While trying to parse XML documents of about 50MB in size, we run into an OutOfMemoryError due to Java heap space. Increasing the JVM heap to close to 2GB (that is the max) does not help. Is there any API that could be used

Re: Parsing large xml files

2009-05-22 Thread Michael Wechner
crack...@comcast.net schrieb: http://vtd-xml.sf.net - Original Message - From: "Sithu D. Sudarsan" To: java-user@lucene.apache.org Sent: Thursday, May 21, 2009 7:42:59 AM GMT -08:00 US/Canada Pacific Subject: Parsing large xml files Hi, While trying to parse xml

Re: Parsing large xml files

2009-05-21 Thread crackeur
http://vtd-xml.sf.net - Original Message - From: "Sithu D. Sudarsan" To: java-user@lucene.apache.org Sent: Thursday, May 21, 2009 7:42:59 AM GMT -08:00 US/Canada Pacific Subject: Parsing large xml files Hi, While trying to parse xml documents of about 50MB size, w

RE: Parsing large xml files

2009-05-21 Thread Sudarsan, Sithu D.
Thanks, I'll try that and get back to you Sincerely, Sithu D Sudarsan -Original Message- From: Michael Barbarelli [mailto:mbarbare...@gmail.com] Sent: Thursday, May 21, 2009 10:52 AM To: java-user@lucene.apache.org Subject: Re: Parsing large xml files Why not use an XML pull p

Re: Parsing large xml files

2009-05-21 Thread Erick Erickson
What fails and what is the stack trace? Have you tried just parsing the XML in a stand-alone program independent of indexing? You should easily be able to parse a 50MB file with that much memory. I suspect something else is going on here. Perhaps you're not *really* allocating that much memory to
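One way to act on the suggestion above is a small stand-alone check, independent of any indexing code. The following is only a sketch: the class and method names (HeapAndParseCheck, maxHeapMb, countElements) are made up for illustration and do not come from the thread. It prints the maximum heap the JVM actually reports, then streams the file through the JDK's SAX parser and counts elements.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the names are illustrative only.
final class HeapAndParseCheck {

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Streams the input through SAX and returns the number of elements seen. */
    static int countElements(InputStream in) throws Exception {
        final int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                count[0]++; // nothing is retained, so memory use stays flat
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // If this prints far less than expected, the -Xmx setting is not taking effect.
        System.out.println("Max heap (MB): " + maxHeapMb());
        if (args.length > 0) {
            try (InputStream in = new FileInputStream(args[0])) {
                System.out.println("Elements: " + countElements(in));
            }
        }
    }
}
```

If the stand-alone parse succeeds and the reported heap matches the configured value, the OutOfMemoryError is more likely caused by something outside the parser itself.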

Re: Parsing large xml files

2009-05-21 Thread Joel Halbert
Try http://piccolo.sourceforge.net/; it is small and fast. -Original Message- From: Michael Barbarelli Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: Parsing large xml files Date: Thu, 21 May 2009 15:52:00 +0100 Why not use an XML pull parser? I

Re: Parsing large xml files

2009-05-21 Thread Michael Barbarelli
Why not use an XML pull parser? I recommend against using an in-memory parser. On Thu, May 21, 2009 at 3:42 PM, Sudarsan, Sithu D. < sithu.sudar...@fda.hhs.gov> wrote: > > Hi, > > While trying to parse xml documents of about 50MB size, we run into > OutOfMemoryError due to java heap space. Incre
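As a rough illustration of the pull-parser approach recommended above, here is a minimal sketch using StAX (javax.xml.stream), which ships with Java 6 and later. The thread does not say which pull parser was meant, so StAX is an assumption, and the class name PullParseSketch and the element name passed in are hypothetical.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not taken from the thread.
final class PullParseSketch {

    /** Counts start tags with the given local name without building a tree. */
    static int countElements(InputStream in, String name) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        try {
            while (reader.hasNext()) {
                // next() advances one event at a time; only the current event is held.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && name.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }
}
```

Because the reader exposes one event at a time, heap use depends on what the calling code chooses to keep, not on the size of the file. In an indexing job, each record could be handed off as soon as its end tag is reached.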

Parsing large xml files

2009-05-21 Thread Sudarsan, Sithu D.
Hi, While trying to parse XML documents of about 50MB in size, we run into an OutOfMemoryError due to Java heap space. Increasing the JVM heap to close to 2GB (that is the max) does not help. Is there any API that could be used to handle such large single XML files? If Lucene is not the right place, please l