Yes, that is something worth thinking about. Thanks for bringing this up...
- Original Message -
From: "Michael Wechner"
To: java-user@lucene.apache.org
Sent: Friday, May 22, 2009 11:41:51 AM GMT -08:00 US/Canada Pacific
Subject: Re: Parsing large xml files
crack...@c
To: java-user@lucene.apache.org
Sent: Friday, May 22, 2009 6:39:33 AM GMT -08:00 US/Canada Pacific
Subject: RE: Parsing large xml files
Thanks everyone for your useful suggestions/links.
Lucene uses DOM and we tried with SAX.
XML Pull & vtd-xml as well as Piccolo seem good.
Ho
Once people get comfortable with vtd-xml, few ever go back to DOM and
SAX...
- Original Message -
From: "Sithu D. Sudarsan"
To: java-user@lucene.apache.org
Sent: Friday, May 22, 2009 6:39:33 AM GMT -08:00 US/Canada Pacific
Subject: RE: Parsing large xml files
We had a similar problem where we had to parse 1 GB XML files. It is better
to transform them into an array-like format such as JSON and write a custom
search API using Lucene.
On Thu, May 21, 2009 at 8:12 PM, Sudarsan, Sithu D. <
sithu.sudar...@fda.hhs.gov> wrote:
>
> Hi,
>
> While trying to parse xml documents of about 50MB siz
ble to use 4GB though!
If there is any setting that will let us use 4GB do let me know.
Thanks,
Sithu D Sudarsan
-Original Message-
From: Matthew Hall [mailto:mh...@informatics.jax.org]
Sent: Friday, May 22, 2009 8:59 AM
To: java-user@lucene.apache.org
Subject: Re: Parsing large xml files
2GB... should not be a maximum for any JVM that I know of.
Assuming you are running a 32-bit JVM you are actually
o:michael.wech...@wyona.com]
Sent: Friday, May 22, 2009 4:48 AM
To: java-user@lucene.apache.org
Subject: Re: Parsing large xml files
crack...@comcast.net wrote:
> http://vtd-xml.sf.net
>
>
> - Original Message -
> From: "Sithu D. Sudarsan"
> To: java-user@l
7:42:59 AM GMT -08:00 US/Canada Pacific
Subject: Parsing large xml files
Hi,
While trying to parse XML documents of about 50MB in size, we run into an
OutOfMemoryError due to Java heap space. Increasing the JVM heap to close to
2GB (that is the max) does not help. Is there any API that could be used
crack...@comcast.net wrote:
http://vtd-xml.sf.net
- Original Message -
From: "Sithu D. Sudarsan"
To: java-user@lucene.apache.org
Sent: Thursday, May 21, 2009 7:42:59 AM GMT -08:00 US/Canada Pacific
Subject: Parsing large xml files
Hi,
While trying to parse xml documents of about 50MB size, w
Thanks, I'll try that and get back to you
Sincerely,
Sithu D Sudarsan
-Original Message-
From: Michael Barbarelli [mailto:mbarbare...@gmail.com]
Sent: Thursday, May 21, 2009 10:52 AM
To: java-user@lucene.apache.org
Subject: Re: Parsing large xml files
Why not use an XML pull p
What fails and what is the stack trace? Have you tried just
parsing the XML in a stand-alone program independent of
indexing?
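A minimal sketch of such a stand-alone check, assuming only the JDK's built-in SAX parser (the class name StandaloneParseCheck and its two helper methods are made up for illustration; nothing here comes from Lucene). It prints the heap the JVM actually received, which helps confirm that an -Xmx setting took effect, and then parses the file with a handler that does nothing:

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class StandaloneParseCheck {

    /** Maximum heap the running JVM will attempt to use, in megabytes. */
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /**
     * Parses the stream with a no-op SAX handler.
     * Returns true if parsing completes, false if the parser reports an error.
     * An OutOfMemoryError is not caught here, so its stack trace stays visible.
     */
    public static boolean parses(InputStream in) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler());
            return true;
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Max heap (MB): " + maxHeapMb());
        // Pass the path of the large XML file as the first argument;
        // without an argument a tiny in-memory sample is parsed instead.
        try (InputStream in = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream("<a><b/></a>".getBytes(StandardCharsets.UTF_8))) {
            System.out.println("Parsed OK: " + parses(in));
        }
    }
}
```

If this runs cleanly on the 50MB file with the same heap settings, the memory problem is more likely in the indexing code than in the parsing itself.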
You should easily be able to parse a 50MB file with that much
memory. I suspect something else is going on here. Perhaps you're
not *really* allocating that much memory to
Try http://piccolo.sourceforge.net/
It is small and fast.
-Original Message-
From: Michael Barbarelli
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: Parsing large xml files
Date: Thu, 21 May 2009 15:52:00 +0100
Why not use an XML pull parser? I recommend against using an in-memory
parser.
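For illustration, a pull-parsing loop might look roughly like the sketch below. It assumes the StAX API (javax.xml.stream) that ships with Java 6 and later, which is only one of several pull parsers; the class name PullParseSketch, the method countElements, and the sample element name "doc" are invented for the example. The reader walks the document one event at a time and never builds a tree, so memory use should stay roughly flat regardless of file size:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullParseSketch {

    /** Counts start tags with the given local name while streaming through the input. */
    public static int countElements(InputStream in, String elementName)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        try {
            // Pull one event at a time; only the current event is held in memory.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Small in-memory sample; for a real file, pass a FileInputStream instead.
        String xml = "<docs><doc id=\"1\">a</doc><doc id=\"2\">b</doc></docs>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in, "doc"));
    }
}
```

In an indexing setup, the place where the counter is incremented is where each record's fields would be collected and handed to the indexer before moving on to the next element.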
On Thu, May 21, 2009 at 3:42 PM, Sudarsan, Sithu D. <
sithu.sudar...@fda.hhs.gov> wrote:
>
> Hi,
>
> While trying to parse xml documents of about 50MB size, we run into
> OutOfMemoryError due to java heap space. Incre
Hi,
While trying to parse XML documents of about 50MB in size, we run into an
OutOfMemoryError due to Java heap space. Increasing the JVM heap to close to
2GB (that is the max) does not help. Is there any API that could be used to
handle such large single XML files?
If Lucene is not the right place, please l