Hai ,
      I have indexed 6.2 gb xml file using lucene. What I did was
       1 .  I have splitted the 6.2gb file into small files each of size
10mb.
       2 .  And then I worte a python script to quantize number
no.ofdocuments in each file.

       Structure of my xml file is """
      <document>
       -----
       -----
       </document>
       <document>
       -----
       -----
       </document> """

Since you cannot go beyond 500MB this technique might help you of course if
file sturcture is the same.

On 1/22/07, aslam bari <[EMAIL PROTECTED]> wrote:

Dear all,
I m using lucene to index xml files. For parsing i m using JDOM to get
XPATH nodes and do some manipulation on them and indexed them. All things
work well but when the file size is very big about 35 - 50 MB. Then it goes
out of memory or take a lot of time. How can i set some parameters to speed
up and took less memory to parse the file. The problem is that i cannot
increase much high Heap Size. So i have to limit to use heap size of 300 -
500 MB. Has anybody some solution for this.

Thanks...



__________________________________________________________
Yahoo! India Answers: Share what you know. Learn something new
http://in.answers.yahoo.com/

Reply via email to