Hi, I have indexed a 6.2 GB XML file using Lucene. What I did was:
1. I split the 6.2 GB file into small files, each about 10 MB in size.
2. I then wrote a Python script to keep a whole number of documents in each file, i.e. so that no split falls inside a <document> element (a sketch of the idea follows below).
The structure of my XML file is:
"""
<document>
-----
-----
</document>
<document>
-----
-----
</document>
"""
Since you cannot go beyond 500 MB of heap, this technique might help you, provided of course that your file structure is the same.
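The splitting script itself wasn't posted, so here is a minimal sketch of the idea in Python. It assumes each document ends with a </document> tag on its own line; the file names (big.xml, chunk_000.xml) and the docs_per_chunk value are hypothetical, and you would tune docs_per_chunk so each chunk lands near the 10 MB target:

def split(src_path="big.xml", docs_per_chunk=1000):
    chunk_no = 0
    doc_count = 0
    out = open("chunk_%03d.xml" % chunk_no, "w")
    for line in open(src_path):
        out.write(line)
        if "</document>" in line:            # a complete document has been copied
            doc_count += 1
            if doc_count == docs_per_chunk:   # start the next chunk file
                out.close()
                chunk_no += 1
                doc_count = 0
                out = open("chunk_%03d.xml" % chunk_no, "w")
    out.close()

if __name__ == "__main__":
    split()

Because a chunk is closed only after a </document> line, no chunk ever contains a partial document, so each small file holds only whole documents and can be parsed and indexed on its own.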
On 1/22/07, aslam bari <[EMAIL PROTECTED]> wrote:
Dear all, I am using Lucene to index XML files. For parsing I am using JDOM to get XPath nodes, do some manipulation on them, and index them. Everything works well, but when the file size gets big, about 35-50 MB, it runs out of memory or takes a very long time. How can I set some parameters to speed up parsing and use less memory? The problem is that I cannot increase the heap size much, so I have to limit it to 300-500 MB. Does anybody have a solution for this? Thanks...