Hi Saikrishna,

Unfortunately my XML structure is not the same from file to file: sometimes the nodes run very long and sometimes very short. A single element may span the whole document, or there may be many elements of different types. So I need your help on how to parse these files in a good, efficient way, with low memory use and fast processing (one streaming option is sketched just below).
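One route to low memory use is to stream the parse instead of building a whole JDOM tree: the parser hands you one event at a time, so heap use stays flat no matter how big the file is. Below is a minimal sketch, not a drop-in replacement, assuming the StAX API (javax.xml.stream, built into Java 6; a separate implementation such as Woodstox is needed on older JDKs) and a hypothetical <record> element as the unit to index; swap in whatever element actually delimits your documents.

    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class StreamingParseSketch {
        public static void main(String[] args) throws Exception {
            // Stand-in for the ByteArrayInputStream the original code reads.
            InputStream content = new ByteArrayInputStream(
                "<root><record>one</record><record>two</record></root>"
                    .getBytes("UTF-8"));

            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(content);
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    if ("record".equals(reader.getLocalName())) {
                        text.setLength(0); // begin a fresh unit
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    text.append(reader.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if ("record".equals(reader.getLocalName())) {
                        // index one unit here; the full document is never
                        // held in memory at once
                        System.out.println("indexable unit: " + text);
                    }
                    break;
                }
            }
            reader.close();
        }
    }

The catch is that free-form XPath queries need the whole tree; streaming fits best when each query reduces to a known element path you can match as events go by.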
I have read that SAXBuilder is slow and memory-hungry. How can I replace it with something better? My code is:

    SAXBuilder builder = new SAXBuilder();
    // content is a ByteArrayInputStream; can I change this to
    // something better (still JDOM)?
    Document doc = builder.build(content);
    for (String query : queries) {
        // run each of the many XPath queries against the document
        XPath xpath = XPath.newInstance(query);
        xpath.selectNodes(doc);
        ...
    }

Thanks...

----- Original Message ----
From: saikrishna venkata pendyala <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, 22 January, 2007 10:44:50 AM
Subject: Re: Big size xml file indexing

Hi,
Nothing needs to change in the indexing process; all it takes is a little pre-processing. If the structure of your XML file is the same as what I described earlier, split the 35 MB file into small files, making sure each new small file is syntactically valid XML on its own. Then index the many small files instead of the one large file (a rough sketch of such a splitter appears at the end of this thread). Could you describe the structure of your XML file and what you are trying to index?

On 1/22/07, aslam bari <[EMAIL PROTECTED]> wrote:
>
> Hi Saikrishna,
> Thanks for the reply, but I don't know how to proceed with this. Here is
> my code sample; let me know what to change.
>
>     SAXBuilder builder = new SAXBuilder();
>
>     // CONTENT here is a ByteArrayInputStream; I know I can also pass a
>     // file URL here. Let me know which is best.
>     Document doc = builder.build(CONTENT);
>
>     loop (over the XPath queries)
>     {
>         XPath.newInstance(xpathquery).selectNodes(doc);
>     }
>
> Thanks...
> ----- Original Message ----
> From: saikrishna venkata pendyala <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, 22 January, 2007 10:07:27 AM
> Subject: Re: Big size xml file indexing
>
> Hi,
> I have indexed a 6.2 GB XML file using Lucene. What I did was:
> 1. I split the 6.2 GB file into small files, each about 10 MB in size.
> 2. I then wrote a Python script to balance the number of documents in
>    each file.
>
> The structure of my XML file is:
>
>     <document>
>     -----
>     -----
>     </document>
>     <document>
>     -----
>     -----
>     </document>
>
> Since you cannot go beyond a 500 MB heap, this technique might help you,
> provided of course that the file structure is the same.
>
> On 1/22/07, aslam bari <[EMAIL PROTECTED]> wrote:
> >
> > Dear all,
> > I am using Lucene to index XML files. For parsing I use JDOM to select
> > XPath nodes, manipulate them, and index them. All of this works well,
> > but when the file size gets big, around 35-50 MB, it either runs out
> > of memory or takes a very long time. How can I set some parameters to
> > speed up parsing and use less memory? The problem is that I cannot
> > increase the heap size much, so I am limited to a heap of 300-500 MB.
> > Does anybody have a solution for this?
> >
> > Thanks...
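One way to implement the pre-processing step saikrishna describes is sketched below. This is a rough sketch, not his actual script: it assumes the <document> elements sit at the top level with each opening and closing tag on its own line, as in the structure shown above, and the file names (big.xml, part-N.xml) and the documents-per-file count are placeholders to tune.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.PrintWriter;

    public class XmlSplitSketch {
        public static void main(String[] args) throws Exception {
            int docsPerFile = 1000; // tune so each part lands near 10 MB
            int docCount = 0, fileCount = 0;
            BufferedReader in = new BufferedReader(new FileReader("big.xml"));
            PrintWriter out = null;
            String line;
            while ((line = in.readLine()) != null) {
                if (out == null) {
                    out = new PrintWriter(
                            new FileWriter("part-" + fileCount++ + ".xml"));
                    // wrapper element keeps each part valid XML on its own
                    out.println("<documents>");
                }
                out.println(line);
                if (line.trim().equals("</document>")
                        && ++docCount == docsPerFile) {
                    out.println("</documents>");
                    out.close();
                    out = null;
                    docCount = 0;
                }
            }
            if (out != null) {
                out.println("</documents>");
                out.close();
            }
            in.close();
        }
    }

Each part-N.xml can then be parsed and indexed independently, so no single parse ever needs more than one small file's worth of heap.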