Hi Saikrishna,

Unfortunately my XML structure is not the same from file to file: sometimes the nodes run very long and sometimes very short. A single element may span the whole document, or there may be many elements of different types. So I need your help on how to parse these files in a good, efficient way, with low memory use and fast processing (one streaming option is sketched just below).
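One route to low memory use is to stream the parse instead of building a whole JDOM tree: the parser hands you one event at a time, so heap use stays flat no matter how big the file is. Below is a minimal sketch, not a drop-in replacement, assuming the StAX API (javax.xml.stream, built into Java 6; a separate implementation such as Woodstox is needed on older JDKs) and a hypothetical <record> element as the unit to index; swap in whatever element actually delimits your documents.

    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class StreamingParseSketch {
        public static void main(String[] args) throws Exception {
            // Stand-in for the ByteArrayInputStream the original code reads.
            InputStream content = new ByteArrayInputStream(
                "<root><record>one</record><record>two</record></root>"
                    .getBytes("UTF-8"));

            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(content);
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    if ("record".equals(reader.getLocalName())) {
                        text.setLength(0); // begin a fresh unit
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    text.append(reader.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if ("record".equals(reader.getLocalName())) {
                        // index one unit here; the full document is never
                        // held in memory at once
                        System.out.println("indexable unit: " + text);
                    }
                    break;
                }
            }
            reader.close();
        }
    }

The catch is that free-form XPath queries need the whole tree; streaming fits best when each query reduces to a known element path you can match as events go by.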
I have read that SAXBuilder is slow and memory-hungry. How can I replace it with something better? My code is:

    SAXBuilder builder = new SAXBuilder();
    // content is a ByteArrayInputStream; can I change this to
    // something better (still JDOM)?
    Document doc = builder.build(content);
    for (String query : queries) {
        // run each of the many XPath queries against the document
        XPath xpath = XPath.newInstance(query);
        xpath.selectNodes(doc);
        ...
    }

Thanks...

----- Original Message ----
From: saikrishna venkata pendyala <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, 22 January, 2007 10:44:50 AM
Subject: Re: Big size xml file indexing

Hi,
Nothing needs to change in the indexing process; all it takes is a little pre-processing. If the structure of your XML file is the same as what I described earlier, split the 35 MB file into small files, making sure each new small file is syntactically valid XML on its own. Then index the many small files instead of the one large file (a rough sketch of such a splitter appears at the end of this thread). Could you describe the structure of your XML file and what you are trying to index?

On 1/22/07, aslam bari <[EMAIL PROTECTED]> wrote:
>
> Hi Saikrishna,
> Thanks for the reply, but I don't know how to proceed with this. Here is
> my code sample; let me know what to change.
>
>     SAXBuilder builder = new SAXBuilder();
>
>     // CONTENT here is a ByteArrayInputStream; I know I can also pass a
>     // file URL here. Let me know which is best.
>     Document doc = builder.build(CONTENT);
>
>     loop (over the XPath queries)
>     {
>         XPath.newInstance(xpathquery).selectNodes(doc);
>     }
>
> Thanks...
> ----- Original Message ----
> From: saikrishna venkata pendyala <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, 22 January, 2007 10:07:27 AM
> Subject: Re: Big size xml file indexing
>
> Hi,
> I have indexed a 6.2 GB XML file using Lucene. What I did was:
> 1. I split the 6.2 GB file into small files, each about 10 MB in size.
> 2. I then wrote a Python script to balance the number of documents in
>    each file.
>
> The structure of my XML file is:
>
>     <document>
>     -----
>     -----
>     </document>
>     <document>
>     -----
>     -----
>     </document>
>
> Since you cannot go beyond a 500 MB heap, this technique might help you,
> provided of course that the file structure is the same.
>
> On 1/22/07, aslam bari <[EMAIL PROTECTED]> wrote:
> >
> > Dear all,
> > I am using Lucene to index XML files. For parsing I use JDOM to select
> > XPath nodes, manipulate them, and index them. All of this works well,
> > but when the file size gets big, around 35-50 MB, it either runs out
> > of memory or takes a very long time. How can I set some parameters to
> > speed up parsing and use less memory? The problem is that I cannot
> > increase the heap size much, so I am limited to a heap of 300-500 MB.
> > Does anybody have a solution for this?
> >
> > Thanks...
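One way to implement the pre-processing step saikrishna describes is sketched below. This is a rough sketch, not his actual script: it assumes the <document> elements sit at the top level with each opening and closing tag on its own line, as in the structure shown above, and the file names (big.xml, part-N.xml) and the documents-per-file count are placeholders to tune.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.PrintWriter;

    public class XmlSplitSketch {
        public static void main(String[] args) throws Exception {
            int docsPerFile = 1000; // tune so each part lands near 10 MB
            int docCount = 0, fileCount = 0;
            BufferedReader in = new BufferedReader(new FileReader("big.xml"));
            PrintWriter out = null;
            String line;
            while ((line = in.readLine()) != null) {
                if (out == null) {
                    out = new PrintWriter(
                            new FileWriter("part-" + fileCount++ + ".xml"));
                    // wrapper element keeps each part valid XML on its own
                    out.println("<documents>");
                }
                out.println(line);
                if (line.trim().equals("</document>")
                        && ++docCount == docsPerFile) {
                    out.println("</documents>");
                    out.close();
                    out = null;
                    docCount = 0;
                }
            }
            if (out != null) {
                out.println("</documents>");
                out.close();
            }
            in.close();
        }
    }

Each part-N.xml can then be parsed and indexed independently, so no single parse ever needs more than one small file's worth of heap.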