Re indexing performance: you are not making use of the various IndexWriter 
parameters.  My suggestion: wait another week; Lucene 2.3 should be out by then.  
Check the IndexWriter javadocs for the various knobs for improving indexing 
performance.  Better yet, check the Wiki; there is a page about exactly that.

Yes, the larger the index, the slower the search, of course, and your tests can 
show you when the speed becomes unacceptable for your type of index, hardware, 
search rate, etc.  What you do then is up to you.  You can spread the search 
load over multiple boxes, or you can split your index and do multiple searches 
in parallel over smaller indices sitting on multiple boxes.
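
To illustrate the second option (split the index, search the pieces in parallel, merge the hits), here is a minimal sketch in plain Java. It does not use Lucene: the Shard class is a hypothetical stand-in for one smaller index, the threads stand in for separate boxes, and all names (ShardedSearch, buildShards, searchAll) are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ShardedSearch {

    /** Hypothetical stand-in for one smaller index; holds {srcKey, dstKey} edges. */
    static final class Shard {
        private final List<long[]> edges = new ArrayList<>();

        void add(long srcKey, long dstKey) {
            edges.add(new long[] {srcKey, dstKey});
        }

        /** Returns the dstKey of every edge in this shard with the given srcKey. */
        List<Long> search(long srcKey) {
            List<Long> hits = new ArrayList<>();
            for (long[] edge : edges) {
                if (edge[0] == srcKey) {
                    hits.add(edge[1]);
                }
            }
            return hits;
        }
    }

    /** Spreads numEdges synthetic edges round-robin over shardCount shards. */
    static List<Shard> buildShards(int shardCount, int numEdges) {
        List<Shard> shards = new ArrayList<>();
        for (int s = 0; s < shardCount; s++) {
            shards.add(new Shard());
        }
        for (int i = 0; i < numEdges; i++) {
            // Synthetic data: srcKey cycles through 0..99, dstKey is the edge number.
            shards.get(i % shardCount).add(i % 100, i);
        }
        return shards;
    }

    /** Runs the same query against every shard concurrently and merges the hits. */
    static List<Long> searchAll(List<Shard> shards, long srcKey) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<Long>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                Callable<List<Long>> task = () -> shard.search(srcKey);
                pending.add(pool.submit(task));
            }
            List<Long> merged = new ArrayList<>();
            for (Future<List<Long>> future : pending) {
                merged.addAll(future.get()); // blocks until that shard has answered
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("shard search failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Shard> shards = buildShards(4, 1000);
        System.out.println("hits for srcKey 7: " + searchAll(shards, 7L).size());
    }
}
```

Each shard only scans its own portion of the data, so the per-query work per box shrinks as shards are added; the cost is the extra merge step and having to wait for the slowest shard.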

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Cam Bazz <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, January 15, 2008 7:17:49 AM
Subject: lucene as a graph store

Hello;

I would like to use Lucene as a graph store. The graph representation is a
list of edges. Consider the code below:

        final int commitCount = 16 * 1024;
        final int numObj = 1024 * 1024;

        Analyzer analyzer = new KeywordAnalyzer();
        FSDirectory directory = FSDirectory.getDirectory("c:\\LuceneAdd");
        IndexWriter writer = new IndexWriter(directory, analyzer, true);

        Document doc;
        long start = System.currentTimeMillis();

        Random r = new Random(System.currentTimeMillis());

        for (int i = 0; i < numObj; i++) {
            doc = new Document();
            doc.add(new Field("srcKey", NumberTools.longToString(i),
                    Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("dstKey", NumberTools.longToString(r.nextInt(numObj)),
                    Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("linkKey", NumberTools.longToString(r.nextInt(16)),
                    Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("linkValue", NumberTools.longToString(r.nextInt(256)),
                    Field.Store.YES, Field.Index.UN_TOKENIZED));

            writer.addDocument(doc);

            if (i % commitCount == 0) {
                long now = System.currentTimeMillis();
                System.out.println(i + ":" + (now - start));
                start = now;
            }
        }

        writer.optimize();
        writer.close();
        directory.close();


Basically I am adding a large number of documents, each with srcKey = i,
dstKey = random, and two other string fields, linkKey and linkValue.

Compared to a normal database store, or an OODBMS such as Perst or db4o,
Lucene takes longer to index.
However, it is much faster at searching, finding, and retrieving records.

I can make 16384 random lookups over 1 million entries in 0.8 seconds.
This is an excellent time. (I have been benchmarking for a long time.)

Typically, as the number of objects in a BTree-based structure (in an OODBMS,
for example) increases, the search and add times also increase.

Will Lucene have the same problem, and how can I overcome it if it does?
Looking at the above code, does anyone have any recommendations
to improve indexing performance? (Also, what can I do to improve search
performance?)

While searching with an IndexSearcher, does Lucene do any caching?
Usually MRU caches are used to accomplish this.
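
For reference, the kind of cache meant here (keep the most recently used entries, evict the least recently used one) can be sketched in plain Java with an access-ordered LinkedHashMap. This is an application-side illustration only; it says nothing about what IndexSearcher does internally, and the class name LruCache is made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Small bounded cache that evicts the least recently used entry when full. */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int capacity;

    public LruCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("srcKey:1", "edge A");
        cache.put("srcKey:2", "edge B");
        cache.get("srcKey:1");            // touch entry 1 so entry 2 becomes the eldest
        cache.put("srcKey:3", "edge C");  // exceeds capacity, evicts entry 2
        System.out.println(cache.keySet());
    }
}
```

This class is not thread-safe; with several searching threads it would need external synchronization, for example via Collections.synchronizedMap.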

Any ideas, help, or recommendations are greatly appreciated.

Best,
-C.B.



