I guess the question comes down to what kinds of things you are going to do with this graph. How often are you updating links, etc.? I can't say Lucene was designed for this kind of thing, but I am constantly amazed at what people use Lucene for, so I won't say it can't be done. I don't know how efficient it would be for things like PageRank or other graph algorithms, but I would be interested in hearing more about what you have in mind.

Lucene doesn't do much caching at the document level, though that is fairly easy to implement yourself. It does bring some terms, etc. into memory, and you may want to have a look at the FieldCache.
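
As a rough illustration of how simple a document-level cache can be, here is a minimal sketch of a least-recently-used cache built only on the JDK's LinkedHashMap. The class name DocCache and the idea of keying entries by the lookup key (for example the srcKey string) are assumptions for illustration; nothing here is part of Lucene, and the class is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a Lucene class): a small LRU cache that could
 * hold already-loaded documents, keyed by whatever you look them up by.
 */
class DocCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    DocCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; evict the least recently used entry once over capacity
        return size() > maxEntries;
    }
}
```

On a cache miss you would run the search, load the document, and put() it into the cache. If several threads share one searcher, wrapping the cache with Collections.synchronizedMap is one simple option.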

-Grant

On Jan 15, 2008, at 7:17 AM, Cam Bazz wrote:

Hello;

I would like to use Lucene as a graph store. The graph representation is a list of
edges. Consider the code below:

       final int commitCount = 16 * 1024;   // print timing every 16K documents
       final int numObj = 1024 * 1024;      // total number of edges to index

       Analyzer analyzer = new KeywordAnalyzer();
       FSDirectory directory = FSDirectory.getDirectory("c:\\LuceneAdd");
       IndexWriter writer = new IndexWriter(directory, analyzer, true);

       Document doc;
       long start = System.currentTimeMillis();

       Random r = new Random(System.currentTimeMillis());

       for (int i = 0; i < numObj; i++) {
           // One document per edge: srcKey -> dstKey, plus a key/value pair
           doc = new Document();
           doc.add(new Field("srcKey", NumberTools.longToString(i),
                   Field.Store.YES, Field.Index.UN_TOKENIZED));
           doc.add(new Field("dstKey", NumberTools.longToString(r.nextInt(numObj)),
                   Field.Store.YES, Field.Index.UN_TOKENIZED));
           doc.add(new Field("linkKey", NumberTools.longToString(r.nextInt(16)),
                   Field.Store.YES, Field.Index.UN_TOKENIZED));
           doc.add(new Field("linkValue", NumberTools.longToString(r.nextInt(256)),
                   Field.Store.YES, Field.Index.UN_TOKENIZED));

           writer.addDocument(doc);

           // Report the elapsed time for the last batch of documents
           if (i % commitCount == 0) {
               long now = System.currentTimeMillis();
               System.out.println(i + ":" + (now - start));
               start = now;
           }
       }

       writer.optimize();
       writer.close();
       directory.close();


Basically I am adding a large number of documents, each an edge from srcKey = i
to a random dstKey, plus two other string fields, linkKey and linkValue.

Compared to a normal database store, or an OODBMS such as Perst or db4o,
Lucene takes longer to index.
However, it is much faster at searching, finding, and retrieving records.

I can make 16384 random lookups over 1 million entries in 0.8 seconds. This
is an excellent time. (I have been benchmarking for a long time.)

Typically, as the number of objects in a BTree-based structure (in an OODBMS,
for example) increases, the search and add times also increase.

Will Lucene have the same problem, and how can I overcome it if it does?
Looking at the above code, does anyone have any recommendations
to improve indexing performance? (Also, what can I do to improve search
performance?)

While searching with an IndexSearcher, does Lucene do any caching? Usually
MRU caches are used to accomplish this.

Any ideas, help, or recommendations greatly appreciated.

Best,
-C.B.

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ