Re: lucene as a graph store

Grant Ingersoll Tue, 15 Jan 2008 05:32:47 -0800

I guess the question comes down to what kind of things are you goingto do w/ this graph? How often are you updating links, etc? I can'tsay Lucene was designed for this kind of thing, but I am constantlyamazed at what people use Lucene for, so I won't say it can't bedone. I don't know how efficient it would be for doing things likePageRank or other graph algorithms, but I would be interesting inhearing more about what you have in mind.

Lucene doesn't do much caching at the document level, but that isfairly easy to implement, but it does bring some terms, etc. intomemory, and you may have a look at the FieldCache.


-Grant

On Jan 15, 2008, at 7:17 AM, Cam Bazz wrote:

Hello;

I like to use lucene as a graph store. The graph representation is alist of

edges. Consider the code below:

       final int commitCount = 16 * 1024;
       final int numObj =  1024 * 1024;

       Analyzer analyzer = new KeywordAnalyzer();

FSDirectory directory = FSDirectory.getDirectory("c:\\LuceneAdd");IndexWriter writer = new IndexWriter(directory, analyzer,true);


       Document doc;
       long start = System.currentTimeMillis();

       Random r = new Random(System.currentTimeMillis());

       for(int i=0; i<numObj; i++) {
           doc = new Document();
           doc.add(new Field("srcKey", NumberTools.longToString(i),
Field.Store.YES, Field.Index.UN_TOKENIZED));
           doc.add(new Field("dstKey",
NumberTools.longToString(r.nextInt(numObj)),
Field.Store.YES, Field.Index.UN_TOKENIZED));
           doc.add(new Field("linkKey",
NumberTools.longToString(r.nextInt(16)),
Field.Store.YES, Field.Index.UN_TOKENIZED));
           doc.add(new Field("linkValue", NumberTools.longToString(
r.nextInt(256)), Field.Store.YES, Field.Index.UN_TOKENIZED));

          writer.addDocument(doc);

           if(i%commitCount==0) {
                long now = System.currentTimeMillis();
                System.out.println(i + ":" + (now-start));
                start = now;
           }

       }

       writer.optimize();
       writer.close();
       directory.close();

Basically I am adding a large number of documents from srcKey = i todstKey

= random and two other string fields - linkKey and linkValue.

Compared to a normal database store, or an oodbms such as perst ordb4o -

lucene takes longer to index.
However, it is much faster in searching, finding, retrieving records.

I can make 16384 random lookups over 1Million entries in 0.8seconds. This

is excellent time. (I have been benchmarking for a long time)

Typically, when number of objects in BTree based structure in anoodbms for

example increase, the search and add times also increase.

Will lucene have the same problem and how can I overcome it if itdoes.

Looking at the above code - does anyone has any recomendations
to improve index performance. (also what can I do to improve search
performance)

While searching with an indexsearcher - does lucene do any caching?usually

MRU caches are used to accomplish this.

Any ideas,help,recomendations greatly appreciated.

Best,
-C.B.


--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: lucene as a graph store

Reply via email to