Re: lucene as a graph store

Otis Gospodnetic Tue, 15 Jan 2008 08:22:43 -0800

Hi,
 

----- Original Message ----
From: Cam Bazz <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, January 15, 2008 8:50:07 AM
Subject: Re: lucene as a graph store


Usually for implementing things like page rank, or doing centrality
 metric
calculations or maybe dijkstras shortest term, this kind of (list of
 edges)
graph is not best at performance.
I like to use lucene for simple operations like neighboors of this
 node, or
2 degree neighboors of this node.

OG: Neighbours of a node?  Can you tell us more about what you use and how you 
use Lucene in this context?

is updating an index costly operation in lucene? I dont think there
 will too
much updates, but rather deletes.

when I delete documents from an index what things should I be careful
 about.

OG: Note that deletes are not removed from the index/disk immediately.  Rather, 
their doc IDs are stored and those docs are skipped during searches.  If you 
have a large number of deletes, you may want to optimize the index, so that 
this list of doc IDs to skip doesn't cost you.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



On Jan 15, 2008 3:31 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> I guess the question comes down to what kind of things are you going
> to do w/ this graph?  How often are you updating links, etc?  I can't
> say Lucene was designed for this kind of thing, but I am constantly
> amazed at what people use Lucene for, so I won't say it can't be
> done.  I don't know how efficient it would be for doing things like
> PageRank or other graph algorithms, but I would be interesting in
> hearing more about what you have in mind.
>
> Lucene doesn't do much caching at the document level, but that is
> fairly easy to implement, but it does bring some terms, etc. into
> memory, and you may have a look at the FieldCache.
>
> -Grant
>
> On Jan 15, 2008, at 7:17 AM, Cam Bazz wrote:
>
> > Hello;
> >
> > I like to use lucene as a graph store. The graph representation is
 a
> > list of
> > edges. Consider the code below:
> >
> >        final int commitCount = 16 * 1024;
> >        final int numObj =  1024 * 1024;
> >
> >        Analyzer analyzer = new KeywordAnalyzer();
> >        FSDirectory directory = FSDirectory.getDirectory("c:\
> > \LuceneAdd");
> >        IndexWriter writer = new IndexWriter(directory, analyzer,
> > true);
> >
> >        Document doc;
> >        long start = System.currentTimeMillis();
> >
> >        Random r = new Random(System.currentTimeMillis());
> >
> >        for(int i=0; i<numObj; i++) {
> >            doc = new Document();
> >            doc.add(new Field("srcKey", NumberTools.longToString(i),
> > Field.Store.YES, Field.Index.UN_TOKENIZED));
> >            doc.add(new Field("dstKey",
> > NumberTools.longToString(r.nextInt(numObj)),
> > Field.Store.YES, Field.Index.UN_TOKENIZED));
> >            doc.add(new Field("linkKey",
> > NumberTools.longToString(r.nextInt(16)),
> > Field.Store.YES, Field.Index.UN_TOKENIZED));
> >            doc.add(new Field("linkValue", NumberTools.longToString(
> > r.nextInt(256)), Field.Store.YES, Field.Index.UN_TOKENIZED));
> >
> >           writer.addDocument(doc);
> >
> >            if(i%commitCount==0) {
> >                 long now = System.currentTimeMillis();
> >                 System.out.println(i + ":" + (now-start));
> >                 start = now;
> >            }
> >
> >        }
> >
> >        writer.optimize();
> >        writer.close();
> >        directory.close();
> >
> >
> > Basically I am adding a large number of documents from srcKey = i
 to
> > dstKey
> > = random and two other string fields - linkKey and linkValue.
> >
> > Compared to a normal database store, or an oodbms such as perst or
> > db4o -
> > lucene takes longer to index.
> > However, it is much faster in searching, finding, retrieving
 records.
> >
> > I can make 16384 random lookups over 1Million entries in 0.8
> > seconds. This
> > is excellent time. (I have been benchmarking for a long time)
> >
> > Typically, when number of objects in BTree based structure in an
> > oodbms for
> > example increase, the search and add times also increase.
> >
> > Will lucene have the same problem and how can I overcome it if it
> > does.
> > Looking at the above code - does anyone has any recomendations
> > to improve index performance. (also what can I do to improve search
> > performance)
> >
> > While searching with an indexsearcher - does lucene do any caching?
> > usually
> > MRU caches are used to accomplish this.
> >
> > Any ideas,help,recomendations greatly appreciated.
> >
> > Best,
> > -C.B.
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
> http://www.lucenebootcamp.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: lucene as a graph store

Reply via email to