Re: lucene as a graph store

Cam Bazz Tue, 15 Jan 2008 08:35:11 -0800

Well, let's say I have an edge-list representation of a graph like

src:1 dst:2
src:2 dst:3
src:1 dst:3


outgoingEdgesOf(1) returns 2 and 3.
incomingEdgesOf(3) returns 1 and 2.

In a Lucene index this works out nicely with term queries. I can search for
incoming edges, outgoing edges, or edge existence with a boolean query over
term queries.
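
For illustration only, here is a minimal sketch of that lookup pattern in plain
Java. It is not Lucene code: the EdgeIndex class, its postings map, and the
termQuery helper are hypothetical names standing in for an index where each
edge is one document with src and dst fields. It assumes a term lookup returns
the ids of the matching documents and that the boolean query simply intersects
two such lookups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for an index of edge documents (not Lucene).
class EdgeIndex {
    // One "document" per edge; the position in these lists is the doc id.
    private final List<Long> src = new ArrayList<>();
    private final List<Long> dst = new ArrayList<>();
    // Postings: term ("field:value") -> ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    public void addEdge(long s, long d) {
        int docId = src.size();
        src.add(s);
        dst.add(d);
        postings.computeIfAbsent("src:" + s, k -> new TreeSet<>()).add(docId);
        postings.computeIfAbsent("dst:" + d, k -> new TreeSet<>()).add(docId);
    }

    // Assumed analogue of a term query: exact match on one field value.
    private Set<Integer> termQuery(String field, long value) {
        return postings.getOrDefault(field + ":" + value, new TreeSet<>());
    }

    // Destinations of all edges whose src field equals the node.
    public Set<Long> outgoingEdgesOf(long node) {
        Set<Long> result = new TreeSet<>();
        for (int docId : termQuery("src", node)) {
            result.add(dst.get(docId));
        }
        return result;
    }

    // Sources of all edges whose dst field equals the node.
    public Set<Long> incomingEdgesOf(long node) {
        Set<Long> result = new TreeSet<>();
        for (int docId : termQuery("dst", node)) {
            result.add(src.get(docId));
        }
        return result;
    }

    // Assumed analogue of a boolean AND of two term queries.
    public boolean edgeExists(long s, long d) {
        Set<Integer> hits = new TreeSet<>(termQuery("src", s));
        hits.retainAll(termQuery("dst", d));
        return !hits.isEmpty();
    }
}
```

With the three edges above loaded through addEdge, outgoingEdgesOf(1) gives 2
and 3, incomingEdgesOf(3) gives 1 and 2, and edgeExists(2, 1) is false.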

Best.


On Jan 15, 2008 6:22 PM, Otis Gospodnetic <[EMAIL PROTECTED]>
wrote:

> Hi,
>
>
> ----- Original Message ----
> From: Cam Bazz <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Tuesday, January 15, 2008 8:50:07 AM
> Subject: Re: lucene as a graph store
>
> Usually for implementing things like PageRank, or doing centrality metric
> calculations, or maybe Dijkstra's shortest path, this kind of (list of edges)
> graph is not the best for performance.
> I like to use Lucene for simple operations like neighbors of this node, or
> 2-degree neighbors of this node.
>
> OG: Neighbours of a node?  Can you tell us more about what you use and how
> you use Lucene in this context?
>
> Is updating an index a costly operation in Lucene? I don't think there will
> be many updates, but rather deletes.
>
> When I delete documents from an index, what things should I be careful about?
>
> OG: Note that deletes are not removed from the index/disk immediately.
>  Rather, their doc IDs are stored and those docs are skipped during
> searches.  If you have a large number of deletes, you may want to optimize
> the index, so that this list of doc IDs to skip doesn't cost you.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> On Jan 15, 2008 3:31 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> > I guess the question comes down to what kind of things are you going
> > to do w/ this graph?  How often are you updating links, etc?  I can't
> > say Lucene was designed for this kind of thing, but I am constantly
> > amazed at what people use Lucene for, so I won't say it can't be
> > done.  I don't know how efficient it would be for doing things like
> > PageRank or other graph algorithms, but I would be interested in
> > hearing more about what you have in mind.
> >
> > Lucene doesn't do much caching at the document level, but that is
> > fairly easy to implement, but it does bring some terms, etc. into
> > memory, and you may have a look at the FieldCache.
> >
> > -Grant
> >
> > On Jan 15, 2008, at 7:17 AM, Cam Bazz wrote:
> >
> > > Hello;
> > >
> > > I like to use lucene as a graph store. The graph representation is a
> > > list of edges. Consider the code below:
> > >
> > >        final int commitCount = 16 * 1024;
> > >        final int numObj =  1024 * 1024;
> > >
> > >        Analyzer analyzer = new KeywordAnalyzer();
> > >        FSDirectory directory = FSDirectory.getDirectory("c:\\LuceneAdd");
> > >        IndexWriter writer = new IndexWriter(directory, analyzer, true);
> > >
> > >        Document doc;
> > >        long start = System.currentTimeMillis();
> > >
> > >        Random r = new Random(System.currentTimeMillis());
> > >
> > >        for(int i=0; i<numObj; i++) {
> > >            doc = new Document();
> > >            doc.add(new Field("srcKey", NumberTools.longToString(i),
> > >                    Field.Store.YES, Field.Index.UN_TOKENIZED));
> > >            doc.add(new Field("dstKey",
> > >                    NumberTools.longToString(r.nextInt(numObj)),
> > >                    Field.Store.YES, Field.Index.UN_TOKENIZED));
> > >            doc.add(new Field("linkKey",
> > >                    NumberTools.longToString(r.nextInt(16)),
> > >                    Field.Store.YES, Field.Index.UN_TOKENIZED));
> > >            doc.add(new Field("linkValue",
> > >                    NumberTools.longToString(r.nextInt(256)),
> > >                    Field.Store.YES, Field.Index.UN_TOKENIZED));
> > >
> > >            writer.addDocument(doc);
> > >
> > >            if(i%commitCount==0) {
> > >                 long now = System.currentTimeMillis();
> > >                 System.out.println(i + ":" + (now-start));
> > >                 start = now;
> > >            }
> > >
> > >        }
> > >
> > >        writer.optimize();
> > >        writer.close();
> > >        directory.close();
> > >
> > >
> > > Basically I am adding a large number of documents from srcKey = i to
> > > dstKey = random, and two other string fields - linkKey and linkValue.
> > >
> > > Compared to a normal database store, or an OODBMS such as Perst or
> > > db4o, Lucene takes longer to index.
> > > However, it is much faster in searching, finding, retrieving records.
> > >
> > > I can make 16384 random lookups over 1 million entries in 0.8
> > > seconds. This is an excellent time. (I have been benchmarking for a
> > > long time.)
> > >
> > > Typically, when the number of objects in a BTree-based structure in an
> > > OODBMS, for example, increases, the search and add times also increase.
> > >
> > > Will Lucene have the same problem, and how can I overcome it if it
> > > does?
> > > Looking at the above code, does anyone have any recommendations
> > > to improve indexing performance? (Also, what can I do to improve search
> > > performance?)
> > >
> > > While searching with an IndexSearcher, does Lucene do any caching?
> > > Usually MRU caches are used to accomplish this.
> > >
> > > Any ideas, help, or recommendations greatly appreciated.
> > >
> > > Best,
> > > -C.B.
> >
> > --------------------------
> > Grant Ingersoll
> > http://lucene.grantingersoll.com
> > http://www.lucenebootcamp.com
> >
> > Lucene Helpful Hints:
> > http://wiki.apache.org/lucene-java/BasicsOfPerformance
> > http://wiki.apache.org/lucene-java/LuceneFAQ
> >
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>
>
