Re: Faceted Search using Lucene

Michael McCandless Sun, 01 Mar 2009 06:24:41 -0800

OK, new version of SearcherManager that fixes maybeReopen() so that it can be called from multiple threads.


NOTE: it's still untested!

Mike

package lia.admin;

import java.io.IOException;
import java.util.HashMap;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;

/** Utility class to get/refresh searchers when you are
 *  using multiple threads. */

public class SearcherManager {

  private IndexSearcher currentSearcher;                         //A
  private Directory dir;

  public SearcherManager(Directory dir) throws IOException {
    this.dir = dir;
    currentSearcher = new IndexSearcher(IndexReader.open(dir));  //B
  }

  public void warm(IndexSearcher searcher) {}                    //C

  private boolean reopening;

  private synchronized void startReopen()                        //D
    throws InterruptedException {
    while (reopening) {
      wait();
    }
    reopening = true;
  }

  private synchronized void doneReopen() {                       //E
    reopening = false;
    notifyAll();
  }

  public void maybeReopen()                                      //F
    throws InterruptedException, IOException {


    startReopen();

    try {
      final IndexSearcher searcher = get();
      try {

        long currentVersion = currentSearcher.getIndexReader().getVersion();  //G
        if (IndexReader.getCurrentVersion(dir) != currentVersion) {           //G
          IndexReader newReader = currentSearcher.getIndexReader().reopen();  //G
          assert newReader != currentSearcher.getIndexReader();               //G
          IndexSearcher newSearcher = new IndexSearcher(newReader);           //G
          warm(newSearcher);                                                  //G
          swapSearcher(newSearcher);                                          //G

        }
      } finally {
        release(searcher);
      }
    } finally {
      doneReopen();
    }
  }

  public synchronized IndexSearcher get() {                      //H
    currentSearcher.getIndexReader().incRef();
    return currentSearcher;
  }

  public synchronized void release(IndexSearcher searcher)       //I
    throws IOException {
    searcher.getIndexReader().decRef();
  }

  private synchronized void swapSearcher(IndexSearcher newSearcher) //J
      throws IOException {
    release(currentSearcher);
    currentSearcher = newSearcher;
  }
}

/*
#A Current IndexSearcher
#B Create initial searcher
#C Implement in subclass to warm new searcher
#D Pauses until no other thread is reopening
#E Finish reopen and notify other threads
#F Reopen searcher if there are changes
#G Check index version and reopen, warm, swap if needed
#H Returns current searcher
#I Release searcher
#J Swaps currentSearcher to new searcher
*/
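
For callers, the intended pattern is to pair every get() with a release() in a finally block, so that a searcher that has been swapped out is only given up once the last in-flight search has finished with it. Below is a minimal, Lucene-free sketch of that pattern. RefCountedResource, ResourceManager and SearchClient are hypothetical stand-ins (not Lucene classes) that only mirror the get()/release()/swapSearcher() shape of the class above:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a ref-counted searcher; not a Lucene class. */
class RefCountedResource {
  // Starts at 1: the manager itself holds the first reference.
  private final AtomicInteger refCount = new AtomicInteger(1);
  final String name;

  RefCountedResource(String name) { this.name = name; }
  void incRef() { refCount.incrementAndGet(); }
  void decRef() { refCount.decrementAndGet(); }
  int getRefCount() { return refCount.get(); }
}

/** Mirrors the get()/release()/swap shape of SearcherManager. */
class ResourceManager {
  private RefCountedResource current;

  ResourceManager(RefCountedResource initial) { current = initial; }

  synchronized RefCountedResource get() {
    current.incRef();
    return current;
  }

  synchronized void release(RefCountedResource r) { r.decRef(); }

  synchronized void swap(RefCountedResource next) {
    release(current);   // drop the manager's own reference to the old one
    current = next;
  }
}

class SearchClient {
  /** Each request borrows the current resource and always gives it back. */
  static String search(ResourceManager manager) {
    RefCountedResource r = manager.get();
    try {
      return "searched with " + r.name;
    } finally {
      manager.release(r);
    }
  }
}
```

A request that called get() before a swap keeps using the old resource until its own release(); new requests see the new one.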

Mike

On Mar 1, 2009, at 8:27 AM, Amin Mohammed-Coleman wrote:

just a quick point:
public void maybeReopen() throws IOException {                 //D
  long currentVersion = currentSearcher.getIndexReader().getVersion();
  if (IndexReader.getCurrentVersion(dir) != currentVersion) {
    IndexReader newReader = currentSearcher.getIndexReader().reopen();
    assert newReader != currentSearcher.getIndexReader();
    IndexSearcher newSearcher = new IndexSearcher(newReader);
    warm(newSearcher);
    swapSearcher(newSearcher);
  }
}

should the above be synchronised?

On Sun, Mar 1, 2009 at 1:25 PM, Amin Mohammed-Coleman <ami...@gmail.com> wrote:

Thanks. I will rewrite it, in between giving my baby her feed and playing with the other child, and my wife who wants me to do several other things!




On Sun, Mar 1, 2009 at 1:20 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:


Amin Mohammed-Coleman wrote:

Hi

Thanks for your input. I would like to have a go at doing this myself
first; Solr may be an option.

* You are creating a new Analyzer & QueryParser every time, also
creating unnecessary garbage; instead, they should be created once
& reused.
-- I can move the code out so that it is only created once and reused.
* You always make a new IndexSearcher and a new MultiSearcher even
when nothing has changed.  This just generates unnecessary garbage
which GC then must sweep up.
-- This was something I thought about. I could move it out so that it's
created once. However, I presume that inside my code I need to check
whether the indexreaders are up to date. This needs to be synchronized
as well, I guess(?)


Yes you should synchronize the check for whether the IndexReader is
current.

* I don't see any synchronization -- it looks like two search
requests are allowed into this method at the same time?  Which is
dangerous... eg both (or, more) will wastefully reopen the
readers.
-- So I need to extract the logic for reopening and provide a
synchronisation mechanism.


Yes.

Ok. So I have some work to do. I'll refactor the code and see if I can
get in line with your recommendations.


On Sun, Mar 1, 2009 at 12:11 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

On a quick look, I think there are a few problems with the code:

* I don't see any synchronization -- it looks like two search
requests are allowed into this method at the same time?  Which is
dangerous... eg both (or, more) will wastefully reopen the
readers.

* You are over-incRef'ing (the reader.incRef inside the loop) -- I
don't see a corresponding decRef.

* You reopen and warm your searchers "live" (vs with BG thread);
meaning the unlucky search request that hits a reopen pays the
cost.  This might be OK if the index is small enough that
reopening & warming takes very little time.  But if index gets
large, making a random search pay that warming cost is not nice to
the end user.  It erodes their trust in you.

* You always make a new IndexSearcher and a new MultiSearcher even
when nothing has changed.  This just generates unnecessary garbage
which GC then must sweep up.

* You are creating a new Analyzer & QueryParser every time, also
creating unnecessary garbage; instead, they should be created once
& reused.
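
To illustrate the third point (reopening with a BG thread rather than "live"), here is a rough sketch that runs a reopen check on a schedule so that no search request pays the reopen and warming cost. It is only an outline under stated assumptions: Reopener and BackgroundReopener are hypothetical names, and the callback is assumed to wrap whatever synchronized reopen/warm/swap step the application already has.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical callback; assumed to wrap the reopen/warm/swap step. */
interface Reopener {
  void maybeReopen() throws Exception;
}

class BackgroundReopener {
  // A single daemon thread, so at most one scheduled reopen runs at a time
  // and the thread never keeps the JVM alive on shutdown.
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "searcher-reopen");
        t.setDaemon(true);
        return t;
      });

  /** Runs reopener.maybeReopen() now, then again periodMillis after each run ends. */
  void start(Reopener reopener, long periodMillis) {
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        reopener.maybeReopen();
      } catch (Exception e) {
        // Caught so that one failure does not cancel later runs;
        // real code would log this properly.
        System.err.println("reopen failed: " + e);
      }
    }, 0, periodMillis, TimeUnit.MILLISECONDS);
  }

  void stop() {
    scheduler.shutdownNow();
  }
}
```

Search threads then only ever fetch the current searcher; the period is a trade-off between index freshness and reopen load.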

You should consider simply using Solr -- it handles all this logic for
you and has been well debugged with time...

Mike

Amin Mohammed-Coleman wrote:

The reason for the indexreader.reopen is that I have a webapp which
enables users to upload files and then search for the documents. If I
don't reopen, I'm concerned that the facet hit counter won't be updated.

On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman <ami...@gmail.com> wrote:

Hi
I have been able to get the code working for my scenario, however I have
a question and I was wondering if I could get some help. I have a list
of IndexSearchers which are used in a MultiSearcher class. I use the
indexsearchers to get each indexreader and put them into a MultiReader.

IndexReader[] readers = new IndexReader[searchables.length];
for (int i = 0; i < searchables.length; i++) {
  IndexSearcher indexSearcher = (IndexSearcher) searchables[i];
  readers[i] = indexSearcher.getIndexReader();
  IndexReader newReader = readers[i].reopen();
  if (newReader != readers[i]) {
    readers[i].close();
  }
  readers[i] = newReader;
}
multiReader = new MultiReader(readers);
OpenBitSetFacetHitCounter facetHitCounter = new OpenBitSetFacetHitCounter();
IndexSearcher indexSearcher = new IndexSearcher(multiReader);

I then use the indexsearcher to do the facet stuff. I end the code by
closing the multireader. This is causing problems in another method
where I do some other search, as the indexreaders are closed. Is it ok
to not close the multireader, or should I do some additional checks in
the other method to see if the indexreader is closed?



Cheers


P.S. Hope that made sense...!


On Mon, Feb 23, 2009 at 7:20 AM, Amin Mohammed-Coleman <ami...@gmail.com> wrote:

Hi
Thanks just what I needed!

Cheers
Amin
On 22 Feb 2009, at 16:11, Marcelo Ochoa <marcelo.oc...@gmail.com>
wrote:

Hi Amin:

Please take a look at this blog post:
http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
Best regards, Marcelo.

On Sun, Feb 22, 2009 at 1:18 PM, Amin Mohammed-Coleman <
ami...@gmail.com>
wrote:

Hi
Sorry to resend this email but I was wondering if I could get some
advice on this.

Cheers

Amin
On 16 Feb 2009, at 20:37, Amin Mohammed-Coleman <ami...@gmail.com>
wrote:

Hi
I am looking at building a faceted search using Lucene. I know that Solr
comes with this built in, however I would like to try this by myself
(something to add to my CV!). I have been looking around and I found
that you can use the IndexReader and use TermVectors. This looks ok but
I'm not sure how to filter the results so that a particular user can
only see a subset of results. The next option I was looking at was
something like

Term term1 = new Term("brand", "ford");
Term term2 = new Term("brand", "vw");
Term[] termsArray = new Term[] { term1, term2 };
int[] docFreqs = indexSearcher.docFreqs(termsArray);

The only problem here is that I have to provide the brand type each
time a new brand is created. Again, I'm not sure how I can filter the
results here. It may be that I'm using the wrong API methods to do this.

I would be grateful if I could get some advice on this.


Cheers
Amin
P.S. I am basically trying to do something that displays the following:

Personal Contact (23) Business Contact (45) and so on..
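
One way to picture counts like these, including the per-user filtering asked about above, is as set intersections: keep one set of document ids per facet value, intersect each with the set of documents the user's (already filtered) query matched, and report the size. The sketch below uses plain java.util.BitSet as a stand-in for whatever bit set the index library provides; FacetCounter and its methods are hypothetical names, and how the per-value sets get built from the index is deliberately left out.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

class FacetCounter {
  /**
   * Counts, for each facet value, how many matching documents carry it.
   * facetDocs maps a facet value to the ids of the documents that have it;
   * matches holds the ids of the documents the query (plus any per-user
   * filter) matched.
   */
  static Map<String, Integer> count(Map<String, BitSet> facetDocs, BitSet matches) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (Map.Entry<String, BitSet> e : facetDocs.entrySet()) {
      BitSet both = (BitSet) e.getValue().clone(); // leave the cached set untouched
      both.and(matches);                           // keep only documents in both sets
      counts.put(e.getKey(), both.cardinality());
    }
    return counts;
  }

  /** Convenience helper to build a BitSet from document ids. */
  static BitSet bits(int... docIds) {
    BitSet b = new BitSet();
    for (int id : docIds) {
      b.set(id);
    }
    return b;
  }
}
```

Because the facet values are the keys of the map, a new value (a new brand, say) only needs a new entry rather than a code change.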
--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
______________
Want to integrate Lucene and Oracle?



http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
Is Oracle 11g REST ready?

http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

