Re: Problem with lucene search starting to return 0 hits when a few seconds earlier it was returning hundreds

Erick Erickson Fri, 05 Sep 2008 06:24:02 -0700

I've been tracking this list for a year or more, and this is the
first I've ever heard of such a thing. Which leads me to wonder
what *else* changed besides your index size. Classpath?
jar files? Some sysadmin modified your search box? Is the
program throwing an exception that you're masking somewhere
in the code? Is it possible that you're getting an
OutOfMemoryError?


Folks have routinely used MUCH larger indexes than 3G without
anything like this happening.

So, here's what I'd do.
1> verify your index. Look at it in, say, Luke.
2> Log your queries. It's possible you're not
     querying what you think you are.
3> You should very easily be able to create a small,
     stupid program on your personal machine that will
     open this index and fire off the queries in question
     and see if your problem is environmental or programmatic.
     If you run it in an IDE, you should be able to break
     on exceptions if there are any.
4> Assuming that <3> exhibits the problem, start paring
     back your code. Take out one thing at a time until
     you don't see the problem.
5> really take a look at any code changes that are
     coincident with this anomaly. Are you totally sure
     that the only thing that's changed is the index?
6> why are you bothering to make everything final? Are
    your code snippets part of a class that's instantiated
    for each query? Note that this is more curiosity than
    thinking that it's the source of your problem <G>.
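
For <2>, and for the masked-exception question above, a minimal sketch of
a logging wrapper may help. It is plain Java with no Lucene dependency;
LoggedSearch and HitCounter are hypothetical names, and HitCounter stands
in for whatever method of yours runs the search and returns hits.length().
The only point is that every query and its outcome gets recorded, and that
a failure (including an OutOfMemoryError) is rethrown rather than quietly
turning into 0 hits.

```java
import java.util.ArrayList;
import java.util.List;

final class LoggedSearch {

    /** Hypothetical stand-in for the method that actually runs the search. */
    public interface HitCounter {
        int countHits(String queryText) throws Exception;
    }

    private final HitCounter delegate;
    private final List<String> log = new ArrayList<String>();

    LoggedSearch(HitCounter delegate) {
        this.delegate = delegate;
    }

    /**
     * Runs the query and records the query text together with the outcome.
     * Exceptions and Errors are logged and rethrown, never converted to 0.
     */
    public int countHits(String queryText) throws Exception {
        try {
            int hits = delegate.countHits(queryText);
            record("query=[" + queryText + "] hits=" + hits);
            return hits;
        } catch (Exception | Error e) {
            record("query=[" + queryText + "] FAILED: " + e);
            throw e;
        }
    }

    /** Returns the recorded lines, oldest first. */
    public List<String> getLog() {
        return log;
    }

    private void record(String line) {
        log.add(line);
        System.err.println(line);
    }

    public static void main(String[] args) throws Exception {
        // Fake search for demonstration only; replace with the real call.
        LoggedSearch search = new LoggedSearch(queryText -> queryText.length());
        search.countHits("text:lucene");
        System.out.println(search.getLog());
    }
}
```

If the log shows the same query text going from hundreds of hits to 0 with
no FAILED line in between, the problem is probably not a swallowed
exception and <3> is the next thing to try.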


Best
Erick



On Thu, Sep 4, 2008 at 5:46 PM, Justin Grunau <[EMAIL PROTECTED]> wrote:

> Sorry, I forgot to include the visibility filters:
>
>                final BooleanQuery visibilityFilter = new BooleanQuery();
>                visibilityFilter.add(new TermQuery(new Term("isPublic", "true")),
>                        Occur.SHOULD);
>                visibilityFilter.add(new TermQuery(new Term("reader", user.getId())),
>                        Occur.SHOULD);
>
>
> These visibility filters ensure that a user only sees files which he or she
> has access to see.
>
> I am pretty certain nobody else has modified the index in the meantime, but
> why is that important?  We have several other servers -- whose only
> difference is a smaller data set -- with dozens of concurrent users, and the
> index on those servers gets modified and read concurrently all the time, but
> none of these other servers have ever exhibited this bug.
>
>
>
> ----- Original Message ----
> From: Leonid M. <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Thursday, September 4, 2008 5:35:47 PM
> Subject: Re: Problem with lucene search starting to return 0 hits when a few seconds earlier it was returning hundreds
>
> * And what about the visibility filter?
> * Are you sure no one else accesses the IndexReader and modifies the index?
> Check reader.maxDoc() to be sure.
>
> On Fri, Sep 5, 2008 at 12:19 AM, Justin Grunau <[EMAIL PROTECTED]> wrote:
>
> > We have some code that uses lucene which has been working perfectly well
> > for several months.
> >
> > Recently, a QA team in our organization has set up a server with a much
> > larger data set than we have ever tested with in the past:  the resulting
> > lucene index is about 3G in size.
> >
> > On this particular server, the same lucene code which has been reliable in
> > the past is now exhibiting erratic behavior.  The first time you do a
> > search, it returns the correct number of hits.  The second time you do a
> > search, it may or may not return the correct set.  By the third time you do
> > a search, it will return 0 hits even for a search that was returning
> > hundreds of hits only a few seconds earlier.  All subsequent searches will
> > return 0 hits until you stop and restart the java process.
> >
> > A snippet of the relevant code follows:
> >
> >                // getReader() returns the singleton IndexReader object
> >                final IndexReader reader = getReader();
> >
> >                // ANALYZER is another singleton
> >                final QueryParser queryParser = new QueryParser("text", ANALYZER);
> >                queryParser.setDefaultOperator(spec.getDefaultOp());
> >                final Query query =
> >                        queryParser.parse(spec.getSearchText()).rewrite(reader);
> >                final IndexSearcher searcher = new IndexSearcher(reader);
> >
> >                final Hits hits = searcher.search(query,
> >                        new CachingWrapperFilter(new QueryWrapperFilter(visibilityFilter)));
> >                total = hits.length();
> >
> >
> >
> > I understand that Lucene should be able to handle very large datasets, so
> > I'd be surprised if this were an actual Lucene bug.  I'm hoping it's just
> > that I'm doing something "wrong" which has gone unnoticed so far for several
> > months because we've never had an index this large.
> >
> > We're using lucene version 2.2.0.
> >
> > Thanks!
> >
> > Justin Grunau
> >
> >
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>
>
> --
> Best regards,
> Leonid Maslov!
> Personal blog: http://leonardinius.blogspot.com/
>
> Random thought:
> Princess Margaret  - "I have as much privacy as a goldfish in a bowl."
