RE: Search returning documents matching a NOT range

Uwe Schindler Sun, 07 Nov 2010 20:33:07 -0800

Does the same happen if you put a MultiReader on top of both indexes and use a
single IndexSearcher on top of that MultiReader?
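A minimal sketch of that check, assuming Lucene 3.0.x on the classpath. The class and method names are made up for illustration; the paths and query are the ones from David's test. It opens one read-only IndexReader per index, wraps both in a MultiReader and runs the query through a single IndexSearcher. That way the query is rewritten once against the combined reader instead of separately against each sub-index, as MultiSearcher does.

```java
import java.io.File;

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.NIOFSDirectory;
import org.apache.lucene.util.Version;

final class MultiReaderCheck {
    // Searches both indexes through a single IndexSearcher on a MultiReader.
    static int countHits(Directory dir1, Directory dir2, String queryText) throws Exception {
        // MultiReader.close() also closes the two read-only sub-readers.
        MultiReader reader = new MultiReader(
                IndexReader.open(dir1, true), IndexReader.open(dir2, true));
        IndexSearcher searcher = new IndexSearcher(reader);
        try {
            Query query = new QueryParser(Version.LUCENE_30, "author",
                    new KeywordAnalyzer()).parse(queryText);
            return searcher.search(query, 10).totalHits;
        } finally {
            searcher.close(); // does not close a reader passed in by the caller
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        int hits = countHits(
                new NIOFSDirectory(new File("/home/dfertig/testIndexes/b1")),
                new NIOFSDirectory(new File("/home/dfertig/testIndexes/m1")),
                "author:bentalcella AND NOT publish_date:[20100601 TO 20100630]");
        System.out.println("Results: " + hits);
    }
}
```

If this prints 0 while the MultiSearcher version prints 1, the difference lies in how the two setups search, not in the index contents.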


P.S.: How about using NumericField?
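A rough sketch of what that could look like, assuming Lucene 3.0.x and that publish_date is re-indexed as an int such as 20100606. The class and method names are hypothetical. The classic QueryParser in 3.0 does not build numeric range queries by itself, so the query is assembled programmatically here.

```java
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

final class NumericDateSketch {
    // Indexes one document whose publish_date (YYYYMMDD) is an int NumericField.
    static Directory buildIndex(String author, int publishDate) throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new KeywordAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        doc.add(new Field("author", author, Field.Store.YES, Field.Index.NOT_ANALYZED));
        // Stored and indexed, using the default precisionStep.
        doc.add(new NumericField("publish_date", Field.Store.YES, true).setIntValue(publishDate));
        writer.addDocument(doc);
        writer.close();
        return dir;
    }

    // Programmatic equivalent of "+author:<author> -publish_date:[from TO to]" (inclusive).
    static BooleanQuery authorNotInRange(String author, int from, int to) {
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("author", author)), BooleanClause.Occur.MUST);
        query.add(NumericRangeQuery.newIntRange("publish_date", from, to, true, true),
                BooleanClause.Occur.MUST_NOT);
        return query;
    }

    static int countHits(Directory dir, BooleanQuery query) throws Exception {
        IndexSearcher searcher = new IndexSearcher(dir, true);
        try {
            return searcher.search(query, 10).totalHits;
        } finally {
            searcher.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Directory dir = buildIndex("bentalcella", 20100606);
        System.out.println("Results: "
                + countHits(dir, authorNotInRange("bentalcella", 20100601, 20100630)));
    }
}
```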

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: David Fertig [mailto:dfer...@cymfony.com]
> Sent: Monday, November 08, 2010 4:21 AM
> To: java-user@lucene.apache.org
> Subject: RE: Search returning documents matching a NOT range
> 
> publish_date is a string formatted as YYYYMMDD, so string sorting should
> work correctly for this field.
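A small stdlib-only check of that claim (the class name is hypothetical). Fixed-width, zero-padded YYYYMMDD strings sort lexicographically in the same order as the dates they encode. String.compareTo is the comparison a TermRangeQuery without a Collator applies to terms. The check also confirms that 20100606, the date printed in the output further down, lies inside the excluded range.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DateTermOrder {
    private static final DateTimeFormatter YYYYMMDD = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Inclusive string comparison, like a term range such as [20100601 TO 20100630].
    static boolean inTermRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // True if, for every pair of consecutive days in [start, end], the
    // YYYYMMDD strings sort in the same order as the dates themselves.
    static boolean stringOrderMatchesDateOrder(LocalDate start, LocalDate end) {
        String previous = null;
        for (LocalDate day = start; !day.isAfter(end); day = day.plusDays(1)) {
            String current = day.format(YYYYMMDD);
            if (previous != null && previous.compareTo(current) >= 0) {
                return false;
            }
            previous = current;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(inTermRange("20100606", "20100601", "20100630")); // true
        System.out.println(inTermRange("20100701", "20100601", "20100630")); // false
        System.out.println(stringOrderMatchesDateOrder(
                LocalDate.of(2000, 1, 1), LocalDate.of(2020, 12, 31)));     // true
    }
}
```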
> 
> The field is indexed as a keyword and the field's value is also stored.
> 
> I previously reviewed the terms and optimized the index with Luke 1.0.1 to
> make sure there was no index corruption. Luke is a very useful tool, but it
> can only open one index at a time, so I can't reproduce the issue with it.
> 
> At your suggestion I added code to enumerate all terms in the indexes and
> there are no inconsistencies.
> 
> The two fields being searched each only have 1 term in the first index (as
> expected) and those terms are not in the second index.
> 
> David
> 
> 
> 
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, November 7, 2010 11:12 AM
> To: java-user@lucene.apache.org
> Subject: Re: Search returning documents matching a NOT range
> 
> What kind of field is publish_date? And how do you store data there?
> Is it possible you're getting some date presentation wonkiness in here?
> One thing that might shed light on your problem is to enumerate the terms
> in that field and print them out, rather than using document.get(). That
> is, be sure you're looking at what's in the index (and thus being searched)
> rather than what's stored in the document.
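A minimal sketch of such a term dump, assuming Lucene 3.0.x and its TermEnum API. The class name is hypothetical. It prints every indexed publish_date term with its document frequency for each index directory given on the command line.

```java
import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.NIOFSDirectory;

final class DumpFieldTerms {
    // Prints the indexed terms of `field` (what is searched, not the stored
    // values) and returns how many distinct terms the field has.
    static int dumpTerms(Directory dir, String field) throws Exception {
        IndexReader reader = IndexReader.open(dir, true);
        // terms(Term) positions the enum at the first term >= the given term.
        TermEnum terms = reader.terms(new Term(field, ""));
        int count = 0;
        try {
            do {
                Term term = terms.term();
                // Stop at the end of the enum or once we move past this field.
                if (term == null || !term.field().equals(field)) {
                    break;
                }
                System.out.println(term.text() + " (docFreq=" + terms.docFreq() + ")");
                count++;
            } while (terms.next());
        } finally {
            terms.close();
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        for (String path : args) {
            System.out.println("== " + path);
            dumpTerms(new NIOFSDirectory(new File(path)), "publish_date");
        }
    }
}
```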
> 
> Luke might get you there faster/easier....
> 
> Best
> Erick
> 
> On Fri, Nov 5, 2010 at 5:18 PM, David Fertig <dfer...@cymfony.com> wrote:
> 
> > Ian,
> > Thank you for getting back to me. No, I do not get a bogus hit from
> > searching the small index alone. Also, I do not get a hit if I delete any
> > more documents from the larger index.
> >
> > I have updated my test to use RAMDirectory and also to print maxDoc() for
> > the searchables and the searcher; all numbers are as expected. I have
> > posted all the code, but did not want to post the indexes due to their
> > size (2.2 MB uncompressed). I will mail them to anyone who can help.
> >
> > Here is the complete latest test code and its output:
> >
> > import java.io.File;
> >
> > import org.apache.lucene.analysis.KeywordAnalyzer;
> > import org.apache.lucene.document.Document;
> > import org.apache.lucene.queryParser.QueryParser;
> > import org.apache.lucene.search.IndexSearcher;
> > import org.apache.lucene.search.MultiSearcher;
> > import org.apache.lucene.search.Query;
> > import org.apache.lucene.search.Searchable;
> > import org.apache.lucene.search.TopDocs;
> > import org.apache.lucene.store.NIOFSDirectory;
> > import org.apache.lucene.store.RAMDirectory;
> > import org.apache.lucene.util.Version;
> >
> > public class LuceneTest {
> >     public static void main(String[] args) {
> >         try {
> >             QueryParser queryParser = new QueryParser(Version.LUCENE_30,
> >                     "author", new KeywordAnalyzer());
> >             Query query = queryParser.parse(
> >                     "author:bentalcella AND NOT publish_date:[20100601 TO 20100630]");
> >
> >             // Copy both on-disk indexes into memory.
> >             RAMDirectory ram1 = new RAMDirectory(new NIOFSDirectory(
> >                     new File("/home/dfertig/testIndexes/b1")));
> >             RAMDirectory ram2 = new RAMDirectory(new NIOFSDirectory(
> >                     new File("/home/dfertig/testIndexes/m1")));
> >             Searchable[] searchables = new Searchable[2];
> >             searchables[0] = new IndexSearcher(ram1, true);
> >             searchables[1] = new IndexSearcher(ram2, true);
> >             MultiSearcher searcher = new MultiSearcher(searchables);
> >
> >             System.out.println("MaxDocs for index 1: " + searchables[0].maxDoc());
> >             System.out.println("MaxDocs for index 2: " + searchables[1].maxDoc());
> >             System.out.println("MaxDocs for MultiSearcher: " + searcher.maxDoc());
> >             System.out.println("Query: " + query.toString());
> >
> >             TopDocs topDocs = searcher.search(query, 10);
> >             System.out.println("Results: " + topDocs.totalHits);
> >             // totalHits can exceed the 10 hits requested, so iterate over scoreDocs.
> >             for (int i = 0; i < topDocs.scoreDocs.length; i++) {
> >                 Document document = searcher.doc(topDocs.scoreDocs[i].doc);
> >                 System.out.println("publish_date: " + document.get("publish_date"));
> >             }
> >
> >             searcher.close(); // also closes both IndexSearchers
> >             ram1.close();
> >             ram2.close();
> >         } catch (Exception e) {
> >             System.out.println(e.getMessage());
> >             e.printStackTrace();
> >         }
> >     }
> > }
> >
> > Output:
> > MaxDocs for index 1: 1
> > MaxDocs for index 2: 1000
> > MaxDocs for MultiSearcher: 1001
> > Query: +author:bentalcella -publish_date:[20100601 TO 20100630]
> > Results: 1
> > publish_date: 20100606
> >
> >
> >
> >
> > -----Original Message-----
> > From: Ian Lea [mailto:ian....@gmail.com]
> > Sent: Friday, November 5, 2010 4:57 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Search returning documents matching a NOT range
> >
> > Do you get the bogus hit on the small index if you search that index
> > alone? Are you positive it only holds the one doc? Loading the one doc
> > into a new RAM-based index in the test would prove it.
> >
> > You are more likely to get help if you post a self-contained example:
> > people can see everything relevant and are more likely to spot a
> > problem.
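A self-contained harness along those lines, assuming Lucene 3.0.x. Both indexes are built in RAMDirectory instances, so no index files need to be mailed around. The 1000 documents in the second index are hypothetical stand-ins (other authors, dates in 2009), not David's data, and it is untested whether they trigger the same result as his real indexes. The harness prints the hit count for the small index searched alone and for a MultiSearcher over both, so the two numbers can be compared.

```java
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

final class SelfContainedRepro {
    static final String QUERY =
            "author:bentalcella AND NOT publish_date:[20100601 TO 20100630]";

    // Adds a document whose author and publish_date are single, untokenized terms.
    static void addDoc(IndexWriter writer, String author, String publishDate) throws Exception {
        Document doc = new Document();
        doc.add(new Field("author", author, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("publish_date", publishDate, Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
    }

    static Directory buildSmallIndex() throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new KeywordAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        addDoc(writer, "bentalcella", "20100606");
        writer.close();
        return dir;
    }

    // Hypothetical stand-in for the 1000-document index: no "bentalcella",
    // and every publish_date falls in 2009 (YYYYMM15).
    static Directory buildLargeIndex() throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new KeywordAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        for (int i = 0; i < 1000; i++) {
            int month = i % 12 + 1;
            addDoc(writer, "author" + i, "2009" + (month < 10 ? "0" : "") + month + "15");
        }
        writer.close();
        return dir;
    }

    // Returns {hits on the small index alone, hits via MultiSearcher over both}.
    static int[] run() throws Exception {
        Query query = new QueryParser(Version.LUCENE_30, "author",
                new KeywordAnalyzer()).parse(QUERY);
        IndexSearcher small = new IndexSearcher(buildSmallIndex(), true);
        IndexSearcher large = new IndexSearcher(buildLargeIndex(), true);
        int smallAlone = small.search(query, 10).totalHits;
        MultiSearcher multi = new MultiSearcher(small, large);
        int combined = multi.search(query, 10).totalHits;
        multi.close(); // closes both sub-searchers as well
        return new int[] { smallAlone, combined };
    }

    public static void main(String[] args) throws Exception {
        int[] hits = run();
        System.out.println("Small index alone: " + hits[0]);
        System.out.println("MultiSearcher over both: " + hits[1]);
    }
}
```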
> >
> >
> > --
> > Ian.
> >
> >
> > On Thu, Nov 4, 2010 at 4:52 PM, David Fertig <dfer...@cymfony.com> wrote:
> > > I have an active Lucene implementation that has been in place for a
> > > couple of years and was recently upgraded to the 3.0.2 branch. We are
> > > now occasionally seeing documents returned from searches that should not
> > > be returned. I have reduced the code and indexes to the smallest set
> > > possible with which I can still reproduce the issue.
> > >
> > > My test case uses two indexes. These indexes have been rebuilt/optimized
> > > using Luke 1.0.1 to make them as small as possible. One index has one
> > > document, which is being returned by the query but should not be. The
> > > other index has 1000 documents, none of which match the search criteria.
> > > The query should bring back 0 results, but brings back 1. I can zip and
> > > mail the indexes if it would help track down this issue.
> > >
> > > import java.io.File;
> > >
> > > import org.apache.lucene.analysis.KeywordAnalyzer;
> > > import org.apache.lucene.document.Document;
> > > import org.apache.lucene.queryParser.QueryParser;
> > > import org.apache.lucene.search.IndexSearcher;
> > > import org.apache.lucene.search.MultiSearcher;
> > > import org.apache.lucene.search.Query;
> > > import org.apache.lucene.search.Searchable;
> > > import org.apache.lucene.search.Searcher;
> > > import org.apache.lucene.search.TopDocs;
> > > import org.apache.lucene.store.NIOFSDirectory;
> > > import org.apache.lucene.util.Version;
> > >
> > > public class LuceneTest {
> > >     public static void main(String[] args) {
> > >         try {
> > >             QueryParser queryParser = new QueryParser(Version.LUCENE_30,
> > >                     "author", new KeywordAnalyzer());
> > >             Query query = queryParser.parse(
> > >                     "author:bentalcella AND NOT publish_date:[20100601 TO 20100630]");
> > >
> > >             Searchable[] searchables = new Searchable[2];
> > >             searchables[0] = new IndexSearcher(new NIOFSDirectory(
> > >                     new File("/home/dfertig/testIndexes/b1")), true);
> > >             searchables[1] = new IndexSearcher(new NIOFSDirectory(
> > >                     new File("/home/dfertig/testIndexes/m1")), true);
> > >             Searcher searcher = new MultiSearcher(searchables);
> > >
> > >             System.out.println("Query: " + query.toString());
> > >             TopDocs topDocs = searcher.search(query, 10);
> > >             System.out.println("Results: " + topDocs.totalHits);
> > >             // totalHits can exceed the 10 hits requested, so iterate over scoreDocs.
> > >             for (int i = 0; i < topDocs.scoreDocs.length; i++) {
> > >                 Document document = searcher.doc(topDocs.scoreDocs[i].doc);
> > >                 System.out.println("publish_date: " + document.get("publish_date"));
> > >             }
> > >
> > >             searcher.close();
> > >         } catch (Exception e) {
> > >             System.out.println(e.getMessage());
> > >             e.printStackTrace();
> > >         }
> > >     }
> > > }
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >