Re: Antw.: Search returning documents matching a NOT range

Ian Lea Mon, 08 Nov 2010 04:23:45 -0800

It occurs in David's index and in my much simplifed test/demo index.
There is nothing special in mine so I'd guess the problem isn't really
index or data related, but certainly can't vouch for that.



--
Ian.


On Mon, Nov 8, 2010 at 12:05 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> That's extremely strange. If this is a bug in Multisearcher, we should fix
> in proposed 3.0.3 release. Does the problem only occur with this special
> index?
>
> ---
> Uwe Schindler
> Generics Policeman
> Bremen, Germany
>
> ----- Reply message -----
> Von: "Ian Lea" <ian....@gmail.com>
> Datum: Mo., Nov. 8, 2010 12:45
> Betreff: Search returning documents matching a NOT range
> An: <java-user@lucene.apache.org>
> Cc: "David Fertig" <dfer...@cymfony.com>
>
>
> This does seem extremely odd.  David sent me a copy of his index and
> I've played around with it and also written a self-contained RAM index
> program, below, that shows the same problem, namely that if the second
> index has 1000+ docs the one and only doc in the first index is
> incorrectly matched if the search is done with a MultiSearcher.  In
> answer to Uwe's question, it works correctly if use a single
> IndexSearcher on top of a MultiReader.
>
> Tests run with lucene-core-3.0.2.jar.
>
> Snippet from program output:
>
> Larger index with 999 docs
> --- multi reader ---
> Query: +author:aaa -pubdate:[aaa TO bbb]
> MaxDocs: 1000
> Hit count: 0
> --- multi searcher ---
> Query: +author:aaa -pubdate:[aaa TO bbb]
> MaxDocs: 1000
> Hit count: 0
>
> Larger index with 1000 docs
> --- multi reader ---
> Query: +author:aaa -pubdate:[aaa TO bbb]
> MaxDocs: 1001
> Hit count: 0
> --- multi searcher ---
> Query: +author:aaa -pubdate:[aaa TO bbb]
> MaxDocs: 1001
> Hit count: 1
> Docno: 0
> author: /aaa/, indexed: true
> pubdate: /abc/, indexed: true
>
> -----------------------------------------------------------------------
> package test;
>
> import org.apache.lucene.analysis.*;
> import org.apache.lucene.analysis.standard.*;
> import org.apache.lucene.document.*;
> import org.apache.lucene.queryParser.QueryParser;
> import org.apache.lucene.index.*;
> import org.apache.lucene.search.*;
> import org.apache.lucene.store.*;
> import org.apache.lucene.util.Version;
>
> public class LuceneTest8 {
>
>    static public void main(String[] args) throws Exception {
> test(999);
> test(1000);
> test(1001);
>    }
>
>
>    static void test(int _max) throws Exception {
> System.out.printf("\n\nLarger index with %s docs\n", _max);
> Analyzer anl = new StandardAnalyzer(Version.LUCENE_30);
> Directory dir1 = loadIndex(anl, 1, "aaa", "abc");
> Directory dir2 = loadIndex(anl, _max, "zzz", "zzz");
> QueryParser qp = new QueryParser(Version.LUCENE_30, "author", anl);
> String qstr = "author:aaa AND NOT pubdate:[aaa TO bbb]";
> Query q = qp.parse(qstr);
> IndexReader ir1 = IndexReader.open(dir1);
> IndexReader ir2 = IndexReader.open(dir2);
> Searcher searcher1 = new IndexSearcher(ir1);
> Searcher searcher2 = new IndexSearcher(ir2);
> MultiReader mr = new MultiReader(ir1, ir2);
> Searcher searcherm1 = new IndexSearcher(mr);
> MultiSearcher searcherm2 = new MultiSearcher(searcher1, searcher2);
> search(q, searcher1, "small index");
> search(q, searcher2, "larger index");
> search(q, searcherm1, "multi reader");
> search(q, searcherm2, "multi searcher");
>    }
>
>
>
>    static Directory loadIndex(Analyzer _anl,
>       int _max,
>       String _author,
>       String _pd) throws Exception {
> RAMDirectory dir = new RAMDirectory();
> IndexWriter iw = new IndexWriter(dir,
> _anl,
> true,
> IndexWriter.MaxFieldLength.UNLIMITED);
> for (int i = 0; i < _max; i++) {
>    Document d = new Document();
>    d.add(new Field("author", _author,
>    Field.Store.YES, Field.Index.ANALYZED));
>    d.add(new Field("pubdate", _pd,
>    Field.Store.YES, Field.Index.ANALYZED));
>    iw.addDocument(d);
> }
> iw.close();
> return dir;
>    }
>
>
>    static void search(Query _q,
>       Searcher _searcher,
>       String _what) throws Exception {
> System.out.printf("--- %s ---\n", _what);
> System.out.printf("Query: %s\n", _q.toString());
> System.out.printf("MaxDocs: %s\n", _searcher.maxDoc());
> TopDocs topDocs = _searcher.search(_q, 10);
> System.out.printf("Hit count: %s\n", topDocs.totalHits);
> for (int in = 0; in < topDocs.totalHits; in++) {
>    int docno = topDocs.scoreDocs[in].doc;
>    Document ldoc = _searcher.doc(docno);
>    System.out.printf("Docno: %s\n", docno);
>    for (Fieldable f : ldoc.getFields()) {
> System.out.printf("%s: /%s/, indexed: %s\n",
>  f.name(), f.stringValue(), f.isIndexed());
>    }
> }
>    }
> }
>
>
> --
> Ian.
>
>
> On Mon, Nov 8, 2010 at 4:32 AM, Uwe Schindler <u...@thetaphi.de> wrote:
>> Does the same happen with a MultiReader on top of both indexes and using a
>> single IndexSearcher on top of this MultiReader?
>>
>> P.S.: How about using NumericField?
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>
>>> -----Original Message-----
>>> From: David Fertig [mailto:dfer...@cymfony.com]
>>> Sent: Monday, November 08, 2010 4:21 AM
>>> To: java-user@lucene.apache.org
>>> Subject: RE: Search returning documents matching a NOT range
>>>
>>> publish_date is a string, formatted as YYYYMMDD, so it string sorting
>> should
>>> work correctly for this field.
>>>
>>> The field is indexed as a keyword and the field's value is also stored.
>>>
>>> I have previously reviewed the terms and optimized the index with luke
>>> 1.0.1 to make sure there was no index corruption. It is a very useful
>> tool,
>>> however it can only open 1 index at a time so I can't reproduce the issue
>> with
>>> it.
>>>
>>> At your suggestion I added code to enumerate all terms in the indexes and
>>> there are no inconsistencies.
>>>
>>> Th
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Antw.: Search returning documents matching a NOT range

Reply via email to