RE: Confusion with Analyzer.tokenStream() re-use in 4.1

2013-02-27 Thread Uwe Schindler
The problem is how you use the Document/Field/Analyzer in your test code (see my mail that explains it). The second problem is that you use new Field(...,TokenStream), which instantiates the TokenStream at the time of calling, so it is "in use", which violates the general call-order of Analyzers

RE: Confusion with Analyzer.tokenStream() re-use in 4.1

2013-02-27 Thread Konstantyn Smirnov
Thanks for the answer, Uwe! So the behavior has changed since 3.6, hasn't it? Now I need to instantiate the analyzer each time I feed the field with the tokenStream, or it happens behind the scenes if I use new (String name, String value, Field.Store store). Another question then... Now I tr

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

2013-02-27 Thread saisantoshi
Here is how I am using it: public class MyCollector extends PositiveScoresOnlyCollector { private IndexReader indexReader; public MyCollector(IndexReader indexReader, PositiveScoresOnlyCollector topScore) { super(topScore); this.indexReader = indexReader;

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

2013-02-27 Thread Uwe Schindler
You have to implement setNextReader in your collector. In setNextReader(), save the AtomicReader from context.reader() in a field and use it from the collect method.

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

2013-02-27 Thread saisantoshi
Thanks. Is there any issue with the way we are calling indexReader.getDocument(doc)? I'm not sure how to get an AtomicReaderContext in the method below. Any pointers on how to get that instance are appreciated. public void collect(int doc) throws IOException { // ADD YOUR CUSTOM LOGIC

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

2013-02-27 Thread Uwe Schindler
You have to use the IndexReader that you get via Collector.setNextReader(AtomicReaderContext ctx). The context will provide you with the correct atomic reader and the correct document base for collecting documents with collect (all ids are relative to the context).
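
To illustrate the point about ids being relative to the context, here is a minimal Lucene-free sketch in plain Java. Segment and SketchCollector are hypothetical stand-ins invented for this illustration (they are not Lucene classes); they only mimic the call order described above: the reader for the current segment is saved in setNextReader, and collect then resolves its segment-relative id against that reader, adding the document base when an index-wide id is needed.

```java
import java.util.ArrayList;
import java.util.List;

class SegmentRelativeIds {
    // Hypothetical stand-in for one index segment (not a Lucene class):
    // docBase is the index-wide id of the segment's first document.
    static final class Segment {
        final int docBase;
        final String[] docs;

        Segment(int docBase, String[] docs) {
            this.docBase = docBase;
            this.docs = docs;
        }
    }

    // Hypothetical stand-in for a collector (not a Lucene class).
    static final class SketchCollector {
        private Segment current;
        final List<String> collected = new ArrayList<>();

        // Called once per segment, before any collect() call for that segment.
        void setNextReader(Segment segment) {
            this.current = segment;
        }

        // 'doc' is relative to the current segment, so it is looked up in that
        // segment and rebased with docBase to obtain the index-wide id.
        void collect(int doc) {
            String value = current.docs[doc];
            int globalId = current.docBase + doc;
            collected.add(globalId + ":" + value);
        }
    }

    static List<String> run() {
        Segment s0 = new Segment(0, new String[] {"a", "b", "c"});
        Segment s1 = new Segment(3, new String[] {"d", "e"});
        SketchCollector collector = new SketchCollector();
        collector.setNextReader(s0);
        collector.collect(1); // second document of the first segment
        collector.setNextReader(s1);
        collector.collect(0); // first document of the second segment
        collector.collect(1); // second document of the second segment
        return collector.collected;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [1:b, 3:d, 4:e]
    }
}
```

Looking up a segment-relative id in a reader for the whole index, as in the question, would address the wrong document for every segment after the first one.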

Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

2013-02-27 Thread saisantoshi
I want to get the Document in the code below, and that's why I need an indexReader: public void collect(int doc) throws IOException { // ADD YOUR CUSTOM LOGIC HERE *Document doc = indexReader.document(doc)* delegate.collect(doc); } But this seems to be the problem as the in

RE: Confusion with Analyzer.tokenStream() re-use in 4.1

2013-02-27 Thread Uwe Schindler
In addition, in your first field you are using a StringReader to feed in the data, which can only be consumed once. This has nothing to do with TokenStream reuse.
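
The single-use behavior of StringReader can be seen with the JDK alone. The following is a small sketch; the class name ReaderConsumedOnce and the helper drain are made up for this example, and no Lucene code is involved.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReaderConsumedOnce {
    // Reads until end of input and returns everything that was still unread.
    static String drain(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[64];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader reader = new StringReader("aaa bbb ccc");
        String first = drain(reader);  // consumes the whole input
        String second = drain(reader); // the reader is already at end of input
        System.out.println("first=[" + first + "] second=[" + second + "]");
        // prints first=[aaa bbb ccc] second=[]
    }
}
```

Anything that reads the same StringReader a second time therefore sees an empty input, independent of how the analyzer reuses its TokenStream.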

RE: Confusion with Analyzer.tokenStream() re-use in 4.1

2013-02-27 Thread Uwe Schindler
The problem here is that the TokenStream is instantiated in the same thread from two different code paths and consumed later. When you add fields, the indexer fetches a reused TokenStream for each field one after another and consumes each one directly after getting it. It does not interleave them. In your case,

Confusion with Analyzer.tokenStream() re-use in 4.1

2013-02-27 Thread Konstantyn Smirnov
Dear all, I'm using the following test-code: Document doc = new Document() Analyzer a = new SimpleAnalyzer( Version.LUCENE_41 ) TokenStream inputTS = a.tokenStream( 'name1', new StringReader( 'aaa bbb ccc' ) ) Field f = new TextField( 'name1', inputTS ) doc.add f TokenStream ts = doc.getField(

Sorting with FieldCache throughout all segments

2013-02-27 Thread Igor Shalyminov
Hi all! I need to get the top document IDs for a query using a custom sort method (e.g. a "created" field, or some multi-field condition). I use FieldCache for that purpose, as community members suggested to me. With the latest Lucene Atomic* trends, I make per-segment top document lists. It is fine within

RE: Uable to extends TopTermsRewrite in Lucene 4.1

2013-02-27 Thread Uwe Schindler
Hi Paul, QueryParser and MTQ's rewrite method have nothing to do with each other. The rewrite method is (explained as simply as possible) a class that is responsible for "rewriting" a MultiTermQuery to another query type (generally a query that allows adding "Term" instances, e.g. BooleanQuery of

Re: Uable to extends TopTermsRewrite in Lucene 4.1

2013-02-27 Thread Paul Taylor
On 26/02/2013 18:01, Paul Taylor wrote: On 26/02/2013 17:22, Uwe Schindler wrote: Hi, You cannot override rewrite() because you could easily break the logic behind TopTermsRewrite. If you want another behavior, subclass another base class and wrap the TopTermsRewrite instead of subclassing it (