Re: filtering and chaining Collectors

2018-08-16 Thread Adrien Grand
I think one reason that we don't want to encourage filtering at the
collector level is that it is much slower than filtering in the query. The
former needs to check hits one by one while the latter can use leap frog to
skip documents that don't match.
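
To make the difference concrete, here is a minimal plain-Java sketch of leap-frog intersection over two sorted doc ID lists. It is not Lucene code: the class and method names (LeapFrogSketch, advance, intersect) are made up for illustration, and real iterators skip with their own index structures rather than a binary search over an int array.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration only; hypothetical names, not Lucene's API. */
final class LeapFrogSketch {

    /** Returns the index of the first element >= target, searching from 'from'. */
    static int advance(int[] docs, int from, int target) {
        int lo = from, hi = docs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docs[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Each side jumps straight to the other's current doc instead of stepping one by one. */
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> hits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                hits.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = advance(a, i, b[j]); // skips every doc in a that is below b's current doc
            } else {
                j = advance(b, j, a[i]); // and vice versa
            }
        }
        return hits;
    }
}
```

A collector-level filter, by contrast, is handed every hit of the query and has to test each one individually, so none of the skipping above is available to it.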

On Wed, Aug 15, 2018 at 11:27 PM Michael Sokolov wrote:

> Hmm the more I root around, the more crazy it seems to try to thread a
> return value through all the different places collect() gets called from.
> Somehow I thought it would just be one place in IndexSearcher somewhere.
>
> On Wed, Aug 15, 2018 at 5:18 PM Michael Sokolov 
> wrote:
>
> > We have MultiCollector to enable running multiple Collectors on the same
> > hits, in sequence for each hit. I think a nice extension would be to
> enable
> > filtering so that earlier collectors could reject a hit, preventing later
> > collectors from seeing it.  This way you could have a post-filter
> > implemented in one collector, and some other collection, like faceting,
> in
> > the next one, that wants to ignore hits that are filtered in this
> > post-filter.
> >
> > The implementation idea would be to return a "status" value from
> > LeafCollector.collect() indicating how to proceed. This could also
> > naturally be used for early termination (you could have status=TERMINATE
> |
> > SKIP | COLLECT, say).
> >
> > I was trying to understand why this wasn't done before for early
> > termination since it seemed so natural to me, and thought - there must
> be a
> > reason. But I went and read through (skimmed really) the original
> > EarlyTerminatingCollector issue (
> > https://issues.apache.org/jira/browse/LUCENE-4858) and didn't see any
> > discussion of that.
> >
> > Am I missing something here?
> >
> > -Mike
> >
>


Re: Legacy filter strategy in Lucene 6.0

2018-08-16 Thread Adrien Grand
Hi Alex,

IndexOrDocValuesQuery builds on the same blocks but I don't think you need
it here. Uwe's idea is to put both your selective term queries and
unselective doc-value queries in the same BooleanQuery. Lucene will know
that it needs to run the selective clauses first thanks to the cost API.
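
As a rough plain-Java illustration of what "run the selective clause first" means (not Lucene code; CostLedFilterSketch, termMatches and priceByDoc are hypothetical names, with priceByDoc standing in for a doc-values field), the cheap clause produces the candidates and the expensive clause is only asked to confirm them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Plain-Java illustration only; hypothetical names, not Lucene's API. */
final class CostLedFilterSketch {

    /**
     * @param termMatches sorted doc IDs matching the selective term clause (low cost)
     * @param priceByDoc  per-document numeric value, standing in for doc values
     */
    static List<Integer> search(int[] termMatches, long[] priceByDoc, long min, long max) {
        // The range clause may match almost every document, so it never iterates
        // on its own; it only verifies candidates produced by the term clause.
        IntPredicate inRange = doc -> priceByDoc[doc] >= min && priceByDoc[doc] <= max;
        List<Integer> hits = new ArrayList<>();
        for (int doc : termMatches) {
            if (inRange.test(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

With both conditions added as Occur.FILTER clauses of one BooleanQuery, this ordering is chosen for you from the clause costs, so there is nothing to hand-code.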

On Fri, Aug 10, 2018 at 5:13 AM alex stark wrote:

> Thanks Uwe, I think you are recommending
> IndexOrDocValuesQuery/DocValuesRangeQuery, and the article by Adrien:
> https://www.elastic.co/blog/better-query-planning-for-range-queries-in-elasticsearch
> It looks promising for my requirement; I will try that.
>
> On Thu, 09 Aug 2018 16:04:27 +0800 Uwe Schindler wrote:
>
> > Hi,
> >
> > IMHO: I'd split the whole code into a BooleanQuery with two filter
> > clauses. The inverted-index-based condition (term condition, e.g.,
> > TermInSetQuery) gets added as an Occur.FILTER and the DocValues
> > condition is a separate Occur.FILTER. If Lucene executes such a query,
> > it would use the more specific condition (based on cost) to lead the
> > execution, which should be the terms condition. The docvalues condition
> > is then only checked for matches of the first. You can still go and
> > implement the two-phase iterator, but I'd not do that.
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> > > -Original Message-
> > > From: alex stark
> > > Sent: Thursday, August 9, 2018 9:12 AM
> > > To: java-user
> > > Cc: java-user@lucene.apache.org
> > > Subject: Re: Legacy filter strategy in Lucene 6.0
> > >
> > > Thanks Adrien, I want to filter out docs based on conditions that are
> > > stored in doc values (those conditions are unselective ranges, which
> > > are not appropriate to put into the inverted index), so I plan to use
> > > some selective term conditions to do a first-round search and then
> > > filter in a second phase. I see there is a two-phase iterator, but I
> > > did not find how to use it. Is this an appropriate scenario for a
> > > two-phase iterator, or is it better to do it in a collector? Is there
> > > any guide to the two-phase iterator?
> > >
> > > Best Regards
> > >
> > > On Wed, 08 Aug 2018 16:08:39 +0800 Adrien Grand wrote:
> > >
> > > > Hi Alex,
> > > >
> > > > These strategies still exist internally, but BooleanQuery decides
> > > > which one to use automatically based on the cost API (cheaper
> > > > clauses run first) and whether sub clauses produce bitset-based or
> > > > postings-based iterators.
> > > >
> > > > On Wed, Aug 8, 2018 at 09:46, alex stark wrote:
> > > >
> > > > > As FilteredQuery was removed in Lucene 6.0, we should use a
> > > > > boolean query to do the filtering. What about the legacy filter
> > > > > strategies such as LEAP_FROG_FILTER_FIRST_STRATEGY or
> > > > > QUERY_FIRST_FILTER_STRATEGY? What is the current filter strategy?
> > > > >
> > > > > Thanks,
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org


Re: filtering and chaining Collectors

2018-08-16 Thread Michael Sokolov
Right, that makes sense usually. But there are use cases for
post-filtering. A good example is when a collector performs grouping or
windowing and we want to apply filters based on the grouped or windowed
values.
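
For reference, the status idea from my original message (quoted below) could look roughly like this. It is only a sketch in plain Java with hypothetical names (CollectorChainSketch, Status, StatusCollector, Chain); Lucene's actual LeafCollector.collect(int) returns void, so nothing here is an existing API.

```java
import java.util.List;

/** Sketch of the proposal only; every name here is hypothetical, not Lucene's API. */
final class CollectorChainSketch {

    /** What a collector tells its caller to do with the current hit. */
    enum Status { COLLECT, SKIP, TERMINATE }

    /** A collect() that returns a status instead of void. */
    interface StatusCollector {
        Status collect(int doc);
    }

    /** Runs collectors in order for each hit; a SKIP hides the hit from later collectors. */
    static final class Chain implements StatusCollector {
        private final List<StatusCollector> chain;

        Chain(List<StatusCollector> chain) {
            this.chain = chain;
        }

        @Override
        public Status collect(int doc) {
            for (StatusCollector c : chain) {
                Status status = c.collect(doc);
                if (status != Status.COLLECT) {
                    return status; // SKIP or TERMINATE stops the chain for this hit
                }
            }
            return Status.COLLECT;
        }
    }

    /** Feeds hits to a collector until it asks to terminate; returns how many hits were offered. */
    static int run(StatusCollector collector, int[] hits) {
        int offered = 0;
        for (int doc : hits) {
            offered++;
            if (collector.collect(doc) == Status.TERMINATE) {
                break;
            }
        }
        return offered;
    }
}
```

A post-filter would sit first in the chain and return SKIP for rejected hits, and a faceting collector placed after it would then only ever see the hits that survived.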

On Thu, Aug 16, 2018 at 4:22 AM Adrien Grand  wrote:

> I think one reason that we don't want to encourage filtering at the
> collector level is that it is much slower than filtering in the query. The
> former needs to check hits one by one while the latter can use leap frog to
> skip documents that don't match.
>
> On Wed, Aug 15, 2018 at 11:27 PM Michael Sokolov wrote:
>
> > Hmm the more I root around, the more crazy it seems to try to thread a
> > return value through all the different places collect() gets called from.
> > Somehow I thought it would just be one place in IndexSearcher somewhere.
> >
> > On Wed, Aug 15, 2018 at 5:18 PM Michael Sokolov 
> > wrote:
> >
> > > We have MultiCollector to enable running multiple Collectors on the
> same
> > > hits, in sequence for each hit. I think a nice extension would be to
> > enable
> > > filtering so that earlier collectors could reject a hit, preventing
> later
> > > collectors from seeing it.  This way you could have a post-filter
> > > implemented in one collector, and some other collection, like faceting,
> > in
> > > the next one, that wants to ignore hits that are filtered in this
> > > post-filter.
> > >
> > > The implementation idea would be to return a "status" value from
> > > LeafCollector.collect() indicating how to proceed. This could also
> > > naturally be used for early termination (you could have
> status=TERMINATE
> > |
> > > SKIP | COLLECT, say).
> > >
> > > I was trying to understand why this wasn't done before for early
> > > termination since it seemed so natural to me, and thought - there must
> > be a
> > > reason. But I went and read through (skimmed really) the original
> > > EarlyTerminatingCollector issue (
> > > https://issues.apache.org/jira/browse/LUCENE-4858) and didn't see any
> > > discussion of that.
> > >
> > > Am I missing something here?
> > >
> > > -Mike
> > >
> >
>


RE: Improve Search Speed

2018-08-16 Thread thturk
Thank you for your advice. As I researched, many people suggested the same
thing: write better, more complex queries to get more specific results.
But I didn't exactly understand what "more specific queries" means. Does it
mean indexing more fields and combining many different kinds of boolean
queries? I am mostly using fuzzy queries, but they are also not returning
good results: if the keyword is longer or shorter than the indexed term, it
does not match.
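
One likely reason for the length problem: fuzzy matching is based on edit distance, and Lucene's FuzzyQuery allows at most 2 edits, so a keyword that is several characters longer or shorter than the indexed term is outside what it can match. The plain-Java sketch below (the class name EditDistanceSketch is hypothetical, and it counts only insertions, deletions and substitutions) shows how quickly the distance grows with a length difference:

```java
/** Plain-Java Levenshtein distance; illustrates the edit-distance idea, not Lucene's implementation. */
final class EditDistanceSketch {

    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, "lucine" is 1 edit away from "lucene" and would be within reach of a fuzzy query, while "searching" is 3 edits away from "search" and would not be.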



