Hi,
Please tell me how I can implement HitCollector in Lucene 4.7.0 when migrating
from previous versions. I didn't find HitCollector in 4.7.
Thanks & Regards,
Narasimha.
Hi,
That's the way to go! With this method, you don't need to use FilteredQuery; it
is done under the hood automatically.
Please note that Collector, in contrast to HitCollector, works per index
segment (setScorer/setNextReader), so you have to rewrite at least most of your
colle
and a TopDocsCollector instance.
Thanks,
Sai.
--
View this message in context:
http://lucene.472066.n3.nabble.com/IndexSearcher-search-Weight-weight-Filter-filter-HitCollector-results-is-not-there-in-4-0-version-tp4035488p4035490.html
Sent from the Lucene - Java Users mailing list archive at Nabbl
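The per-segment contract described in the reply above can be modelled without Lucene. In Lucene 4.x, setNextReader(AtomicReaderContext) is called once per segment, and collect(int doc) then receives doc ids relative to that segment, so a collector that wants index-wide ids adds context.docBase itself. The score, if needed, comes from the Scorer passed to setScorer rather than from a collect() argument. The sketch below is a plain-Java model under those assumptions: SegmentCollector, AllHitsCollector and simulate() are hypothetical names, not Lucene classes, and simulate() only stands in for what the searcher does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the shape of Lucene's per-segment Collector;
// this is NOT a Lucene interface.
interface SegmentCollector {
    // Models setNextReader(...): called once before each segment's hits.
    void setNextSegment(int docBase);

    // Models collect(int doc): doc is relative to the current segment.
    void collect(int segmentDoc);
}

// Collects every hit (not just the top N) as index-wide doc ids.
class AllHitsCollector implements SegmentCollector {
    private final List<Integer> globalDocs = new ArrayList<>();
    private int docBase;

    @Override
    public void setNextSegment(int docBase) {
        this.docBase = docBase;
    }

    @Override
    public void collect(int segmentDoc) {
        // Re-base the segment-local id so it is unique across the whole index.
        globalDocs.add(docBase + segmentDoc);
    }

    List<Integer> hits() {
        return globalDocs;
    }

    // Rough stand-in for the searcher: walks the segments in order.
    // segmentSizes[i] is segment i's maxDoc; segmentHits[i] its local matches.
    static List<Integer> simulate(int[] segmentSizes, int[][] segmentHits) {
        AllHitsCollector collector = new AllHitsCollector();
        int base = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            collector.setNextSegment(base);
            for (int doc : segmentHits[i]) {
                collector.collect(doc);
            }
            base += segmentSizes[i];
        }
        return collector.hits();
    }
}
```

With two segments of 10 and 5 documents, local hits {1, 3} and {0, 4} come out as global ids 1, 3, 10 and 14.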
Query using FilteredQuery
only passing a Weight to IndexSearcher. But those methods are protected; you
can only use them from subclasses (under the assumption that you know what you
are doing *g*).
Finally, the class HitCollector was deprecated in Lucene 2.9 and removed in
3.0, so you have to us
We are using the following method with Lucene 2.4.0:
public void search(Weight weight,
Filter filter,
HitCollector results)
throws IOException
We are upgrading to the latest version (4.0), and looking at the API, the
above signature has been
way. Lucene has such
query: BooleanQuery, which uses BooleanScorer or BooleanScorer2. BQ performs
better if allowed to pass documents out of order; it uses BooleanScorer for
that.
Hope that helps,
Shai
On Wed, Dec 9, 2009 at 8:58 PM, Max Lynch wrote:
Hi,
I have a HitCollector that processes all hits from a query. I want all
hits, not the top N hits. I am converting my HitCollector to a Collector
for Lucene 3.0.0, and I'm a little confused by the new interface.
I assume that I can implement my new Collector much like the code on the API
Hi Simon,
> that is what my first guess was and I'm pretty sure that the long time
> is taken before the documents get scored. A short prefix can easily
> expand to thousands of terms, do you encounter
> TooManyClausesExceptions and in turn do you set
> BooleanQuery#setMaxClauseCount() to a higher
e user is short
>> because it yields a large number of hits. I was hoping that taking the
>> approach I mentioned, search engine would call the HitCollector
>> incrementally and give me a chance to end the search earlier but it seems
>> like it is not happening. Do you t
ach I mentioned, the search engine would call the HitCollector
incrementally and give me a chance to end the search earlier but it seems
like it is not happening. Do you think the problem is with term expansion?
Regards,
Len
- original message -
From: simon.willnauer [at] googlemail
Re:
Hi Len,
What kind of query do you execute when you collect the hits?
HitCollector should be called for each document by the time it is
scored. Is it possible that you run a query that could be expensive in
terms of term expansion like WildcardQuery?
simon
On Sat, Aug 22, 2009 at 7:09 AM, Len
Len Takeuchi-2 wrote:
> I’m using Lucene 2.4.1 and I’m trying to use a custom
> HitCollector to collect only the first N hits (not the best hits) for
> performance.
You mean that you do not need score calculation, and therefore you do not want
results sorted by relevancy. All you need is a Boolean Retrieval Model
Hello,
I'm using Lucene 2.4.1 and I'm trying to use a custom HitCollector to collect
only the first N hits (not the best hits) for performance. I saw another
e-mail in this group where they mentioned writing a HitCollector which throws
an exception after N hits to do this. So I tried this
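The exception trick described above can be sketched in plain Java. It assumes nothing about Lucene beyond the old collect(int doc, float score) shape: the collector records hits in arrival order (which, as discussed elsewhere in this thread, is not score order) and throws an unchecked exception once it has N of them, which the caller catches. FirstNCollector, StopCollecting and searchFirstN() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the "throw after N hits" trick. FirstNCollector and
// StopCollecting are hypothetical names, not Lucene classes.
class FirstNCollector {
    // Unchecked, so it can escape a collect() that declares no checked exceptions.
    static final class StopCollecting extends RuntimeException {
        StopCollecting() {
            super(null, null, false, false); // no stack trace: used for control flow
        }
    }

    private final int limit;
    private final List<Integer> docs = new ArrayList<>();

    FirstNCollector(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    // Same shape as the old HitCollector.collect(int doc, float score).
    void collect(int doc, float score) {
        docs.add(doc);
        if (docs.size() >= limit) {
            throw new StopCollecting();
        }
    }

    List<Integer> docs() {
        return docs;
    }

    // Stands in for searcher.search(query, collector): matches arrive in the
    // order the scorer finds them, not in relevance order.
    static List<Integer> searchFirstN(int[] matches, int n) {
        FirstNCollector collector = new FirstNCollector(n);
        try {
            for (int doc : matches) {
                collector.collect(doc, 1.0f);
            }
        } catch (StopCollecting expected) {
            // Early exit: the remaining matches are never visited.
        }
        return collector.docs();
    }
}
```

As Simon points out in this thread, this only shortens collection: for a short prefix or wildcard query, the expensive part (expanding it into thousands of terms) happens before any document is scored.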
Mark Miller wrote:
Michael McCandless wrote:
Mark Miller wrote:
So HitCollector#collect(int doc, float score) is not called in a special
(default) order and must order the docs itself by score if one needs the
hits sorted by relevance?
Presumably there is no score ordering to the
> The HitCollector used will determine how things are ordered. In 2.4, the
> TopDocCollector will order by relevancy and the TopFieldDocCollector can
> order by relevancy, index order, or by field. Lucene delivers the hit
> ids to the HitCollector and it can o
Presumably there is no score ordering to the hit IDs Lucene delivers to
a HitCollector? i.e. they are delivered in the order they are found, and the
score is neither ascending nor descending, i.e. the next score could be
higher or lower than the previous one?
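As Mark's answer says, ordering is the collector's job. One common way to do it, and the general idea behind a top-N collector such as TopDocCollector, is a bounded min-heap keyed on score: while fewer than N hits are held, every hit goes in; afterwards a new hit replaces the weakest one only if it scores higher. This is a plain-Java sketch, not Lucene's code; TopNByScore and Hit are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java model of keeping the N best hits when collect() sees them in
// no particular score order. TopNByScore and Hit are hypothetical names.
class TopNByScore {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int n;
    // Min-heap on score: the head is the weakest hit kept so far.
    private final PriorityQueue<Hit> heap;

    TopNByScore(int n) {
        this.n = n;
        this.heap = new PriorityQueue<Hit>(Math.max(1, n),
                (a, b) -> Float.compare(a.score, b.score));
    }

    void collect(int doc, float score) {
        if (heap.size() < n) {
            heap.add(new Hit(doc, score));
        } else if (n > 0 && score > heap.peek().score) {
            heap.poll(); // evict the weakest; a tie keeps the earlier hit
            heap.add(new Hit(doc, score));
        }
    }

    // Doc ids by descending score (ties broken by ascending doc id).
    List<Integer> topDocs() {
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
        });
        List<Integer> docs = new ArrayList<>();
        for (Hit h : hits) {
            docs.add(h.doc);
        }
        return docs;
    }
}
```

Feeding docs 0 to 4 with scores 0.3, 0.9, 0.1, 0.7 and 0.5 into a TopNByScore(3) leaves docs 1, 3 and 4, in that order, regardless of the order in which they were collected.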
-Original Message-
From:
spr...@gmx.eu wrote:
Hi,
in what order does search(Query query, HitCollector results) return the
results? By relevance?
Thank you.
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e
Hi All,
I vaguely remember discussions on the remote-ability of HitCollector-based
search(). As far as I remember, it is not possible if I use
HitCollectors.
In Lucene 3, we are doing away with a lot of search() variants, including
the ones that return Hits.
I would like to know which one o
hossman wrote:
>
> Take a look at TopFieldDocCollector. It's a HitCollector provided out of
> the box that does sorting.
>
will it work against a ParallelMultiSearcher?
: So, how can I get the same results using the HitCollector? Also it would be
: really nice if you could point me to some examples of using it...
Take a look at TopFieldDocCollector. It's a HitCollector provided out of
the box that does sorting.
If you look at the trunk, the (rec
Hi all
Currently I'm using the search method returning the Hits object. According
to http://wiki.apache.org/lucene-java/ImproveSearchingSpeed one should use a
HitCollector-oriented search method instead.
But I need another aspect of the "Hits search(...)" method: it's sort
Hi everybody,
I was searching for information about the HitCollector. I was wondering whether
the values of the fields have to be stored or not. I tested it and it worked
both ways, but I'm still not really sure about it.
My second question: can I work with tokenized fields?
Best regards
Jens
--
Hello,
I recently changed my query logic. Before, I was getting a Hits object, and
now I am using a BitSet with a HitCollector.
The reason for using bitSet is document caching, and being able to count how
many hits belong to which categories.
Although my new logic works, I have noticed that now
Hello,
How can I use a HitCollector and a Sort object in a query? I looked at the API,
and Sort is only usable with Hits. Is it even possible? Since the HitCollector
returns a BitSet, how do we do the ordering?
Best,
-C.B.
The bitset thing is just an example of a trivial operation in a
HitCollector. You'll want to do something like use TermDocs/TermEnum
to see what category your document is in and add it to some counts
you use rather than just add something to a bitset. Or see the idea
at the end of this mail.
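Erick's suggestion, looking up each hit's category and bumping a counter instead of just setting a bit, can be sketched as below. It assumes the category of every document has already been loaded into an array indexed by document id; for a single-valued, untokenized field, Lucene 2.x's FieldCache.DEFAULT.getStrings(reader, "category") gives such an array, as suggested elsewhere in this thread. CategoryCounter is a hypothetical name, and the plain-Java collect() only mirrors the old HitCollector signature.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of counting hits per category inside collect().
// categoryOfDoc stands in for a doc-id-indexed array such as the one
// FieldCache.DEFAULT.getStrings(reader, "category") returns in Lucene 2.x;
// CategoryCounter itself is a hypothetical name.
class CategoryCounter {
    private final String[] categoryOfDoc;
    private final Map<String, Integer> counts = new TreeMap<>();

    CategoryCounter(String[] categoryOfDoc) {
        this.categoryOfDoc = categoryOfDoc;
    }

    // Same shape as the old HitCollector.collect(int doc, float score).
    void collect(int doc, float score) {
        String category = categoryOfDoc[doc];
        if (category != null) { // documents without a category are not counted
            counts.merge(category, 1, Integer::sum);
        }
    }

    Map<String, Integer> counts() {
        return counts;
    }
}
```

The per-hit cost is one array lookup and one map update, so no stored field has to be loaded while collecting.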
Hello,
Could someone show me a concrete example of how to use HitCollector?
I have documents which have a category field. When I run a query, I need to
sort results by category as well as count how many hits are there for a
given category.
I understand:
searcher.search(Query, new HitCollector
Thanks for the "hitcollector" suggestion. It worked very well! We
have such a small index, but the categories are complicated, sequenced,
and related in so many ways that their structure is very
important to the users. So now I can get a list of "used categories"
: just need to access a few fields, and those are all fields that are in fact
: stored (and indexed too). I was thinking of keeping this extra information
: in memory, precisely into an array mapping doc ids to the data structure. I
if the fields you need are indexed and single valued (and untoken
- Original Message
From: Carlos Pita <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, May 24, 2007 12:50:04 PM
Subject: Re: HitCollector or Hits
Hi Erick,
I don't think that FieldSelector would be that valuable in my case because I
just need to acc
o the data structure. I
see that this is done for ScoreDocComparator in a Lucene in Action example.
I'm still not sure how to achieve something similar with a HitCollector. I
mean, I could instantiate a maxDoc() size array and index it by the document
ids that are passed to the collector. But th
mponent that shows only the stores that sell a product among the
> > first 1000 hits. So even if the user sees just the first 20, I would have to
> > inspect the first 1000. I've read that Hits maintains a cache of about
> > 100 or 200 hits. Is this c
s configurable? If I could set this cache to 1000, I would
> then use Hits to browse the search results. Otherwise, I should use a
> HitCollector. What's your advice?
>
> TIA
> Cheers,
> Carlos
>
: Can anyone say why this is useful and what's wrong with raw scores?
In my opinion: it's not useful at all, and there is nothing wrong with
raw scores, but some people like having scores that are bounded in a
finite range, so Hits provides that.
-Hoss
once the scores were all one of 5 discrete values
My HitCollector is a variant of TopDocCollector and I have the max score. I found
where Hits does the normalisation in Hits.getMoreDocs(). It simply multiplies
all scores by (1/maxScore).
I was looking too deep down around the Scorer...
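Following that description, a HitCollector only needs the highest score it has seen (a TopDocCollector variant already tracks it) and can rescale at the end. The sketch below reproduces the multiply-by-1/maxScore step as described in the message, not Hits' exact source; ScoreNormalizer is a hypothetical helper.

```java
// Scales raw scores by 1/maxScore, the normalisation described in the
// message above, so the best hit gets 1.0. ScoreNormalizer is hypothetical.
class ScoreNormalizer {
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        if (max <= 0f) {
            return rawScores.clone(); // no positive score to scale by
        }
        float norm = 1.0f / max;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }
}
```

For raw scores 2, 4 and 1, the factor is 1/4, giving 0.5, 1.0 and 0.25. Only the scale changes; the relative order of hits stays the same.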
Can
one of 5 discrete values
Erick
On 3/29/07, Antony Bowesman <[EMAIL PROTECTED]> wrote:
Hits will normalise scores into the range 0 < score <= 1, but I'm using HitCollector and haven't worked
out how to normalise those scores.
From what I can see, the scores are just multiplied by a factor to bring the
top score down to 1. Is this right, or is there something more to it?
Do I need to
Hello
The collect(int doc, float score) method: in this method, which id does the
argument doc refer to? The original id in the index, or the id of the search
result (the position of the document in the search result)?
I ask this because I implemented a HitCollector that collects the IDs in a
BitSet and it was
On Mar 15, 2007, at 12:27 AM, Antony Bowesman wrote:
Thanks for the detailed response Hoss. That's the sort of in-depth
golden nugget I'd like to see in a copy of LIA 2 when it becomes
available...
NOTED! :)
Erik
: Performance between Filter and HitCollector?
eks dev and others - have you tried using the code from LUCENE-584? Noticed
any performance increase when you disabled scoring? I'd like to look at that
patch soon and commit it if everything is in place and makes sense, so I'm
curious if y
On 15 Mar 2007, at 04:09, Otis Gospodnetic wrote:
eks dev and others - have you tried using the code from
LUCENE-584? Noticed any performance increase when you disabled
scoring? I'd like to look at that patch soon and commit it if
everything is in place and makes sense, so I'm curious if you
ut the
Query is generally more usable.
The Query can also be more efficient in other ways, because the
HitCollector doesn't *have* to build a BitSet; it can deal with the
results in whatever way it wants (whereas a Filter always generates a
BitSet).
Solr goes the HitCollector route for a few
formance between Filter and HitCollector?
Just to complete this fine answer,
there is also the Matcher patch (https://issues.apache.org/jira/browse/LUCENE-584)
that could bring the best of both worlds via e.g. ConstantScoringQuery or
another abstraction that enables disabling scoring (w
: Chris Hostetter <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, 14 March, 2007 7:15:06 PM
Subject: Re: Performance between Filter and HitCollector?
It's kind of an apples/oranges comparison... in the examples you gave
below, one is executing an arbitrary query (whi
y can also be more efficient in other ways, because the
HitCollector doesn't *have* to build a BitSet; it can deal with the
results in whatever way it wants (whereas a Filter always generates a
BitSet).
Solr goes the HitCollector route for a few reasons:
1) allows us to use the DocSet ab
return bits;
and HitCollector.collect(), as suggested in Javadocs
final BitSet bits = new BitSet(indexReader.maxDoc());
searcher.search(query, new HitCollector() {
public void collect(int doc, float score) {
bits.set(doc);
}
});
SOLR seems to use DocSetHitCollector in places whi
it, it is possible
> to explore the fields of remote indexes.
>
> See http://sourceforge.net/projects/lucollector/ for the source code
> (lu-collector-src-sampleop-0.8.zip).
>
> Regards,
> José L. Oramas
>
> On 2/26/07, oramas martín <[EM
However, see Wiki HowToContribute: http://wiki.apache.org/jakarta-lucene/HowToContribute
if you wish to donate your code.
-Grant
On Feb 25, 2007, at 6:56 PM, oramas martín wrote:
Hello,
As you probably know, the HitCollector-based search API is not meant to work
remotely, because it will generate an RPC callback for every non-zero score.
There is another problem with MultiSearcher-HitCollector-based search which
knows nothing about mix HitCollector based searches (not to
> That is, one query on the body and one on the attachment will give you two
> lists that you'll then have to manually reconcile if relevancy matters.
>
> Depending upon how many emails and attachments you get hits for, you could
> do something like
> 1> sear
y matters.
Depending upon how many emails and attachments you get hits for, you could
do something like
1> search for the body elements with the to/from/cc. Use the return (perhaps
with a HitCollector (definitely NOT a Hits object)) to assemble a clause
like ID=52343 or ID=985 or ID=8910 an
documents on return. That is ok.
I was wondering... would HitCollector be something I should use?
Basically, have the searcher check documents to make sure they are OK to
go (i.e. the to, from, etc. are correct)?
Makes sense?
Thanks!
Michael
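Step 1 of Erick's suggestion above, turning the ids gathered by a first search into a clause for a second one, could look like the sketch below. It assumes the ids are values of a stored "ID" field (the field name comes from the ID=52343 example in that message) and emits QueryParser-style syntax; IdClauseBuilder is a hypothetical helper, and the ids are assumed to need no query escaping.

```java
import java.util.List;
import java.util.StringJoiner;

// Builds e.g. "ID:52343 OR ID:985 OR ID:8910" from ids collected in a first
// pass. The "ID" field name follows the example in the message; the ids are
// assumed to be plain alphanumeric values that need no query escaping.
class IdClauseBuilder {
    static String orClause(String field, List<String> ids) {
        StringJoiner joiner = new StringJoiner(" OR ");
        for (String id : ids) {
            joiner.add(field + ":" + id);
        }
        return joiner.toString();
    }
}
```

Each id becomes one BooleanQuery clause once parsed, so a long id list can exceed the default limit of 1024 clauses (the TooManyClauses exception mentioned elsewhere in this thread) unless BooleanQuery.setMaxClauseCount() is raised.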
Hey, Ryan, Thanks for your reply.
The scenario is that I use a custom Filter which gets some information from a
database table which consists of hundreds of thousands of rows. I use
IndexSearcher.search(query, filter, hitcollector). I found it consumed
more time with filter than that without no
Hey Andy,
If you have enough RAM, try using FieldCache:
String[] fieldYouWant = FieldCache.DEFAULT.getStrings
(searcher.getIndexReader(), "fieldYouWant");
searcher.search(query, new HitCollector(){
public void collect(int doc, float score){
doWhatYouWant(fi
) on
> every document number encountered
>
> Because I have to check a field in the document to determine whether I
> should return the document. The total number of documents is about two
> hundred thousand. So I'm afraid the performance
>
> 2006/8/7, Marti
Hi Andy,
> How can I use HitCollector to iterate over every returned document?
You have to override the collect method of the HitCollector class and
then store the retrieved data in an array or map.
Here is just a source-code sketch (is = IndexSearcher):
is.search(qu
How can I use HitCollector to iterate over every returned document?
Thank you in advance.
some stuff in a Map
which I can then retrieve from the HitCollector (much
like the example in the Lucene In Action book). Of
course that's somewhat expensive, so I'd like to do
some statistical sampling based on the result set size
to try and speed things up.
The way I was thinking a
Hey,
Sorry, I will explain a bit more about my collect
method. Currently my collect method is executing
IndexSearcher.doc(id) and storing some stuff in a Map
which I can then retrieve from the HitCollector (much
like the example in the Lucene In Action book). Of
course that's somewhat expe
: I'm using a HitCollector and would like to know the
: total number of results that matched a given query.
: Based on the JavaDoc, I think this will do the trick:
you don't need a BitSet in that case, you could find that out just using
an int...
public class CountingCollector extends Hi
Hey Everyone,
I'm using a HitCollector and would like to know the
total number of results that matched a given query.
Based on the JavaDoc, I think this will do the trick:
Searcher searcher = new IndexSearcher(indexReader);
final BitSet bits = new BitSet(indexReader.maxDoc());
searcher.s
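Hoss's point, that a total hit count needs only an int rather than a BitSet sized to maxDoc(), can be modelled in plain Java with the old collect(int doc, float score) shape. CountingCollector echoes the name in his reply but is written here from scratch, not copied from it; later Lucene versions ship TotalHitCountCollector for the same purpose.

```java
// Counts matches with a single int: no BitSet(maxDoc) allocation needed.
// Models the old HitCollector.collect(int doc, float score) shape; this is
// not a Lucene class.
class CountingCollector {
    private int count;

    void collect(int doc, float score) {
        count++; // doc and score are irrelevant when only the total matters
    }

    int getCount() {
        return count;
    }
}
```

Memory use stays constant no matter how large the index is, whereas the BitSet version allocates one bit per document in the index.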
On Thu, Jun 29, 2006, James Pine wrote about "HitCollector and Sort Objects":
> I have one type of search where I pass in a Query and
> a Sort (built with a SortField and Decompresses) and
> deal with the Hits object, and another which takes a
> Query and a HitCollector, w
Yeah, there are no methods like the ones you mentioned. But there are some ways to
do this:
TopFieldDocs search(Query query, Filter filter, int n, Sort sort)
If the above method does not serve your purpose, then
my suggestion is to use the method search(Query query, Filter filter,
HitCollector results) and
Hey,
I've looked at the documentation for:
org.apache.lucene.search.Searchable
org.apache.lucene.search.Searcher
org.apache.lucene.search.IndexSearcher
and it struck me that there are no search methods with
these signatures:
void search(Query query, Filter filter, HitCollector results,
>
> On 4/26/06, jm <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I have encountered an issue with lucene1.9.1. It involves
> > MatchAllDocsQuery, MultiSearcher and a custom HitCollector. The
> > following code throws java.lang.UnsupportedOperationException.
rcher and a custom HitCollector. The
> following code throws java.lang.UnsupportedOperationException.
>
> If I remove the MatchAllDocsQuery condition (comment whole //1
> block), or if I don't use the custom hitcollector (ms.search(mbq);
> instead of ms.search(mbq, allcoll);) the e
Hello Erik,
Thanks for your info.
It passed!
Thanks again,
Youngho
- Original Message -
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To:
Sent: Thursday, March 09, 2006 5:12 PM
Subject: Re: RangeQuery, FilterdQuery and HitCollector
> Youngho,
>
> Try the
FilteredQuery has the side effect of passing zero-scoring docs to the
HitCollector.
This breaks the contract of the HitCollector.collect method, because the
Javadocs state:
"Called once for every non-zero scoring document, with the document
number and its score."
The quick fix is to si
potential
TooManyClauses exception)
and found
http://wiki.apache.org/jakarta-lucene/FilteringOptions
The wiki said that FilteredQuery is the best one.
But interestingly,
when I used that option with a HitCollector, the FilteredQuery test failed.
Am I missing something, or FilteredQuery with HitCollector
Hello,
I would like to use a Filter for a RangeQuery (to avoid a potential TooManyClauses
exception)
Hello,
Can I use HitCollector with RemoteSearchable?
I am trying to use it, but I got the following error:
java.rmi.MarshalException: error marshalling arguments; nested exception is:
java.io.NotSerializableException:
org.apache.lucene.search.MultiSearcher$1
at
Hi,
I did a quick google search and couldn't find any info on this...
I seem to be having a problem when I try to execute a search using a
HitCollector while the index is being indexed. Does it make sense that I
could be getting this error because the index is being merged while the
HitColl