On 3/29/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Hm, removing duplicates (as determined by a value of a specified document
field) from the results would be nice.
How would your addition affect performance, considering it has to check
the PQ for a previous value for every candidate hit?
Peter Keegan wrote:
I implemented 'first wins' because the score is less important than other
fields (distance, in our case), but you make a good point since score may be
more important. How did you implement remove()?
I've got my own PriorityQueue
public boolean remove(E o)
{
Peter, how did you achieve 'last wins' as you must presumably remove first
from the PQ?
I implemented 'first wins' because the score is less important than other
fields (distance, in our case), but you make a good point since score may be
more important. How did you implement remove()?
Peter
is_maximum wrote:
Hi
Suppose we have two Hits; now we need the documents which exist in both of
them and ignore the others.
is there any workaround?
Can you use a BooleanQuery instead?
Daniel
--
Daniel Noll
Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280
Melanie Langlois wrote:
Hi,
I'm wondering if lucene would understand such a query:
content*:mysearch
It's just because I index several translations of my document
contents in addition to common fields, and this separation is
really useful when a user specifies the language in which he wants
: Can anyone say why this is useful and what's wrong about raw scores?
in my opinion: it's not useful at all, and there is nothing wrong with
raw scores, but some people like having scores that are bounded in a
finite range, so Hits provides that.
-Hoss
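
A minimal sketch of the bounded-score behaviour discussed in this thread, assuming the rule stated elsewhere in these messages: scores are multiplied by a single factor so the top score becomes 1, and only when the top score is greater than 1. `ScoreNormalizer` and `normalize` are hypothetical names for illustration, not Lucene classes; the input is a plain array of raw scores such as a HitCollector might gather.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: rescales raw scores the way the
// thread describes Hits doing it (only when the top score exceeds 1).
final class ScoreNormalizer {

    static float[] normalize(float[] rawScores) {
        // Find the highest raw score.
        float max = 0.0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }

        // Copy so the caller's array is left untouched.
        float[] out = Arrays.copyOf(rawScores, rawScores.length);

        // Scale only when the top score is greater than 1;
        // otherwise the raw scores are returned unchanged.
        if (max > 1.0f) {
            float norm = 1.0f / max;
            for (int i = 0; i < out.length; i++) {
                out[i] = rawScores[i] * norm;
            }
        }
        return out;
    }
}
```

Because the factor depends on the top score of each result set, normalized scores from different queries are not comparable with each other; raw scores carry the same ranking information.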
-
: For a given query (for a single input document), the highest score is
: *not* always 1 (which is just how
: I want it). Is this because I am using a Boolean query? Here is my code
: snippet.
the Hits class only normalizes scores if the highest score is greater than
one, if it's less than 1 no no
I've got a similar duplicate case, but my duplicates are based on an external ID
rather than Doc id so occurs for a single Query. It's using a custom
HitCollector but score based, not field sorted.
If my duplicate contains a higher score than one on the PQ I need to update the
stored score wi
Erick Erickson wrote:
I wound up using a TopDocs instead, which has a getMaxScore that
I was able to use to normalize scores to between 0 and 1. In my case
I was collapsing the results into quintiles, so I threw them all
back into a FieldSortedHitQueue to get them sorted by secondary
criteria onc
After much head-scratching and re-reading of posts around this issue, I
found a solution by writing my own QueryTermExtractor. Thanks for the help
everyone.
To recap, I wanted to get more "precise" highlighting by having just the
"right" fields highlighted. My original example was a field named "
Thanks, sounds great.
Sponsored search results are such an important complement
to Lucene, Solr and Nutch it's surprising there is nothing in
the sandbox and only a few archive mentions.
I really liked the idea of setting a boost to certain sponsor
docs based on 'importance' and some known profi
Yes, my custom query processor can sometimes make 2 Lucene search calls
which may result in duplicate docs being inserted on the same PQ. The
simplest solution is to make lessThan public. I'm curious to know if anyone
else is performing multiple searches under the covers.
Peter
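
As a rough illustration of the 'first wins' duplicate check discussed in this thread, here is a sketch built on `java.util.PriorityQueue` and a `TreeSet` of doc IDs. It is not Lucene's `PriorityQueue` or `FieldSortedHitQueue`; the class `DedupTopN`, its `insert` method, and the score-only ordering are assumptions made for the example. The duplicate lookup happens only after the candidate has passed the cheap "does it beat the least element" test.

```java
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical bounded top-N queue that refuses duplicate doc IDs ('first wins').
final class DedupTopN {

    static final class ScoredDoc {
        final int docId;
        final float score;

        ScoredDoc(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    private final int maxSize;
    // Min-heap on score: the least competitive entry sits at the head.
    private final PriorityQueue<ScoredDoc> queue =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    // Doc IDs currently held in the queue.
    private final Set<Integer> seen = new TreeSet<>();

    DedupTopN(int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
    }

    /** Returns true if the candidate was stored, false if it was dropped. */
    boolean insert(int docId, float score) {
        // Cheap rejection first: a full queue drops anything that does not
        // beat its current least element, with no duplicate lookup at all.
        if (queue.size() >= maxSize && score <= queue.peek().score) {
            return false;
        }
        // Only candidates that would really be inserted pay for the check.
        if (seen.contains(docId)) {
            return false; // 'first wins': keep the entry already stored
        }
        if (queue.size() >= maxSize) {
            ScoredDoc evicted = queue.poll();
            seen.remove(evicted.docId);
        }
        queue.add(new ScoredDoc(docId, score));
        seen.add(docId);
        return true;
    }

    int size() {
        return queue.size();
    }
}
```

A 'last wins' variant would need to remove the stored entry before adding the new one, which is where a `remove()` on the queue comes in. Note that in this sketch an evicted doc ID is forgotten, so the same document can re-enter later with a higher score.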
On 3/29/07, Yonik
On 3/29/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Ah, I see. This is less attractive to me personally, but maybe it helps
others. One thing I don't understand is why/how you'd get duplicate documents
with the same doc ID in there. Isn't insert(FieldDoc fdoc) called only once
for each doc?
Otis
. . . . . . . . . . . . . . . . . . . . . . .
Recent questions about whether/how scores are normalized got me wondering how
my application (happily) seems to be doing what I want. I have two indexes, one
which contains text fields which I want to use as queries into text fields
in a second index.
I create a Boolean query based on all the t
The duplicate check would just be on the doc ID. I'm using TreeSet to detect
duplicates with no noticeable effect on performance. The PQ only has to be
checked for a previous value IFF the element about to be inserted is
actually inserted and not dropped because it's less than the least value
alre
Hm, removing duplicates (as determined by a value of a specified document
field) from the results would be nice.
How would your addition affect performance, considering it has to check the PQ
for a previous value for every candidate hit?
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . .
I wound up using a TopDocs instead, which has a getMaxScore that
I was able to use to normalize scores to between 0 and 1. In my case
I was collapsing the results into quintiles, so I threw them all
back into a FieldSortedHitQueue to get them sorted by secondary
criteria once the scores were all o
This is a request for an enhancement to FieldSortedHitQueue/PriorityQueue that
would prevent duplicate documents from being inserted, or alternatively,
allow the application to prevent this (reason explained below). I can do
this today by making the 'lessThan' method public and checking the queue
be
Hi
I want to integrate Lucene into my web application.
My web application consists of several pages integrated into OpenCms.
The problem with Lucene is that it returns the result like this:
java org.apache.lucene.demo.SearchFiles
Query: connect
Searching for: connect
0 total matching documents
Query
is_maximum wrote:
>
> I registered in Nabble, but to post message you should subscribe to lucene
> mailing list and if you subscribe to mailing list your inbox will become
> full of messages. this is very bad!!!
>
Let me clear this misunderstanding, the advantage of using Nabble is that
you don
Hits will normalise scores >0<=1, but I'm using HitCollector and haven't worked
out how to normalise those scores.
From what I can see, the scores are just multiplied by a factor to bring the
top score down to 1. Is this right or is there something more to it.
Do I need to normalise scores a
Mohammad Norouzi wrote:
I registered in Nabble, but to post message you should subscribe to lucene
mailing list and if you subscribe to mailing list your inbox will become
full of messages. this is very bad!!!
You're using gmail aren't you? Why don't you set up a filter to handle
mail from th
I just want to say, the common documents are based on a field and not the document id
On 3/29/07, is_maximum <[EMAIL PROTECTED]> wrote:
Hi
Suppose we have two Hits; now we need the documents which exist in both of
them and ignore the others.
is there any workaround?
thanks
Regards
Mohammad
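
One way to picture the request above, sketched without any Lucene calls: assume the value of the shared field has already been read from each hit of the two result sets into two plain lists, then keep only the values present in both. `FieldIntersection` and `commonValues` are hypothetical names for this illustration. Combining the two queries into a single BooleanQuery with both clauses required, as suggested in the reply, lets the search itself do this work when the match is on the same document.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: intersects two result sets by the value of a field
// (for example an external ID) rather than by document id.
final class FieldIntersection {

    static List<String> commonValues(List<String> first, List<String> second) {
        // Index the second result set for constant-time membership checks.
        Set<String> inSecond = new HashSet<>(second);
        // Tracks values already emitted so each appears at most once.
        Set<String> emitted = new HashSet<>();

        List<String> common = new ArrayList<>();
        for (String value : first) {
            // Keep the value if both result sets contain it, preserving
            // the order of the first result set.
            if (inSecond.contains(value) && emitted.add(value)) {
                common.add(value);
            }
        }
        return common;
    }
}
```

The output follows the ranking of the first list, so the caller chooses which result set's order matters by choosing the argument order.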
--
View
25 matches