Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Tom Hill
On 3/29/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Hm, removing duplicates (as determined by a value of a specified document field) from the results would be nice. How would your addition affect performance, considering it has to check the PQ for a previous value for every candidate hit?

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Antony Bowesman
Peter Keegan wrote: I implemented 'first wins' because the score is less important than other fields (distance, in our case), but you make a good point since score may be more important. How did you implement remove()? I've got my own PriorityQueue public boolean remove(E o) {

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Peter Keegan
Peter, how did you achieve 'last wins' as you must presumably remove first from the PQ? I implemented 'first wins' because the score is less important than other fields (distance, in our case), but you make a good point since score may be more important. How did you implement remove()? Peter

Re: How to get the intersection of two Hits?

2007-03-29 Thread Daniel Noll
is_maximum wrote: Hi, suppose we have two Hits; now we need the documents which exist in both of them and ignore the others. Is there any workaround? Can you use a BooleanQuery instead? Daniel -- Daniel Noll Nuix Pty Ltd Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Ph: +61 2 9280

Re: search on multiple fields

2007-03-29 Thread Daniel Noll
Melanie Langlois wrote: Hi, I'm wondering if Lucene would understand such a query: content*:mysearch It's just because I index several translations of my document contents in addition to common fields, and this separation is really useful when a user specifies the language in which he wants

Re: Scores from HitCollector

2007-03-29 Thread Chris Hostetter
: Can anyone say why this is useful and what's wrong about raw scores? in my opinion: it's not useful at all, and there is nothing wrong with raw scores, but some people like having scores that are bounded in a finite range, so Hits provides that. -Hoss -

Re: normalized scores

2007-03-29 Thread Chris Hostetter
: For a given query (for a single input document), the highest score is : *not* always 1 (which is just how : I want it). Is this because I am using a Boolean query? Here is my code : snippet. the Hits class only normalizes scores if the highest score is greater than one, if it's less than 1 no no
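The rule described in this reply (scale only when the top score exceeds 1) can be sketched in plain Java. This is a minimal illustration, not the code inside Lucene's Hits class; the class name `ScoreNormalizer` and the method `normalize` are hypothetical.

```java
/** Hypothetical helper; illustrates the rule only, not Lucene's Hits source. */
class ScoreNormalizer {
    /**
     * Returns a copy of rawScores scaled so that the top score is 1,
     * but only when the top score is greater than 1. Otherwise the
     * scores are returned unchanged.
     */
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        // Assumption: one multiplicative factor is applied to every score.
        float factor = max > 1.0f ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * factor;
        }
        return out;
    }
}
```

With raw scores such as 4, 2, 1 the result is scaled down so the top score is 1; with raw scores that are all below 1 the values come back untouched, which would explain a top score that is not always 1.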

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Antony Bowesman
I've got a similar duplicate case, but my duplicates are based on an external ID rather than Doc id, so they occur for a single Query. It's using a custom HitCollector but score based, not field sorted. If my duplicate contains a higher score than one on the PQ I need to update the stored score wi
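One way to keep only the higher score for each external ID is sketched below. Note that it swaps in a different technique from the one in the message: it tracks the best score per ID in a map and sorts at the end, instead of updating an entry inside a priority queue. The class `BestScorePerId` and its methods are hypothetical, and the external ID is assumed to be a string.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical collector: keeps the highest score seen for each external ID. */
class BestScorePerId {
    private final Map<String, Float> best = new HashMap<>();

    /** Called once per candidate hit; a later duplicate wins only if it scores higher. */
    void collect(String externalId, float score) {
        best.merge(externalId, score, (a, b) -> Math.max(a, b));
    }

    /** Returns up to n entries, highest score first. */
    List<Map.Entry<String, Float>> top(int n) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(best.entrySet());
        entries.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }
}
```

The trade-off is memory: the map holds every distinct ID seen, whereas a bounded queue holds only the top N, so this sketch suits result sets of moderate size.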

Re: Scores from HitCollector

2007-03-29 Thread Antony Bowesman
Erick Erickson wrote: I wound up using a TopDocs instead, which has a getMaxScore that I was able to use to normalize scores to between 0 and 1. In my case I was collapsing the results into quintiles, so I threw them all back into a FieldSortedHitQueue to get them sorted by secondary criteria onc

RE: More Precise Highlighting (MY SOLUTION)

2007-03-29 Thread Renaud Waldura
After much head-scratching and re-reading of posts around this issue, I found a solution by writing my own QueryTermExtractor. Thanks for the help everyone. To recap, I wanted to get more "precise" highlighting by having just the "right" fields highlighted. My original example was a field named "

Re: Contextual text-link ads

2007-03-29 Thread Peter W.
Thanks, sounds great. Sponsored search results are such an important complement to Lucene, Solr and Nutch it's surprising there is nothing in the sandbox and only a few archive mentions. I really liked the idea of setting a boost to certain sponsor docs based on 'importance' and some known profi

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Peter Keegan
Yes, my custom query processor can sometimes make 2 Lucene search calls which may result in duplicate docs being inserted on the same PQ. The simplest solution is to make lessThan public. I'm curious to know if anyone else is performing multiple searches under the covers. Peter On 3/29/07, Yonik

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Yonik Seeley
On 3/29/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Ah, I see. This is less attractive to me personally, but maybe it helps others. One thing I don't understand is why/how you'd get duplicate documents with the same doc ID in there. Isn't insert(FieldDoc fdoc) called only once for each

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Otis Gospodnetic
Ah, I see. This is less attractive to me personally, but maybe it helps others. One thing I don't understand is why/how you'd get duplicate documents with the same doc ID in there. Isn't insert(FieldDoc fdoc) called only once for each doc? Otis . . . . . . . . . . . . . . . . . . . . . . .

normalized scores

2007-03-29 Thread Donna L Gresh
Recent questions about whether/how scores are normalized got me wondering how my application (happily) seems to be doing what I want. I have two indexes, one which contains text fields which I want to use as queries into text fields in a second index. I create a Boolean query based on all the t

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Peter Keegan
The duplicate check would just be on the doc ID. I'm using TreeSet to detect duplicates with no noticeable effect on performance. The PQ only has to be checked for a previous value IFF the element about to be inserted is actually inserted and not dropped because it's less than the least value alre
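The approach described here can be sketched without Lucene. The following is a minimal illustration that assumes a score-ordered queue and uses `java.util.PriorityQueue` rather than Lucene's own PriorityQueue or FieldSortedHitQueue; the names `DedupTopN` and `Hit` are hypothetical. The duplicate lookup in the TreeSet happens only for hits that are competitive enough to be inserted, and the first hit for a doc ID wins.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.TreeSet;

/** Hypothetical bounded top-N queue that rejects duplicate doc IDs ("first wins"). */
class DedupTopN {
    /** Minimal stand-in for a scored hit; not Lucene's FieldDoc. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int maxSize;
    // Min-heap on score: the least competitive hit sits at the head.
    private final PriorityQueue<Hit> heap =
        new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
    // Doc IDs currently held in the heap.
    private final TreeSet<Integer> docsInQueue = new TreeSet<>();

    DedupTopN(int maxSize) {
        if (maxSize < 1) throw new IllegalArgumentException("maxSize must be >= 1");
        this.maxSize = maxSize;
    }

    /** Returns true if the hit was inserted, false if it was dropped. */
    boolean insert(Hit hit) {
        boolean full = heap.size() >= maxSize;
        // Drop non-competitive hits first; no duplicate lookup is needed for them.
        if (full && hit.score <= heap.peek().score) return false;
        // Only hits that would actually be inserted pay for the duplicate check.
        if (docsInQueue.contains(hit.doc)) return false;
        if (full) {
            Hit evicted = heap.poll();
            docsInQueue.remove(evicted.doc);
        }
        heap.add(hit);
        docsInQueue.add(hit.doc);
        return true;
    }

    int size() { return heap.size(); }
}
```

Because an evicted doc ID is removed from the set, the same document can re-enter the queue later if it shows up again with a competitive score.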

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Otis Gospodnetic
Hm, removing duplicates (as determined by a value of a specified document field) from the results would be nice. How would your addition affect performance, considering it has to check the PQ for a previous value for every candidate hit? Otis . . . . . . . . . . . . . . . . . . . . . . . . . .

Re: Scores from HitCollector

2007-03-29 Thread Erick Erickson
I wound up using a TopDocs instead, which has a getMaxScore that I was able to use to normalize scores to between 0 and 1. In my case I was collapsing the results into quintiles, so I threw them all back into a FieldSortedHitQueue to get them sorted by secondary criteria once the scores were all o

FieldSortedHitQueue enhancement

2007-03-29 Thread Peter Keegan
This is a request for an enhancement to FieldSortedHitQueue/PriorityQueue that would prevent duplicate documents from being inserted, or alternatively, allow the application to prevent this (reason explained below). I can do this today by making the 'lessThan' method public and checking the queue be

searching beginning from a web page not from a directory

2007-03-29 Thread mohamed hadj taieb
Hi, I want to integrate Lucene into my web application. My web application consists of several pages integrated into OpenCms. The problem with Lucene is that it returns the result like this: java org.apache.lucene.demo.SearchFiles Query: connect Searching for: connect 0 total matching documents Query

Re: why Apache doesnt create a nice forum like the others???

2007-03-29 Thread tomi
is_maximum wrote: > > I registered in Nabble, but to post message you should subscribe to lucene > mailing list and if you subscribe to mailing list your inbox will become > full of messages. this is very bad!!! > Let me clear this misunderstanding, the advantage of using Nabble is that you don

Scores from HitCollector

2007-03-29 Thread Antony Bowesman
Hits will normalise scores to the range 0 < score <= 1, but I'm using HitCollector and haven't worked out how to normalise those scores. From what I can see, the scores are just multiplied by a factor to bring the top score down to 1. Is this right or is there something more to it? Do I need to normalise scores a

Re: why Apache doesnt create a nice forum like the others???

2007-03-29 Thread John Haxby
Mohammad Norouzi wrote: I registered in Nabble, but to post message you should subscribe to lucene mailing list and if you subscribe to mailing list your inbox will become full of messages. this is very bad!!! You're using gmail aren't you? Why don't you set up a filter to handle mail from th

Re: How to get the intersection of two Hits?

2007-03-29 Thread Mohammad Norouzi
I just want to say, the common documents based on a field and not document id On 3/29/07, is_maximum <[EMAIL PROTECTED]> wrote: Hi suppose we have two Hits, now we need the documents which exist in both of them and ignore the others. is there any workaround? thanks Regards Mohammad -- View
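If the two result sets have already been read out as lists of the shared field's values, the intersection can be sketched with plain collections. This is only an illustration under that assumption; the class `FieldIntersection` is hypothetical, it does not touch Lucene's Hits API, and combining the two queries into one search may avoid the need for it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: intersects two result lists by the value of a shared field. */
class FieldIntersection {
    /**
     * Returns the field values present in both lists, in the order they
     * appear in the first list, each value at most once.
     */
    static List<String> intersect(List<String> first, List<String> second) {
        Set<String> inSecond = new HashSet<>(second);
        Set<String> emitted = new HashSet<>();
        List<String> common = new ArrayList<>();
        for (String value : first) {
            // Keep a value only if the second list has it and it was not emitted yet.
            if (inSecond.contains(value) && emitted.add(value)) {
                common.add(value);
            }
        }
        return common;
    }
}
```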