The problem is that I want Lucene to do the sorting, because the query could return thousands of results, and I'm displaying documents one page at a time.
--
Antoine Baudoux
Development Manager
[EMAIL PROTECTED]
Tél.: +32 2 333 58 44
GSM: +32 499 534 538
Fax.: +32 2 648 16 53


On 15 Jun 2007, at 17:42, Mathieu Lecarme wrote:

The first step is to feed a Set with the "collection" values.
The second step is to sort it.

With a SortedSet, you can do that, can't you?

M.
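The SortedSet suggestion can be sketched in plain Java as follows. This is one interpretation, not code from the thread: the collection IDs of the matching images go into a TreeSet ordered by an external score map. The class and method names are hypothetical, and no Lucene API is involved.

```java
import java.util.*;

// Hypothetical sketch, not Lucene API: collection scores live in an external map,
// so changing them requires no re-indexing.
class CollectionRanking {
    // Returns the distinct collection IDs of the matching images, highest
    // collection score first; ties are broken by ID so the order is stable.
    static SortedSet<Integer> rankCollections(int[] matchingCollectionIds,
                                              Map<Integer, Double> collectionScore) {
        Comparator<Integer> byScoreDesc = Comparator
                .comparingDouble((Integer id) -> collectionScore.getOrDefault(id, 0.0))
                .reversed()
                .thenComparing(Comparator.naturalOrder());
        SortedSet<Integer> ranked = new TreeSet<>(byScoreDesc);
        for (int id : matchingCollectionIds) {
            ranked.add(id); // the set drops duplicate collection IDs
        }
        return ranked;
    }

    public static void main(String[] args) {
        Map<Integer, Double> scores = new HashMap<>();
        scores.put(1, 5.0);
        scores.put(3, 10.0);
        int[] hitCollections = {1, 3, 1, 3, 3};   // collection ID of each matching image
        System.out.println(rankCollections(hitCollections, scores)); // [3, 1]
        scores.put(3, 2.0);                        // day 2: only the map changes
        System.out.println(rankCollections(hitCollections, scores)); // [1, 3]
    }
}
```

Note that this sorts in memory after all hits have been gathered, which is the cost Antoine wants to avoid for large result sets.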


Antoine Baudoux wrote:
Could you be more precise? I don't understand what you mean.



On 15 Jun 2007, at 17:20, Mathieu Lecarme wrote:

Your request seems to be a two-step query.
First step: you select the images, and then their collections.
Second step: you sort the collections.

Could a BitVector help you?

M.
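One possible reading of the two-step, bit-set approach, sketched with java.util.BitSet rather than Lucene's BitVector: step 1 marks the documents that match the query, step 2 walks the collections in score order and intersects each collection's document bits with the matches, stopping once the requested page is filled. The per-collection bit sets, the method name and its parameters are assumptions for illustration.

```java
import java.util.*;

// Hypothetical sketch of a two-step, bit-set based ordering (not Lucene API).
class BitSetTwoStep {
    // matches: step 1 result, one bit per matching document.
    // docsPerCollection: assumed precomputed bit set of documents for each collection.
    // collectionsByScore: collection IDs, best score first (cheap to recompute).
    static List<Integer> orderByCollection(BitSet matches,
                                           Map<Integer, BitSet> docsPerCollection,
                                           List<Integer> collectionsByScore,
                                           int offset, int pageSize) {
        List<Integer> page = new ArrayList<>();
        int seen = 0;
        for (int collection : collectionsByScore) {
            // Clone so the shared per-collection bit set is not modified.
            BitSet hits = (BitSet) docsPerCollection
                    .getOrDefault(collection, new BitSet()).clone();
            hits.and(matches); // step 2: keep only documents matched in step 1
            for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
                if (seen++ < offset) {
                    continue; // belongs to an earlier page
                }
                page.add(doc);
                if (page.size() == pageSize) {
                    return page;
                }
            }
        }
        return page;
    }

    public static void main(String[] args) {
        BitSet c1 = new BitSet(); c1.set(0); c1.set(2); c1.set(4);
        BitSet c3 = new BitSet(); c3.set(1); c3.set(3); c3.set(5);
        Map<Integer, BitSet> perCollection = new HashMap<>();
        perCollection.put(1, c1);
        perCollection.put(3, c3);
        BitSet matches = new BitSet();
        matches.set(0, 4);                          // docs 0..3 match the query
        List<Integer> order = Arrays.asList(3, 1);  // collection 3 currently scores higher
        System.out.println(orderByCollection(matches, perCollection, order, 0, 3)); // [1, 3, 0]
    }
}
```

Within a collection, documents come out in document-number order, not by date, so combining this with recency would need an extra step.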
Antoine Baudoux wrote:
Hi,

I'm developing an image database. Each Lucene document
representing an image contains (among other fields):

- a date field
- a collection field containing the ID of the collection the image
belongs to.

I want to be able to give a score to each collection. Collections
with a higher score appear first in the results. I want to avoid
re-indexing all the documents each time I change my collection scores.

For example, on day 1 I give collection #1 a score of 5 and collection #3 a score of 10 --> images belonging to collection #3 appear
first in search results.
On day 2 I give collection #3 a score of 2 --> images belonging to
collection #1 appear first in search results.

I have read the Lucene docs, and from what I understand there are
several ways to achieve what I want:


- I can use a very big BooleanQuery (an OR query, in fact) containing one TermQuery per collection ID, setting the appropriate boost factor on each TermQuery. The problem is that I have 300 collections, so I would be appending a 300-clause boolean query to every query I
make. I am afraid that it will be slow.

- I can use a ValueSourceQuery, where for each document I compute a custom score based on the value of the collection field. Would it be
faster than the first solution?

- I can do advanced things such as writing a custom HitCollector,
or a custom Query.

- I can add another field to each document containing a precomputed
custom score, and then sort on that field. But I want to avoid
this solution at all costs, since it would mean re-indexing all the
documents each time the collection scores change.
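The second and third options both come down to computing a score per matching document from a table that lives outside the index, and keeping only the best hits. Below is a plain-Java sketch of that computation; it is not Lucene's HitCollector or ValueSourceQuery API, and the class and field names are hypothetical. The doc-to-collection array is assumed to be loaded once per index reader, and a bounded min-heap keeps only offset + pageSize hits, so thousands of matches never need a full sort.

```java
import java.util.*;

// Hypothetical sketch, not Lucene API: scores each hit from an external
// collection-score table and keeps only the hits needed for one page.
class TopPageCollector {
    private final int[] collectionOfDoc;               // doc number -> collection ID (assumed loaded once)
    private final Map<Integer, Float> collectionScore; // editable without re-indexing
    private final int offset;
    private final int pageSize;
    private final Comparator<Integer> worstFirst;
    private final PriorityQueue<Integer> heap;

    TopPageCollector(int[] collectionOfDoc, Map<Integer, Float> collectionScore,
                     int offset, int pageSize) {
        this.collectionOfDoc = collectionOfDoc;
        this.collectionScore = collectionScore;
        this.offset = offset;
        this.pageSize = pageSize;
        // Lower score is worse; on equal scores the higher doc number is worse.
        this.worstFirst = Comparator.comparingDouble((Integer d) -> score(d))
                .thenComparing(Comparator.<Integer>reverseOrder());
        this.heap = new PriorityQueue<>(worstFirst); // head = weakest hit kept so far
    }

    float score(int doc) {
        return collectionScore.getOrDefault(collectionOfDoc[doc], 0f);
    }

    // Called once per matching document, in the spirit of HitCollector.collect().
    void collect(int doc) {
        heap.offer(doc);
        if (heap.size() > offset + pageSize) {
            heap.poll(); // drop the weakest hit; it can no longer reach the requested page
        }
    }

    // Returns the requested page, best hit first.
    List<Integer> page() {
        List<Integer> best = new ArrayList<>(heap);
        best.sort(worstFirst.reversed());
        int from = Math.min(offset, best.size());
        int to = Math.min(offset + pageSize, best.size());
        return best.subList(from, to);
    }

    public static void main(String[] args) {
        int[] collectionOfDoc = {1, 3, 1, 3, 2};   // docs 0..4
        Map<Integer, Float> scores = new HashMap<>();
        scores.put(1, 5f);
        scores.put(3, 10f);                         // collection 2 has no score, so 0
        TopPageCollector firstPage = new TopPageCollector(collectionOfDoc, scores, 0, 2);
        for (int doc = 0; doc < collectionOfDoc.length; doc++) {
            firstPage.collect(doc);
        }
        System.out.println(firstPage.page());      // [1, 3]
    }
}
```

Keeping offset + pageSize entries means deep pages cost more memory, but for typical page depths the heap stays small.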

What solution do you suggest? Is there another solution that I
didn't mention?

More recent documents should also come first. In fact, the final sorting should be a weighted sum of the collection score of an
image and the date of the image: the most recent images from the
best-scored collections come first, then the most recent from less-scored
collections, then less recent ones from the best-scored, and so on. I would
also like to be able to adjust the balance between date and collection
score.
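One way to read the weighted sum, as a minimal sketch with a hypothetical balance parameter alpha between 0 and 1: scale both the collection score and the date to [0, 1] and combine them linearly. The normalization (dividing by the maximum collection score, scaling linearly between the oldest and newest dates) is an assumption; a decay curve on the date would work too.

```java
// Hypothetical sketch: one possible reading of the "weighted sum" of collection
// score and date. Both parts are scaled to [0, 1], so alpha alone sets the balance:
//   combined = alpha * score / maxScore + (1 - alpha) * (date - oldest) / (newest - oldest)
class WeightedScore {
    static double combined(double collectionScore, double maxCollectionScore,
                           long date, long oldest, long newest, double alpha) {
        double coll = maxCollectionScore > 0 ? collectionScore / maxCollectionScore : 0.0;
        double span = newest - oldest;
        double recency = span > 0 ? (date - oldest) / span : 1.0; // 1.0 = newest
        return alpha * coll + (1 - alpha) * recency;
    }

    public static void main(String[] args) {
        // Image A: best collection (score 10), oldest date (day 0).
        // Image B: weaker collection (score 5), newest date (day 10).
        for (double alpha : new double[] {0.5, 0.8}) {
            double a = combined(10, 10, 0, 0, 10, alpha);
            double b = combined(5, 10, 10, 0, 10, alpha);
            System.out.println("alpha=" + alpha + ": " + (a > b ? "A" : "B") + " first");
        }
        // prints "alpha=0.5: B first" then "alpha=0.8: A first"
    }
}
```

With alpha = 0.5 the newer image from the weaker collection wins; raising alpha to 0.8 lets the better collection win, which is the adjustable balance described above.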

What solution do you suggest?


I would also like to implement random sorting. My solution is: I
create 12 new fields R1 -> R12 for each document, each containing a
random number between 1 and 12. To get a random sort, I sort each day
with a different combination of R1 .. R12. For example:

    Day 1: I sort by R1, then R4, then R5, ...
    Day 2: I sort by R10, then R9, then R2, ...
    etc.

Is it a good solution? Is there another way to do it?
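An alternative to storing R1..R12 is to derive a pseudo-random key at query time from a stable image ID and the current day, and sort on that key. The sketch below uses the SplitMix64 mixing function; the stable imageId field is an assumption (Lucene's internal document numbers can change after deletions and merges), and wiring the key into Lucene's sorting, for instance through a custom sort comparator, is not shown.

```java
import java.util.*;

// Hypothetical sketch: a per-day pseudo-random order computed at query time,
// as an alternative to storing R1..R12 fields. The same (id, day) pair always
// yields the same key, so pages stay consistent within a day, and the order
// changes when the day changes.
class DailyShuffle {
    // SplitMix64 finalizer: scrambles the bits of a 64-bit value (a bijection).
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // imageId is assumed to be a stable per-image ID stored in the index.
    static long sortKey(int imageId, long day) {
        return mix(day * 0x9E3779B97F4A7C15L + imageId);
    }

    static List<Integer> order(List<Integer> imageIds, long day) {
        List<Integer> sorted = new ArrayList<>(imageIds);
        sorted.sort(Comparator.comparingLong((Integer id) -> sortKey(id, day)));
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(10, 11, 12, 13, 14, 15);
        long today = java.time.LocalDate.now().toEpochDay();
        System.out.println(order(ids, today));     // shuffled, stable for the whole day
        System.out.println(order(ids, today + 1)); // almost surely a different shuffle
    }
}
```

Because the key depends only on (imageId, day), every page request on the same day sees the same order, and reshuffling needs no re-indexing.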


Many thanks in advance for your answers.

Antoine

------------------------------------------------------------------- --
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



