On 15 Jun 2007, at 19:07, Walt Stoneburner wrote:
Antoine Baudoux writes:
I want to be able to give a score to each collection.
Keep in mind, Lucene is computing a score based on quite a number of
things from how often a term is used in a document, how often it
appears in the collection of documents, how long the query is, etc.
If your concept of a document's score changes, then I'd be inclined to
think you're possibly using Lucene in a manner it wasn't designed for.
That said, I have two thoughts.
THOUGHT ONE
Use Lucene to locate "records" for you --- what you really are
interested in getting back _from Lucene_ is the primary key. Then,
use this key to do a lookup in your database of the score of the day
and sort accordingly. The idea is that Lucene finds, your table
scores, and because of that you won't need to re-index when something
changes.
I think that's the same principle that drives ValueSourceQuery. The
advantage of ValueSourceQuery is that the sorting remains in Lucene,
where it can benefit from field indexes and field caches.
I don't like the idea of having to retrieve, say, 300,000 results
from Lucene, then sort them using lookups in the DB, then display only
a part of the results on screen. But I may be wrong in thinking it
would be slower than a ValueSourceQuery.
THOUGHT TWO
Use boosting. COLLECTION_ONE^5 COLLECTION_THREE^10 etc. That way
/if/ the Lucene document appears in the collection, its score is
weighted according to your preferences. You're free to change the
boosts on a query-by-query basis without having to re-index.
I can use a very big ... query ... I am afraid that it will be slow.
Try it. I think you'll find Lucene is _fast_. We do some pretty HUGE
and complicated queries and Lucene just screams.
Indeed, I tried a Boolean query with a custom boost for each of the
300 collections, and the speed is acceptable with a db of 600,000 images.
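For anyone following along, here is a minimal sketch of how such a boosted query string could be assembled from a table of per-collection boosts. It only builds text in Lucene's query syntax (term^boost); it does not call Lucene itself. The field name "collection", the class name, and the sample boosts are assumptions for illustration, and collection names are assumed to be plain tokens that need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class BoostedCollectionQuery {

    // Builds one "field:NAME^boost" clause per collection, separated by spaces.
    // The field name is hypothetical; use whichever field holds the collection id.
    static String build(String field, Map<String, Float> boosts) {
        StringJoiner query = new StringJoiner(" ");
        for (Map.Entry<String, Float> e : boosts.entrySet()) {
            query.add(field + ":" + e.getKey() + "^" + e.getValue());
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Sample boosts only; in practice these would come from the score table.
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("COLLECTION_ONE", 5f);
        boosts.put("COLLECTION_THREE", 10f);
        System.out.println(build("collection", boosts));
    }
}
```

Since the boosts live in the query and not in the index, changing a collection's weight only means rebuilding this string for the next search.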
I can add another field to each document, containing a computed
custom score, then I could sort on that field. But I want to avoid
this solution at all costs, since it would mean re-indexing all the
documents each time the collection scores change.
Or, use indirection - instead of keeping the score, keep the primary
key of a score table. Then in a database, where speed won't be the
issue, perform the lookup. Honestly, if you've only got 300
categories, you could keep that simple table in memory using less
space than a small text file.
Again, it's not the space needed to store the lookup table that
bothers me; it's the time it would take to fetch all docs unsorted
from Lucene and then sort them using the table.
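One way to settle the timing question is to measure it. The sketch below is plain Java with no Lucene calls: it assumes the hits have already been fetched as (image id, collection id) pairs and sorts them against an in-memory score table. The Hit class, the method name, and the synthetic data in main are all made up for the experiment, and the cost of fetching the hits from Lucene in the first place is not included.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class CollectionScoreSort {

    // Hypothetical shape of a fetched hit: primary key plus collection id.
    static final class Hit {
        final long imageId;
        final int collectionId;

        Hit(long imageId, int collectionId) {
            this.imageId = imageId;
            this.collectionId = collectionId;
        }
    }

    // Sorts hits by descending collection score. Collections missing from the
    // table score 0. List.sort is stable, so ties keep their incoming order.
    static void sortByCollectionScore(List<Hit> hits, Map<Integer, Float> scores) {
        hits.sort((a, b) -> Float.compare(
                scores.getOrDefault(b.collectionId, 0f),
                scores.getOrDefault(a.collectionId, 0f)));
    }

    public static void main(String[] args) {
        // Synthetic data sized like the numbers in this thread.
        Random rnd = new Random(1);
        Map<Integer, Float> scores = new HashMap<>();
        for (int c = 0; c < 300; c++) {
            scores.put(c, rnd.nextFloat());
        }
        List<Hit> hits = new ArrayList<>();
        for (long id = 0; id < 600_000; id++) {
            hits.add(new Hit(id, rnd.nextInt(300)));
        }
        long start = System.nanoTime();
        sortByCollectionScore(hits, scores);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("sorted " + hits.size() + " hits in " + millis + " ms");
    }
}
```

This only times the sort step, so it gives a lower bound for the approach and not a comparison against ValueSourceQuery.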
I would also like to implement random-sorting. ... Is it a good
solution?
Is there another way to do it?
This really, really, really feels like you're force-fitting Lucene to
do some business logic piece of a larger application. May I be so
bold as to ask what's the _actual_ problem you're trying to solve?
("I'm trying to make a hole in a piece of oak" as opposed to "What's
the best way to sharpen a Phillips screwdriver enough to cut wood?")
Keep in mind that the forum is for Lucene, so parts of your questions
may be answered outside of the forum.
Well, since sorting/scoring is part of Lucene's core API, asking for
a way to achieve a particular sort order does not seem to me to be
outside the scope of this forum/mailing list.
I just need random sorting to give each photographer an equal
opportunity to sell their work.
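To make the requirement concrete, here is a rough sketch of one possible approach, done outside Lucene: shuffle the matching ids with a seed kept per user session, so the order is random but stays the same while the user pages through results. The class and method names are invented for the example, and it assumes the id list comes back in the same order on every request for a given query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SeededShuffle {

    // Returns one page of ids in an order that is random but repeatable for a
    // given seed. An out-of-range page yields an empty list.
    static List<Long> page(List<Long> ids, long seed, int pageIndex, int pageSize) {
        List<Long> shuffled = new ArrayList<>(ids);
        // Same seed and same input order always produce the same permutation.
        Collections.shuffle(shuffled, new Random(seed));
        int from = Math.min(pageIndex * pageSize, shuffled.size());
        int to = Math.min(from + pageSize, shuffled.size());
        return new ArrayList<>(shuffled.subList(from, to));
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 20; i++) {
            ids.add(i);
        }
        long sessionSeed = 42L; // hypothetical: one seed per user session
        System.out.println("page 0: " + page(ids, sessionSeed, 0, 5));
        System.out.println("page 1: " + page(ids, sessionSeed, 1, 5));
    }
}
```

It does reshuffle the whole id list for every page, which is exactly the fetch-everything cost I am worried about above, so I would treat it as a statement of the behaviour I want more than as a solution.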
Thanks for your answers.