Ahhh, I was assuming you didn't need to look at all clusters.
Oops.
That said, the question is really whether this is "good enough"
compared to re-indexing, and only some tests will determine that.
I was surprised at how quickly a *large* number of ORs was
processed by Lucene.
You could also thin
Do you mean that we generate a compound query by ANDing the original
query with a query like
( (cluster_id=0)^boost_cluster0 OR (cluster_id=1)^boost_cluster1 ... )
But isn't this inefficient, considering that the number of clusters is in
the hundreds?
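
For reference, a minimal sketch of the compound query described above,
written as a string in the classic Lucene query parser syntax
(field:term^boost). The class name ClusterBoostQuery, the method build, and
the field name "cluster_id" are hypothetical and used only for illustration.
The boost values would come from wherever the per-cluster boosts are stored.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Lucene: it only assembles a query string
// in the classic query parser syntax (field:term^boost, AND, OR).
final class ClusterBoostQuery {

    // ANDs the original query with one boosted OR clause per cluster.
    // fieldName (e.g. "cluster_id") and the boosts are assumptions.
    static String build(String originalQuery, String fieldName,
                        Map<Integer, Float> boostByCluster) {
        if (boostByCluster.isEmpty()) {
            return originalQuery; // no clusters known, nothing to boost
        }
        StringJoiner clusters = new StringJoiner(" OR ", "(", ")");
        for (Map.Entry<Integer, Float> e : boostByCluster.entrySet()) {
            clusters.add(fieldName + ":" + e.getKey() + "^" + e.getValue());
        }
        return "(" + originalQuery + ") AND " + clusters;
    }
}
```

Two caveats with this sketch: because the cluster clause is ANDed, a document
whose cluster is missing from the map would no longer match, and a
BooleanQuery accepts 1024 clauses by default, so a few hundred clusters fit
without raising that limit. Whether it is fast enough is still something only
a test will show.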
On 8/21/07, Erick Erickson <[E
One solution is to keep meta-data in your index. Remember that
documents do not all have to have the same fields. So you could
index a document with a single field
"metadatanotafieldinanyotherdoc" that contains, say, a list of
all of your clusters and their boosts. Read this document in at
startup t
Is it possible to have multiple documents share a common boost?
An example scenario is as follows. The documents are grouped into a set of
clusters, each with a unique clusterId. Each document has a clusterId field
that associates it with its cluster. Each
cluster