Re: Multiple Documents sharing a common boost

2007-08-21 Thread Erick Erickson
Ahhh, I was assuming you didn't need to look at all clusters. Oops. That said, the question is really whether this is "good enough" compared to re-indexing, and only some tests will determine that. I was surprised at how quickly a *large* number of ORs was processed by Lucene. You could also thin

Re: Multiple Documents sharing a common boost

2007-08-21 Thread Raghu Ram
Do you mean that we generate a compound query by ANDing the original query with a query like ((cluster_id=0)^boost_cluster0 OR (cluster_id=1)^boost_cluster1 ...)? But isn't this inefficient, considering that the number of clusters is in the hundreds? On 8/21/07, Erick Erickson <[E
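
For illustration, the compound query described in this message could be assembled as a query-parser string. This is only a sketch under assumptions: the class name ClusterBoostQuery, the method name build, and the example field and boost values are hypothetical, and the output relies on standard Lucene query-parser syntax (+ for a required clause, field:term^boost for a boosted term). It does not call Lucene itself.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the "original AND (boosted cluster clauses)" query
// as a string in Lucene query-parser syntax. Not part of Lucene.
class ClusterBoostQuery {

    // originalQuery: the user's query, already in query-parser syntax
    // field:         name of the field holding the cluster id (assumed "cluster_id")
    // boosts:        cluster id -> boost; iteration order decides clause order
    static String build(String originalQuery, String field, Map<Integer, Float> boosts) {
        if (boosts.isEmpty()) {
            // No cluster clauses to add; an empty "+()" group would not parse.
            return originalQuery;
        }
        StringJoiner clauses = new StringJoiner(" OR ");
        for (Map.Entry<Integer, Float> e : boosts.entrySet()) {
            // One boosted term per cluster, e.g. cluster_id:0^2.0
            clauses.add(field + ":" + e.getKey() + "^" + e.getValue());
        }
        // Both groups are required, which is the AND described in the message.
        return "+(" + originalQuery + ") +(" + clauses + ")";
    }
}
```

With boosts {0: 2.0, 1: 0.5} and the original query title:lucene, this yields +(title:lucene) +(cluster_id:0^2.0 OR cluster_id:1^0.5). Because the cluster group is required, a document whose cluster is missing from the map would be excluded, so the map should cover every cluster. Whether a clause list running into the hundreds is fast enough is exactly the open question in the thread, and only measurement can answer it.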

Re: Multiple Documents sharing a common boost

2007-08-21 Thread Erick Erickson
One solution is to keep meta-data in your index. Remember that documents do not all have to have the same fields. So you could index a document with a single field "metadatanotafieldinanyotherdoc" that contains, say, a list of all of your clusters and their boosts. Read this document in at startup t
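
As a rough sketch of the metadata-document idea, the stored field value could be decoded into an in-memory lookup once at startup. Everything specific here is an assumption: the message does not say how the list of clusters and boosts is encoded, so the "clusterId=boost" comma-separated format, the class name ClusterBoostTable, and the default boost of 1.0 are all hypothetical. Fetching the document from the index is left out, and what happens after startup is cut off in the message above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory table of per-cluster boosts, filled from the value of
// the single metadata field. The "id=boost,id=boost" encoding is an assumption.
class ClusterBoostTable {
    private final Map<Integer, Float> boosts = new HashMap<>();

    // Parses a stored value such as "0=2.0, 1=0.5" into a lookup table.
    static ClusterBoostTable parse(String stored) {
        ClusterBoostTable table = new ClusterBoostTable();
        if (stored == null || stored.trim().isEmpty()) {
            return table; // no metadata: every cluster gets the default boost
        }
        for (String pair : stored.split(",")) {
            String[] kv = pair.trim().split("=");
            if (kv.length != 2) {
                throw new IllegalArgumentException("Bad entry: " + pair);
            }
            table.boosts.put(Integer.parseInt(kv[0].trim()),
                             Float.parseFloat(kv[1].trim()));
        }
        return table;
    }

    // Returns the boost for a cluster, or 1.0 (neutral) if the cluster is unknown.
    float boostFor(int clusterId) {
        Float boost = boosts.get(clusterId);
        return boost == null ? 1.0f : boost;
    }
}
```

Keeping the boosts in one small document means a boost change rewrites a single document rather than every document in the affected cluster.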

Multiple Documents sharing a common boost

2007-08-21 Thread Raghu Ram
Is it possible to have multiple documents share a common boost? An example scenario is as follows. The documents are grouped into a set of clusters. Each cluster has a unique clusterId, and each document has a clusterId field that associates it with its cluster. Each cluster