Maybe I'm misunderstanding what you're trying to do, but why not do it the other way around? That is, index the items in your catalog, and use the items from the web as queries against the catalog. I have an analogous process (though in a completely different application area): I index the stuff that doesn't change much and use the things that are constantly changing as the queries.
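To make the suggestion concrete, here is a rough sketch of the idea in plain Python. It is not Lucene code: the toy `CatalogIndex` class, the `tokenize` helper, the term-overlap scoring, and the sample products are all made up for illustration. The only point is the shape of the process: build the index once from the catalog, then run one query per incoming item.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a crude stand-in for an analyzer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class CatalogIndex:
    """Toy inverted index over the slowly changing catalog (hypothetical, not Lucene)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of product ids
        self.products = {}                # product id -> product name

    def add(self, product_id, name, description=""):
        self.products[product_id] = name
        for term in set(tokenize(name + " " + description)):
            self.postings[term].add(product_id)

    def search(self, text, top_n=3):
        """Rank products by the number of distinct query terms they share."""
        scores = defaultdict(int)
        for term in set(tokenize(text)):
            for pid in self.postings.get(term, ()):
                scores[pid] += 1
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:top_n]


# Index the catalog once (sample data invented for this sketch).
index = CatalogIndex()
index.add("p1", "Acme Cordless Drill 18V", "compact drill with two batteries")
index.add("p2", "Acme Garden Hose 50ft", "kink-resistant hose")

# Each item pulled from the web becomes one query against the catalog index.
incoming = ["ACME 18V cordless drill kit", "Garden hose, 50ft, green"]
matches = {item: index.search(item, top_n=1) for item in incoming}
```

With this arrangement the number of queries grows with the number of incoming items only, and the catalog index does not need rebuilding each time new items arrive. In Lucene itself the same shape would presumably be an index written once with `IndexWriter` and one `IndexSearcher` query per incoming item, with Lucene's own scoring in place of the naive term count used here.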
Donna L. Gresh
Business Analytics and Mathematical Sciences
IBM T.J. Watson Research Center
(914) 945-2472
https://researcher.ibm.com/researcher/view.php?person=us-gresh
gr...@us.ibm.com

From: Josh Stone <pacesysj...@gmail.com>
To: java-user@lucene.apache.org
Date: 12/15/2011 04:57 PM
Subject: Using Lucene to match document sets to each other

I have a use case for which I'm trying to figure out the best way to use Lucene and could use some guidance. I have a set of documents representing products in a catalog (name, description, etc.). I then pull down data from different sources such as Ebay and Amazon and need to determine if the items retrieved from those sources match any of the products in the catalog. So I'm essentially attempting to take many items and many products and determine where I have matches.

I'm not sure the best way to go about this, but one questionable approach is to index the items as I pull them in (to RAM) and do one search for every product in my catalog, looking for matching names or descriptions. This means an almost exponential number of queries though.

Is there a better approach? Any help is appreciated.

Thanks,
Josh