Have you looked at Lucene's "MoreLikeThis"? I confess I haven't worked with this enough to recommend *how* to use it, but it seems like it's in the general area you're talking about.
http://lucene.apache.org/java/3_5_0/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html

Best
Erick

On Fri, Dec 16, 2011 at 12:53 PM, Josh Stone <pacesysj...@gmail.com> wrote:
> Thanks for the response Donna. That would make more sense, but the items
> I'm pulling in from the web contain large bodies of text (descriptions)
> whereas the products in my catalog consist of shorter fields such as
> product name, manufacturer, product code, etc. So using the smaller fields
> from my catalog to build queries against the larger fields in the items I
> pull in seems to be the only way to do things (that I can think of).
>
> And this brings up my exact problem. I have a document (set of fields) that
> I want to use as search criteria for a search against another set of
> documents. Can something like this be done?
>
> Cheers,
> Josh
>
> On Fri, Dec 16, 2011 at 5:02 AM, Donna L Gresh <gr...@us.ibm.com> wrote:
>
>> Maybe I'm misunderstanding what you're trying to do, but why not do it the
>> other way around; that is, index the items in your catalog, and use the
>> items on the web as the query into the catalog. I have an analogous process
>> (though completely different application area) and I index the stuff that
>> doesn't change much, and use the things that are constantly changing as the
>> query.
>>
>> Donna L. Gresh
>> Business Analytics and Mathematical Sciences
>> IBM T.J. Watson Research Center
>> (914) 945-2472
>> https://researcher.ibm.com/researcher/view.php?person=us-gresh
>> gr...@us.ibm.com
>>
>> From: Josh Stone <pacesysj...@gmail.com>
>> To: java-user@lucene.apache.org
>> Date: 12/15/2011 04:57 PM
>> Subject: Using Lucene to match document sets to each other
>>
>> I have a use case for which I'm trying to figure out the best way to use
>> Lucene and could use some guidance.
>>
>> I have a set of documents representing products in a catalog (name,
>> description, etc.). I then pull down data from different sources such as
>> Ebay and Amazon and need to determine if the items retrieved from those
>> sources match any of the products in the catalog. So I'm essentially
>> attempting to take many items and many products and determine where I have
>> matches.
>>
>> I'm not sure the best way to go about this, but one questionable approach
>> is to index the items as I pull them in (to RAM) and do one search for
>> every product in my catalog, looking for matching names or descriptions.
>> This means an almost exponential number of queries though. Is there a
>> better approach? Any help is appreciated.
>>
>> Thanks,
>> Josh
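
A minimal sketch of the "other way around" suggestion quoted above (index the catalog once, then use each incoming item as the query). It is plain Java, not the Lucene API: CatalogMatcher, addProduct, bestMatch, and the coverage threshold are hypothetical names made up for illustration, and a hash map with a regex tokenizer stands in for a real index and analyzer. What it is meant to show is the query count: one lookup per incoming item, with the long description matched against the short catalog fields.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy in-memory stand-in for a catalog index. Illustration only, not Lucene. */
final class CatalogMatcher {
    // term -> ids of the catalog products containing that term (tiny inverted index)
    private final Map<String, Set<String>> postings = new HashMap<>();
    // product id -> number of distinct terms indexed for that product
    private final Map<String, Integer> termCounts = new HashMap<>();

    /** Lowercases the text and splits on anything that is not an ASCII letter or digit. */
    static Set<String> tokenize(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Indexes one catalog product from its short fields (name, manufacturer, code). */
    void addProduct(String id, String shortFields) {
        Set<String> terms = tokenize(shortFields);
        termCounts.put(id, terms.size());
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(id);
        }
    }

    /**
     * Uses a long item description as the query. Returns the id of the product
     * whose terms are best covered by the description, or null when no product
     * reaches minCoverage (fraction of the product's terms found in the item).
     */
    String bestMatch(String itemDescription, double minCoverage) {
        Map<String, Integer> hits = new HashMap<>();
        for (String term : tokenize(itemDescription)) {
            for (String id : postings.getOrDefault(term, Collections.<String>emptySet())) {
                hits.merge(id, 1, Integer::sum);
            }
        }
        String best = null;
        double bestScore = 0.0;
        for (Map.Entry<String, Integer> e : hits.entrySet()) {
            double coverage = (double) e.getValue() / termCounts.get(e.getKey());
            if (coverage >= minCoverage && coverage > bestScore) {
                best = e.getKey();
                bestScore = coverage;
            }
        }
        return best;
    }
}
```

Scoring by coverage of the product's terms, rather than of the item's terms, is a deliberate choice for this use case: the catalog fields are short and the item descriptions are long, so most of the description is expected to be noise from the product's point of view.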