For the first strategy I'm using MoreLikeThis to generate one query (from Doc's terms) for each analyzed field (from type1 and type2), applying boosts, and searching with a TermsFilter so that only documents of type2 are selected.
For the second, I build a map <termString, boost>, where boost is the tf-idf of the term in Doc (computed using the searcher and the similarity). I have not managed to build a query from this map: I was looking for something like TermQuery("*", termStr) that would match a term in any field. Or is it OK to build one TermQuery per field for each termStr?

Sorry if I'm not explicit enough about what I mean; I'm taking a basic-level English course.

Pedro Lacerda

2012/1/26 Pedro Lacerda <pslace...@gmail.com>

> Hi list,
>
> We have two different document types with different fields each. My
> problem is given one document (Doc) from type1, find similar ones of type2.
> Initially I thought two strategies to do it:
>
> - index all documents together; build my query with terms from Doc and
> fields of type2; and filter out documents of type1.
> - index type1 and type2 documents separately; compute scores (like
> tf-idf) for each term of Doc on type1 index; build my query with terms from
> Doc and apply the scores as boosts; search on type2 index.
>
> I hope you have nice suggestions for me, because I started to learn Lucene
> but she is giving me a lot of headache!
>
> Pedro Lacerda
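
To make the second strategy from my reply concrete, here is a rough sketch in plain Java (no Lucene dependency) of what I mean by "one clause per field per term". It is only an illustration under assumptions: the field names `title` and `body` are hypothetical, the tf-idf formula (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))) is my understanding of Lucene's default similarity, and the output is a string in the classic query parser syntax `field:term^boost` rather than real TermQuery objects.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

class BoostedQuerySketch {

    // Assumed tf-idf: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)).
    static double tfIdf(int freq, int docFreq, int numDocs) {
        double tf = Math.sqrt(freq);
        double idf = 1.0 + Math.log((double) numDocs / (docFreq + 1));
        return tf * idf;
    }

    // Emits one "field:term^boost" clause for every (term, field) pair.
    // Terms are assumed to be plain analyzed tokens with no query-syntax
    // characters, so no escaping is done here.
    static String buildQuery(Map<String, Double> boosts, List<String> fields) {
        StringJoiner query = new StringJoiner(" ");
        for (Map.Entry<String, Double> entry : boosts.entrySet()) {
            for (String field : fields) {
                query.add(String.format(Locale.ROOT, "%s:%s^%.2f",
                        field, entry.getKey(), entry.getValue()));
            }
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Hypothetical statistics for two terms of Doc, taken from the type1 index.
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("lucene", tfIdf(4, 9, 100));
        boosts.put("index", tfIdf(1, 49, 100));

        // Hypothetical analyzed fields of the type2 documents.
        List<String> type2Fields = Arrays.asList("title", "body");
        System.out.println(buildQuery(boosts, type2Fields));
    }
}
```

With real Lucene classes, the same double loop would presumably add one boosted TermQuery per (field, term) pair to a BooleanQuery as an optional (SHOULD) clause, which is the "one TermQuery per field per termStr" option I am asking about.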