For the first strategy I'm using MoreLikeThis to generate one query (from
Doc's terms) for each analyzed field (of type1 and type2), applying boosts,
and searching with a TermsFilter so that only documents of type2 are selected.

For the second I construct a map <termString, boost>, where boost is the
tf-idf of the term in Doc (computed using the searcher and similarity). I
failed to turn this map into a query: I'm looking for something like
TermQuery("*", termStr) that would match the term in any field. Or is it OK
to build one TermQuery per field per termStr?
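
To make the second idea concrete, here is a minimal sketch in plain Java (no Lucene dependency) of the "one clause per field per term" variant. It assumes the terms are already analyzed tokens with no query-syntax special characters. The class name BoostedQuerySketch, its methods, and the field names "title" and "body" are hypothetical, chosen only for illustration. The tfIdf helper uses one common tf-idf variant (square root of the term frequency times a smoothed log idf); the weights my real similarity produces may differ.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: turns a <term, boost> map into a boosted query string.
class BoostedQuerySketch {

    // One common tf-idf variant (an assumption, not necessarily what the
    // configured similarity computes): sqrt(tf) * (1 + ln(numDocs / (docFreq + 1))).
    static double tfIdf(int termFreq, int docFreq, int numDocs) {
        double tf = Math.sqrt(termFreq);
        double idf = 1.0 + Math.log((double) numDocs / (docFreq + 1));
        return tf * idf;
    }

    // Emits one "field:term^boost" clause per (term, field) pair, separated by
    // spaces, so the clauses are optional under a default OR operator.
    static String buildQuery(Map<String, Double> boosts, List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> e : boosts.entrySet()) {
            for (String field : fields) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(field).append(':').append(e.getKey())
                  .append('^')
                  .append(String.format(Locale.ROOT, "%.2f", e.getValue()));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Toy statistics standing in for values read from the type1 index.
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("lucene", tfIdf(4, 9, 1000));
        boosts.put("index", tfIdf(1, 499, 1000));
        System.out.println(buildQuery(boosts, Arrays.asList("title", "body")));
    }
}
```

The resulting string uses the field:term^boost form of the classic query syntax. The same double loop could instead add one boosted TermQuery per (field, term) pair to a BooleanQuery as an optional clause, which avoids parsing altogether.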

Sorry if I'm not explicit enough about what I mean; I'm taking a
basic-level English course.

Pedro Lacerda



2012/1/26 Pedro Lacerda <pslace...@gmail.com>

> Hi list,
>
> We have two document types, each with different fields. My problem is:
> given one document (Doc) of type1, find similar ones of type2.
> Initially I thought of two strategies to do it:
>
>    - index all documents together; build my query with terms from Doc and
>    fields of type2; and filter out documents of type1.
>    - index type1 and type2 documents separately; compute scores (like
>    tf-idf) for each term of Doc on type1 index; build my query with terms from
>    Doc and apply the scores as boosts; search on type2 index.
>
> I hope you have some good suggestions for me, because I have just started
> learning Lucene and it is giving me quite a headache!
>
> Pedro Lacerda
>
>
