Identifying the most relevant document

Vishnu Tue, 29 Apr 2014 21:40:24 -0700

I am trying to solve the following search problem. Say we have 10 different
documents, d1..d10. Each document contains one type of data, say d1 -> list
of movie names, d2 -> list of actor names, d3 -> list of addresses, etc. Each
document contains a list of entities and their scores, so d1 contains movie
names and their popularity, and so on. Assume the scores are all normalized
to the same range (0 to max_score) across the documents.


Now, given a search query (phrase), I want to score the 10 documents based on
how relevant each one is to the search phrase.

My question is whether Lucene is a good way to approach this. I plan to
index each phrase, together with its score, as a separate document inside
Lucene and then query for the top match.
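
To make the plan concrete, here is a rough Python sketch of the idea. It is
not Lucene code: the names (ENTRIES, build_index, top_match) and the sample
data are made up for illustration, and the match score (token overlap times
the stored score) is only an assumption standing in for whatever scoring
Lucene would actually apply.

```python
from collections import defaultdict

# Hypothetical sample data: (phrase, entity type, normalized score).
ENTRIES = [
    ("lord of the rings", "movie", 0.9),
    ("the godfather", "movie", 0.8),
    ("tom hanks", "actor", 0.7),
    ("10 ring road", "address", 0.2),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def build_index(entries):
    """Map each token to the ids of the entries whose phrase contains it."""
    index = defaultdict(set)
    for entry_id, (phrase, _type, _score) in enumerate(entries):
        for token in tokenize(phrase):
            index[token].add(entry_id)
    return index


def top_match(query, entries, index):
    """Return (phrase, entity type, match score) of the best entry, or None."""
    query_tokens = set(tokenize(query))
    candidates = set()
    for token in query_tokens:
        candidates |= index.get(token, set())

    best = None
    for entry_id in candidates:
        phrase, entity_type, score = entries[entry_id]
        phrase_tokens = set(tokenize(phrase))
        # Assumed scoring: Jaccard overlap of tokens, weighted by stored score.
        overlap = len(query_tokens & phrase_tokens) / len(query_tokens | phrase_tokens)
        match = overlap * score
        if best is None or match > best[2]:
            best = (phrase, entity_type, match)
    return best


index = build_index(ENTRIES)
result = top_match("lord of the rings", ENTRIES, index)
```

Here `result` names the best-matching phrase and its entity type; a query
that shares no token with any phrase returns None.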

I don't want to search for the individual entities. I am okay with getting
the overall score of each entity type for a given search phrase. For example,
if someone searches for "lord of the rings", I need to be able to say that it
is most likely a movie and not an actor or an address. My goal is to minimize
space consumption and optimize performance.
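
As an illustration of the per-type score I am after, here is a small Python
sketch. The function name rank_entity_types and the hit values are
hypothetical, and summing the match scores per type is just one possible
aggregation (taking the maximum would be another); I am not claiming this is
how it should be done.

```python
from collections import defaultdict


def rank_entity_types(hits):
    """Sum match scores per entity type and return (type, total) best first.

    hits is an iterable of (entity_type, match_score) pairs, e.g. the
    per-phrase matches returned by a search for one query.
    """
    totals = defaultdict(float)
    for entity_type, match_score in hits:
        totals[entity_type] += match_score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-phrase hits for the query "lord of the rings".
hits = [("movie", 0.90), ("movie", 0.16), ("address", 0.05)]
ranking = rank_entity_types(hits)
best_type = ranking[0][0]
```

With these made-up hits, `best_type` is the entity type with the largest
summed score, which is the single answer I want back for the query.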
