On Aug 23, 2006, at 8:30 AM, sachin wrote:
Hello great/smart guys,
This is my first question for this group, as I started
working with Lucene last month.
Lucene scores documents based on TF-IDF
vector analysis. Lucene also provides the Scorer and Weight classes inside
the search package. By implementing a new tuple of
(Query, Weight, Scorer) I can easily implement a new scoring technique.
Unfortunately, the Lucene index shows that it stores only TF / position
vectors for each term within a document.
I am interested in investigating a new scoring technique
where I will use some other parameters relating to the term to rank
the documents. For example, web page ranking is assisted by
parameters like the number of links pointing to a web page and the number of
links leaving that web page. This indicates that we need to store relatively
more information about terms within the index. But how? … I need
to investigate.
People are working on this. Search the java-dev archives for
"Flexible Indexing" or "Payloads". See
http://issues.apache.org/jira/browse/LUCENE-662 for a possible patch.
Note that the patch is not committed yet (you can be one of the first
to test it!).
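To make the idea concrete, here is a minimal sketch in plain Java of how a per-document link count could be folded into a TF-IDF style score. It is not the LUCENE-662 patch and does not use Lucene's API: the class name LinkAwareScoring, the linkBoost formula, and the way the factors are multiplied are all assumptions made for illustration. The tf and idf formulas are simplified versions of the usual square-root and logarithmic damping.

```java
/** Hypothetical illustration only; none of these names exist in Lucene. */
final class LinkAwareScoring {

    /** Term-frequency factor: square root of the raw count, to damp very frequent terms. */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** Inverse document frequency: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** Assumed link boost: 1.0 with no inbound links, growing logarithmically after that. */
    static double linkBoost(int inboundLinks) {
        return 1.0 + Math.log1p(inboundLinks);
    }

    /** Single-term score: the TF-IDF part multiplied by the extra per-document factor. */
    static double score(int freq, int docFreq, int numDocs, int inboundLinks) {
        return tf(freq) * idf(docFreq, numDocs) * linkBoost(inboundLinks);
    }

    public static void main(String[] args) {
        // Same term statistics, different numbers of inbound links.
        System.out.println(score(4, 9, 10, 0));
        System.out.println(score(4, 9, 10, 25));
    }
}
```

The extra factor has to come from somewhere at search time, which is the point of storing more per-term or per-document data in the index.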
Another parameter is relevance feedback from the user.
Ranking should be affected by that feedback.
Take a look at term vectors. Search the list. Read about them at
http://www.cnlp.org/apachecon2005 or in "Lucene in Action". There is
also a contribution called "More Like This" that you may find useful.
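As a rough illustration of how term vectors could feed relevance feedback, the sketch below applies a Rocchio-style update (positive feedback only) to a weighted query, using the term-frequency vectors of documents the user marked as relevant. It is plain Java and does not use Lucene's term vector API or the "More Like This" code; the class name TermVectorFeedback and the alpha and beta parameters are assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of Lucene. */
final class TermVectorFeedback {

    /**
     * Rocchio-style update with positive feedback only:
     * newQuery = alpha * query + beta * centroid(relevantDocs).
     * Each relevant document is a term -> frequency map (its term vector).
     */
    static Map<String, Double> rocchio(Map<String, Double> query,
                                       List<Map<String, Integer>> relevantDocs,
                                       double alpha, double beta) {
        Map<String, Double> updated = new HashMap<>();
        // Scale the original query weights by alpha.
        for (Map.Entry<String, Double> e : query.entrySet()) {
            updated.put(e.getKey(), alpha * e.getValue());
        }
        if (relevantDocs.isEmpty()) {
            return updated; // no feedback yet: only the scaled query remains
        }
        // Add beta times the average term frequency over the relevant documents.
        double share = beta / relevantDocs.size();
        for (Map<String, Integer> doc : relevantDocs) {
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                updated.merge(e.getKey(), share * e.getValue(), Double::sum);
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Double> query = new HashMap<>();
        query.put("lucene", 1.0);
        Map<String, Integer> doc = new HashMap<>();
        doc.put("lucene", 2);
        doc.put("scoring", 4);
        System.out.println(rocchio(query, List.of(doc), 1.0, 0.5));
    }
}
```

The updated weights could then be used as per-term boosts when the query is run again, so terms that are frequent in the documents the user liked count for more.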
Would anyone be interested in helping out, or is anyone thinking about
the same problem?
--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Skype: grant_ingersoll
Fax: 315-443-6886