Hi Ian,
The other question I had was about the quality of the results (especially in
the top ranks). But then I utilized the "explain" functionality of Lucene
and observed how the tf/idf parameters are functioning.
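For anyone following along, here is a minimal sketch of the two factors that dominate the "explain" output. It assumes the classic formulas used by Lucene's default TF-IDF similarity as I understand them (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))). The class and method names are hypothetical and this is plain Java, not Lucene's API; norms, boosts and the other factors are left out.

```java
// Hypothetical stand-alone sketch of the classic tf and idf factors.
// Not Lucene code: the formulas are assumed, the names are made up.
class TfIdfSketch {

    // Term-frequency factor: grows with the square root of the raw count.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse-document-frequency factor: rarer terms get a larger value.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Simplified per-term weight, ignoring norms, boosts and coord.
    static double weight(int freq, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        // Same in-document frequency, different rarity across the corpus.
        System.out.println("rare term:   " + weight(4, 9, 1000));
        System.out.println("common term: " + weight(4, 499, 1000));
    }
}
```

With the same count inside a document, the term that appears in fewer documents gets the larger weight, which is the usual reason a rare word can dominate the top ranks.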
I would be interested in seeing any work which modified the "similarity"
function in
Thanks, Ian, for your response. This is a one-time offline program, so I am not
bothered about the performance (i.e. speed etc.).
One more question: there are some situations where I need to run an AND
clause (i.e. more than one phrase, such as "Apple" AND "Steve Jobs"). My
approach was something like:
Hi Group,
I am indexing and searching a large corpus of news articles. The indexing
process is very straightforward: I am using the StandardAnalyzer to
analyze the content of each news document.
document = new Document();
document.add(new Field("snum", snum, Field.