Thank you for your reply.
The thing is, I'm trying to implement a weight for a word when indexing HTML
web pages.
The formula is like:
*50% + Weight(word in doc d) = *20% + * 10% +
...
the code is :
=
doc.add(new Field("url", httpd.
I strongly recommend against this. Simple word counts are a poor
measure of relevance, which is why Lucene doesn't score that
way. Do you have an example showing why the default scoring is
inadequate, or is this just an assumption?
It would be helpful if you gave us some idea of what you're trying
That is already in the similarity formula, in the tf term: documents that
have more occurrences of a given term receive a higher score.
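
To make the tf point concrete, here is a minimal sketch in plain Java. It does not use Lucene itself; it only mirrors the square-root formula that, as far as I know, DefaultSimilarity.tf() applies to the raw term frequency. The class name TfSketch is made up for illustration.

```java
public class TfSketch {
    // Assumption: mirrors DefaultSimilarity.tf(float freq), i.e. sqrt(freq).
    // This is a standalone illustration, not the Lucene class itself.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // More occurrences give a higher tf factor, but the growth is damped.
        for (int freq : new int[] {1, 4, 9}) {
            System.out.println("freq=" + freq + " tf=" + tf(freq));
        }
    }
}
```

So a term that occurs nine times contributes three times the tf factor of a term that occurs once, not nine times, and tf is only one of several factors in the final score.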
Jamal H Tandina wrote:
If you want to give priority to documents that are larger, like z1, you
should change the DefaultSimilarity (at index time), more exactly the
method:
public float lengthNorm(String fieldName, int numTerms) {
    return (float)(1.0 / Math.sqrt(numTerms));
}
to something like this
p
For your specific problem you need to change the DefaultSimilarity only
at index time, because the lengthNorm is written to the index when it is
created.
So... first you'll need to extend the DefaultSimilarity and override the
lengthNorm() method with the one suggested in the previous reply; then
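
A rough sketch of what the length normalization does, in plain Java without the Lucene jar. defaultLengthNorm() copies the formula quoted earlier in the thread; flatLengthNorm() is a hypothetical variant of my own (not the replacement suggested above, which is cut off) that simply removes the penalty on longer fields. The class and method names are made up for illustration.

```java
public class LengthNormSketch {
    // Same formula as the default lengthNorm quoted in the thread.
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical variant (an assumption, not the elided suggestion):
    // every field gets the same norm, so longer fields are not penalized.
    static float flatLengthNorm(int numTerms) {
        return 1.0f;
    }

    public static void main(String[] args) {
        // Compare a one-term field, a two-term field and a long field.
        for (int numTerms : new int[] {1, 2, 100}) {
            System.out.println("numTerms=" + numTerms
                    + " default=" + defaultLengthNorm(numTerms)
                    + " flat=" + flatLengthNorm(numTerms));
        }
    }
}
```

In a real index the override would live in a subclass of DefaultSimilarity. As far as I know, it is handed to the writer with IndexWriter.setSimilarity() before documents are added, and Searcher.setSimilarity() is the search-time counterpart. Because the norm is stored in the index, existing documents have to be re-indexed for a new lengthNorm() to take effect.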
Thank you for your reply
How can I change the DefaultSimilarity for both indexing and searching? Do
you have an example or a URL showing how to set the Similarity?
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html
Thanks again
Ion Bad
Try to look at Similarity; there you will find things about the
scoring. Your query is more "similar" to the shorter document.
If you have 2 documents with a field body; first with words "red flower"
and the second with just one word "flower", and search for the word
"flower", the second docu
There are many factors that go into scoring. Erick gave a nice link that
will help you out.
Also, check out Searcher.explain(). That will tell you how your score was
resolved.
To give you a start, normally shorter fields are preferred...finding a
keyword in a short title is usually more relevan
What leads you to expect that ordering? Scoring in Lucene is
NOT simply counting the number of times a word appears.
That said, I really have no clue how the scoring algorithm
works since it's always been "good enough for me". But
if you search the mail archive for scoring, you'll find a
wealth of