Erick and Ahmet - thank you
Shay
On Mon, Jun 15, 2015 at 6:19 PM Ahmet Arslan
wrote:
> Hi,
>
> If you are interested in summed up tf values of multiple terms,
> I suggest to extend SimilarityBase class to return raw freq as score.
>
> float score(BasicStats stats, float freq, float docLen){
> r
Hi,
If you are interested in the summed tf values of multiple terms,
I suggest extending the SimilarityBase class to return the raw freq as the score.
float score(BasicStats stats, float freq, float docLen){
return freq;
}
When you use this similarity and search with a three-term query, the scores will be the summed
tf val
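
To make the "summed tf" idea concrete, here is a minimal plain-Java sketch that does not use Lucene at all. It assumes whitespace tokenization and lowercasing, and the class and method names (SummedTfSketch, termFreq, summedTf) are hypothetical, chosen only for illustration. It shows what a score would look like if each query term contributed its raw frequency and the per-term contributions were added up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: plain Java, not the Lucene API.
final class SummedTfSketch {

    // Raw frequency of a single term in a document.
    // Assumption: tokens are lowercased and split on whitespace.
    static int termFreq(String doc, String term) {
        int count = 0;
        for (String token : doc.trim().toLowerCase().split("\\s+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Score of a document for a multi-term query: the sum of the
    // raw term frequencies of each query term.
    static int summedTf(String doc, List<String> queryTerms) {
        int score = 0;
        for (String term : queryTerms) {
            score += termFreq(doc, term);
        }
        return score;
    }

    public static void main(String[] args) {
        String doc = "free speech zones limit free speech";
        List<String> query = Arrays.asList("free", "speech", "zones");
        System.out.println(summedTf(doc, query));
    }
}
```

In this toy document, "free" and "speech" each occur twice and "zones" once, so the three per-term frequencies add up to the document's score.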
In a word, no. Terms are, by definition, whatever a "token" is.
Tokens are delimited by, say, the WhitespaceTokenizer,
so a priori you can't do what you want.
Unless... you do "something special". In this case, "something special"
would be to use shingles (see ShingleFilter in Lucene or
ShingleFilterFacto
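
As a rough illustration of the shingle idea, the sketch below builds word n-grams in plain Java and counts how often a multi-word expression occurs. It is not Lucene's ShingleFilter; it only assumes lowercased, whitespace-split tokens, and the names (ShingleSketch, shingles, termFreq, docFreq) are hypothetical. Once each n-word window is treated as a single "term", term frequency and document frequency can be counted the usual way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: plain Java, not Lucene's ShingleFilter.
final class ShingleSketch {

    // Build all shingles (word n-grams) of exactly n tokens.
    // Assumption: tokens are lowercased and split on whitespace.
    static List<String> shingles(String text, int n) {
        String[] tokens = text.trim().toLowerCase().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
        }
        return out;
    }

    // Number of times the phrase occurs in one document,
    // counted over shingles of the same length as the phrase.
    static int termFreq(String doc, String phrase) {
        String normalized = String.join(" ", phrase.trim().toLowerCase().split("\\s+"));
        int n = normalized.split(" ").length;
        int count = 0;
        for (String shingle : shingles(doc, n)) {
            if (shingle.equals(normalized)) {
                count++;
            }
        }
        return count;
    }

    // Number of documents that contain the phrase at least once.
    static int docFreq(List<String> docs, String phrase) {
        int df = 0;
        for (String doc : docs) {
            if (termFreq(doc, phrase) > 0) {
                df++;
            }
        }
        return df;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "The United States and the united states",
                "free speech zones",
                "united nations");
        System.out.println(termFreq(docs.get(0), "united states"));
        System.out.println(docFreq(docs, "united states"));
    }
}
```

The trade-off is index size: every extra shingle length adds another set of terms, so in practice one would limit the shingle sizes to the phrase lengths actually needed.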
Hi Ahmet
Thank you for the reply.
Can the term reflect a multi word expression?
For example:
I want to find the term frequency / document frequency of "united states"
(two terms) or "free speech zones" (three terms).
Shay
On Mon, Jun 15, 2015 at 4:55 PM Ahmet Arslan
wrote:
> Hi Hummel,
>
> reg
Hi Hummel,
regarding df,
Term term = new Term(field, word);
TermStatistics termStatistics = searcher.termStatistics(term,
TermContext.build(reader.getContext(), term));
System.out.println(query + "\t totalTermFreq \t " +
termStatistics.totalTermFreq());
System.out.println(query + "\t docFreq \t