Re: Tf and Df in lucene

2015-06-15 Thread Shay Hummel
Erick and Ahmet - thank you
Shay

On Mon, Jun 15, 2015 at 6:19 PM Ahmet Arslan wrote:
> Hi,
>
> If you are interested in summed up tf values of multiple terms,
> I suggest to extend SimilarityBase class to return raw freq as score.
>
> float score(BasicStats stats, float freq, float docLen){
> r

Re: Tf and Df in lucene

2015-06-15 Thread Ahmet Arslan
Hi,

If you are interested in the summed tf values of multiple terms, I suggest extending the SimilarityBase class to return the raw freq as the score.

    float score(BasicStats stats, float freq, float docLen){
        return freq;
    }

When you use this similarity and search with a three-term query, the scores will be the summed tf val
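To make the arithmetic in that suggestion concrete, here is a minimal plain-Java sketch. It does not use Lucene at all: the class SummedTfSketch and its methods are hypothetical names made up for illustration, tokens are simply split on whitespace, and it assumes the per-term scores are added together as the message describes.

```java
import java.util.Locale;

// Illustration only: plain Java, not Lucene's SimilarityBase or its query scoring.
class SummedTfSketch {
    // Raw term frequency: how many whitespace-delimited tokens equal the term.
    static int termFreq(String doc, String term) {
        int count = 0;
        String wanted = term.toLowerCase(Locale.ROOT);
        for (String token : doc.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (token.equals(wanted)) {
                count++;
            }
        }
        return count;
    }

    // Each term contributes its raw frequency; the query score is assumed to be the sum.
    static int score(String doc, String... queryTerms) {
        int sum = 0;
        for (String term : queryTerms) {
            sum += termFreq(doc, term);
        }
        return sum;
    }

    public static void main(String[] args) {
        String doc = "free speech zones limit free speech";
        System.out.println(score(doc, "free", "speech", "zones"));
    }
}
```

Note that this adds up the frequencies of the individual words; it says nothing about whether the words appear next to each other as a phrase.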

Re: Tf and Df in lucene

2015-06-15 Thread Erick Erickson
In a word, no. Terms are, by definition, whatever a "token" is. Tokens are delimited by, say, the WhitespaceTokenizer, so a priori you can't do what you want.

Unless... you do "something special". In this case, "something special" would be to put shingles (see ShingleFilter in Lucene or ShingleFilterFacto
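The idea behind shingles can be sketched in plain Java: adjacent tokens are joined into word n-grams, and each n-gram is then counted like any other term. This is only a conceptual illustration under stated assumptions, not Lucene's ShingleFilter; the class ShingleCounts, its fields, and its methods are hypothetical names, and tokenization is a simple lowercase whitespace split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustration only: plain-Java word n-grams ("shingles"), not Lucene's ShingleFilter.
class ShingleCounts {
    // Occurrences of each shingle summed over all added documents.
    final Map<String, Integer> totalTermFreq = new HashMap<>();
    // Number of added documents that contain each shingle at least once.
    final Map<String, Integer> docFreq = new HashMap<>();

    // Split on whitespace and join every run of n adjacent tokens with one space.
    static List<String> shingles(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> out = new ArrayList<>();
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i + n <= tokens.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
        }
        return out;
    }

    // Count every shingle occurrence, but bump docFreq only once per document.
    void addDocument(String text, int n) {
        Set<String> seenInThisDoc = new HashSet<>();
        for (String shingle : shingles(text, n)) {
            totalTermFreq.merge(shingle, 1, Integer::sum);
            if (seenInThisDoc.add(shingle)) {
                docFreq.merge(shingle, 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) {
        ShingleCounts counts = new ShingleCounts();
        counts.addDocument("the United States and the united states senate", 2);
        counts.addDocument("free speech zones in the united states", 2);
        counts.addDocument("free speech is protected", 2);
        System.out.println("totalTermFreq: " + counts.totalTermFreq.get("united states"));
        System.out.println("docFreq: " + counts.docFreq.get("united states"));
    }
}
```

Once a two-word expression such as "united states" is a single term, its frequency and document frequency can be looked up the same way as for a one-word term. A three-word expression would need shingles of size 3.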

Re: Tf and Df in lucene

2015-06-15 Thread Shay Hummel
Hi Ahmet,

Thank you for the reply. Can the term reflect a multi-word expression? For example: I want to find the term frequency / document frequency of "united states" (two terms) or "free speech zones" (three terms).

Shay

On Mon, Jun 15, 2015 at 4:55 PM Ahmet Arslan wrote:
> Hi Hummel,
>
> reg

Re: Tf and Df in lucene

2015-06-15 Thread Ahmet Arslan
Hi Hummel,

regarding df,

    Term term = new Term(field, word);
    TermStatistics termStatistics = searcher.termStatistics(term, TermContext.build(reader.getContext(), term));
    System.out.println(query + "\t totalTermFreq \t " + termStatistics.totalTermFreq());
    System.out.println(query + "\t docFreq \t