Re: Scoring function in LMDirichletSimilarity Class

2013-04-04 Thread Peter Organisciak
I think this is the problem that you're running into, though maybe someone with more expertise can confirm... ZP, if you look at section 5.1 of the Zhai and Lafferty paper (http://www.cs.cmu.edu/~lafferty/pub/smooth-tois.ps), they note that the "term weight is log(1+(1-\lambda)p_ml(q_i|d) / \lambda

Total Freq for Bigrams, Trigrams, etc.

2014-12-02 Thread Peter Organisciak
Is it possible to get a total corpus frequency for bigram queries or higher, i.e., how many times the query occurs in the corpus? I'm looking to implement a count of occurrences per million terms. I know that for a single term I can use `TermsEnum.totalTermFreq()`; is there any comparable way to do