Hi Shay,

I suggest you extend o.a.l.search.similarities.SimilarityBase. All you need to implement is a score() method (plus toString(), which SimilarityBase also declares abstract). Behind all the fancy names (language models, etc.), a similarity is a function of seven salient statistics. Actually it is six: avgFieldLength can be derived from two of the others (numberOfFieldTokens divided by numberOfDocuments).
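To make the "function of a few statistics" idea concrete, here is a minimal stand-alone sketch in plain Java. It does not use any Lucene classes: the class name SevenStatsSketch and its methods are hypothetical, and the field names simply reuse the statistic names from this message. The score() body follows the Dirichlet-style expression quoted further down in this message, under my assumption that tf there means freq, termFrequency means totalTermFreq, numberOfTokens means numberOfFieldTokens, docLength means docLen, and c is the smoothing parameter.

```java
// Hypothetical stand-alone sketch; it does not depend on Lucene.
final class SevenStatsSketch {

    // Corpus statistics.
    final long numberOfDocuments;
    final long numberOfFieldTokens;

    // Term statistics.
    final long totalTermFreq;
    final long docFreq; // kept for completeness; unused by the formula below

    SevenStatsSketch(long numberOfDocuments, long numberOfFieldTokens,
                     long totalTermFreq, long docFreq) {
        this.numberOfDocuments = numberOfDocuments;
        this.numberOfFieldTokens = numberOfFieldTokens;
        this.totalTermFreq = totalTermFreq;
        this.docFreq = docFreq;
    }

    // The "seventh" statistic is derived from two others, not stored.
    double avgFieldLength() {
        return (double) numberOfFieldTokens / numberOfDocuments;
    }

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Dirichlet-style score for one term in one document.
    // freq and docLen describe the document being scored;
    // c is the smoothing parameter (name assumed from the quoted formula).
    double score(double freq, double docLen, double c) {
        double collectionProb = (double) totalTermFreq / numberOfFieldTokens;
        return log2(1 + freq / (c * collectionProb)) + log2(c / (docLen + c));
    }

    public static void main(String[] args) {
        SevenStatsSketch stats = new SevenStatsSketch(1000, 100000, 50, 20);
        System.out.println("avgFieldLength = " + stats.avgFieldLength());
        System.out.println("score = " + stats.score(3, 2000, 2000));
    }
}
```

In a real SimilarityBase subclass the corpus and term statistics would arrive through the stats object passed to score() rather than through a constructor like this one, so treat the sketch only as an illustration of the arithmetic.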
The seven statistics come from:

Corpus statistics: numberOfDocuments, numberOfFieldTokens, avgFieldLength
Term statistics: totalTermFreq and docFreq
About the document being scored: within-document term frequency (freq) and document length (docLen)

If you can express your ranking method in terms of these seven variables, you are ready to go. For example, my Dirichlet LM implementation is nothing but:

return log2(1 + (tf / (c * (termFrequency / numberOfTokens)))) + log2(c / (docLength + c));

If you need additional statistics, the number of unique terms in a document for example, you need to calculate them yourself and embed them in the index (possibly using DocValues). During scoring, you can retrieve them.

Personally, I am curious about your similarity. If possible, please let the community know about its effectiveness.

Please also see Robert's write-up: http://lucidworks.com/blog/2011/09/12/flexible-ranking-in-lucene-4/

Thanks,
Ahmet

On Sunday, December 13, 2015 6:28 PM, will martin <wmartin...@gmail.com> wrote:

Sorry it was early. If you go looking on the web, you can find, as I did, reputable work on implementing Dirichlet language models. However, at this hour you might get answers here. Extrapolating others' work into a Lucene implementation is only slightly different from getting answers here. imo

g'luck

> On Dec 13, 2015, at 10:55 AM, Shay Hummel <shay.hum...@gmail.com> wrote:
>
> Hi
>
> I am sorry but I didn't understand your answer. Can you please elaborate?
>
> Shay
>
> On Sun, Dec 13, 2015 at 3:41 PM will martin <wmartin...@gmail.com> wrote:
>
>> expand your due diligence beyond wikipedia:
>> i.e.
>>
>> http://ciir.cs.umass.edu/pubfiles/ir-464.pdf
>>
>>
>>> On Dec 13, 2015, at 8:30 AM, Shay Hummel <shay.hum...@gmail.com> wrote:
>>>
>>> LMDiricletbut its feasibilit
>>
> --
> Regards,
> Shay Hummel

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org