Query regarding Lucene

2016-03-09 Thread Dwaipayan Roy
the paper, I am setting these weights with those normalized probability values. Can any of you please help me with this problem? Thanks, Dwaipayan Roy. Research Scholar, Indian Statistical Institute, Kolkata, India

Problem with porter stemming

2016-03-14 Thread Dwaipayan Roy
I am using EnglishAnalyzer with my own stopword list. EnglishAnalyzer uses the Porter stemmer (Snowball) to stem words. But with EnglishAnalyzer I am getting an erroneous result for 'news': it is stemmed to 'new'. Any help would be appreciated.

Re: Problem with porter stemming

2016-07-19 Thread Dwaipayan Roy
Hello. I want to set LMJelinekMercerSimilarity (with lambda set to, say, 0.6) for the Luke similarity calculation. Luke uses DefaultSimilarity by default. Can anyone help with this? I use Lucene 4.10.4 and the Luke release for that version of the Lucene index. Dwaipayan

Setting LMJelinekMercer Similarity in Luke

2016-07-20 Thread Dwaipayan Roy
Hello. I want to set LMJelinekMercerSimilarity (with lambda set to, say, 0.6) for the Luke similarity calculation. Luke uses DefaultSimilarity by default. Can anyone help with this? I use Lucene 4.10.4 and the Luke release for that version of the Lucene index. Dwaipayan

Doc length normalization in Lucene LM

2016-07-21 Thread Dwaipayan Roy
Hello, In *SimilarityBase.java*, I can see that the length of the document is normalized using the function *decodeNormValue()*. But I can't understand how the normalization is done. Can you please help? Also, is there any way to avoid this doc-length normalization, to use the raw
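As a rough illustration of the question above, here is a pure-Java sketch of the one-byte norm round-trip. It is modeled on my reading of SmallFloat and SimilarityBase in the Lucene 4.x sources, which is an assumption to check against your own version; the class name NormSketch is hypothetical and no Lucene classes are used.

```java
// Sketch of the lossy one-byte document-length norm, modeled on Lucene 4.x
// (assumption: verify against the SmallFloat/SimilarityBase source you run).
final class NormSketch {
    // Modeled on SmallFloat.floatToByte315: 3 mantissa bits, zero exponent 15.
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative or underflow
        }
        if (small >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: largest representable value
        }
        return (byte) (small - ((63 - 15) << 3));
    }

    // Inverse mapping, modeled on SmallFloat.byte315ToFloat.
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // At index time: store boost / sqrt(length) in a single byte.
    static byte encodeNormValue(float boost, float length) {
        return floatToByte315(boost / (float) Math.sqrt(length));
    }

    // At search time: recover an approximate length as 1 / f^2.
    static float decodeNormValue(byte norm) {
        float f = byte315ToFloat(norm);
        return 1.0f / (f * f);
    }

    public static void main(String[] args) {
        for (int len : new int[] {1, 10, 100, 1000}) {
            byte norm = encodeNormValue(1.0f, len);
            System.out.println(len + " -> " + decodeNormValue(norm));
        }
    }
}
```

Under these assumptions the decoded value is only an approximation of the true length (a length of 100 comes back as roughly 114), because just three mantissa bits survive the encoding. Getting the raw length would require storing it separately or overriding the norm encoding, which this sketch does not show.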

Re: Doc length normalization in Lucene LM

2016-07-22 Thread Dwaipayan Roy
>       return ModelBase.this.score(stats, freq,
>           norms == null ? 1L : norms.get(doc));
>     }
>
>     @Override
>     public Explanation explain(int doc, Explanation freq) {
>       return ModelBase.this.explain(stats, doc, freq,
>           norms == null ? 1L : norms.get(doc));
>     }
>
>

Explain Scoring function in LMJelinekMercerSimilarity Class

2016-12-20 Thread Dwaipayan Roy
getCollectionProbability() returns col_freq(t) / col_size. Am I right? Also, the boosting part (stats.getTotalBoost()) is not clear to me. I want to reproduce the scores produced by LM-JM, hence I want the details. Thanks. Dwaipayan Roy.
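For anyone trying to reproduce the numbers by hand, the following is a minimal pure-Java sketch of the formula as I recall it from the Lucene 4.x to 6.x sources of LMJelinekMercerSimilarity and the default collection model in LMSimilarity. Treat both formulas as assumptions to verify against your version; JmSketch and its parameter names are hypothetical, and Lucene itself computes in float rather than double.

```java
// Hand computation of an LM Jelinek-Mercer term score (sketch, not Lucene code).
final class JmSketch {
    // Assumed default collection model: add-one smoothed, i.e.
    // (totalTermFreq + 1) / (numberOfFieldTokens + 1), not plain cf / |C|.
    static double collectionProbability(long totalTermFreq, long numberOfFieldTokens) {
        return (totalTermFreq + 1.0) / (numberOfFieldTokens + 1.0);
    }

    // Assumed score: boost * log(1 + ((1 - lambda) * tf / docLen) / (lambda * P(t|C))).
    static double score(double boost, double freq, double docLen,
                        double collectionProb, double lambda) {
        return boost * Math.log(
            1.0 + ((1.0 - lambda) * freq / docLen) / (lambda * collectionProb));
    }

    public static void main(String[] args) {
        double p = collectionProbability(9, 99);
        System.out.println(score(1.0, 2.0, 100.0, p, 0.6));
    }
}
```

Two things may explain mismatches with a hand calculation: the document length that reaches the scorer is the value decoded from the one-byte norm rather than the exact token count, and the boost factor multiplies the whole logarithm (it is 1 when no query-time boost is set).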

Re: Explain Scoring function in LMJelinekMercerSimilarity Class

2016-12-20 Thread Dwaipayan Roy
Waiting for an explanation for my query. Thank you very much. On Tue, Dec 20, 2016 at 10:51 PM, Dwaipayan Roy wrote: > Hello, > > Can anyone help me understand the scoring function in the > LMJelinekMercerSimilarity class? > > The scoring function in LMJelinekMercerSimilar

To get the term-freq

2017-11-16 Thread Dwaipayan Roy
Hi, I want to get the term frequency of a given term t in a given document with Lucene docid, say, d. Formally, I need a function, say f(), that takes two arguments: 1. Lucene docid d, 2. term t, and returns the number of times t occurs in d. I know of one solution, that is, traversing the whole docu

Custom Similarity

2018-01-16 Thread Dwaipayan Roy
I want to write a scoring function that scores documents as follows: given Q = {q1, q2, ... }, score(D, Q) = SUM over all qi in Q of LOG( weight_1(qi) + weight_2(qi) + weight_3(qi) ). I have stored weight_1, weight_2 and weight_3 for all terms of all docu
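The arithmetic of the formula in the message above can be sketched in plain Java as below. This only shows the sum of logs; how the three stored weights are looked up per term and document, and how the result is plugged into a Lucene Similarity, is not shown. WeightedLogScore and the array layout are hypothetical.

```java
// Sum over query terms of log(weight_1 + weight_2 + weight_3) (sketch only).
final class WeightedLogScore {
    // weights[i] holds {weight_1(qi), weight_2(qi), weight_3(qi)} for query term qi
    // with respect to one document D.
    static double score(double[][] weights) {
        double sum = 0.0;
        for (double[] w : weights) {
            sum += Math.log(w[0] + w[1] + w[2]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] weights = {
            {0.2, 0.1, 0.05}, // hypothetical weights for q1
            {0.3, 0.0, 0.10}  // hypothetical weights for q2
        };
        System.out.println(score(weights));
    }
}
```

Because the logarithm is taken per term before summing, a term whose three weights sum to zero drives the score to negative infinity, so some smoothing of the weights is probably needed in practice.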

Re: Custom Similarity

2018-01-27 Thread Dwaipayan Roy
Thanks for your replies. But I am still not sure how to do this. Can you please provide me with an example code snippet, or a link to some page where I can find one? Thanks. On Tue, Jan 16, 2018 at 3:28 PM, Dwaipayan Roy wrote: > I want to make a scoring function that w

getting Lucene Docid from inside score()

2018-03-09 Thread Dwaipayan Roy
While searching, I want to get the Lucene-assigned docid (ranging from 0 to the number of documents - 1) of a document containing a particular query term. From inside score(), printing 'doc' or calling docId() returns a docid which, I think, is the internal docid of a segment in which the

Re: getting Lucene Docid from inside score()

2018-03-09 Thread Dwaipayan Roy
example. > > > But if you do, it sounds like maybe what you are seeing is the per segment > > docid. To get a global one you have to add the segment offset, held by a > > leaf reader. > > > On Mar 9, 2018 5:06 AM, "Dwaipayan Roy" wrote: > > > > While se

Re: getting Lucene Docid from inside score()

2018-03-09 Thread dwaipayan . roy
mple. > > But if you do, it sounds like maybe what you are seeing is the per segment > docid. To get a global one you have to add the segment offset, held by a > leaf reader. > > On Mar 9, 2018 5:06 AM, "Dwaipayan Roy" wrote: > > > While searching, I want to ge

Re: getting Lucene Docid from inside score()

2018-03-10 Thread dwaipayan . roy
xt. They can change over time for example. > >> > >> But if you do, it sounds like maybe what you are seeing is the per segment > >> docid. To get a global one you have to add the segment offset, held by a > >> leaf reader. > >> > >> On Mar 9, 2018
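The "add the segment offset" advice quoted in this thread can be illustrated with a small pure-Java sketch. In Lucene the offset is exposed as the docBase field of the leaf reader context (AtomicReaderContext in 4.x, LeafReaderContext from 5.x); the sketch below does not use Lucene and only mimics the arithmetic, with DocBaseSketch and the segment sizes being hypothetical.

```java
// Converting a per-segment docid into an index-wide docid (arithmetic only).
final class DocBaseSketch {
    // The docBase of a segment is the total number of documents in all
    // segments that come before it.
    static int[] docBases(int[] segmentMaxDocs) {
        int[] bases = new int[segmentMaxDocs.length];
        int next = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = next;
            next += segmentMaxDocs[i];
        }
        return bases;
    }

    // Global docid = segment offset + docid within that segment.
    static int globalDocId(int docBase, int segmentDocId) {
        return docBase + segmentDocId;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {5, 3, 7}); // three hypothetical segments
        System.out.println(globalDocId(bases[2], 2));
    }
}
```

As the quoted reply notes, these ids can change over time (for example after merges), so a global docid obtained this way is only meaningful for the reader it came from.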

How exactly the normalized length of the documents are stored in the index

2021-07-13 Thread Dwaipayan Roy
During indexing, an inverted index is built from the terms of the documents, and the term frequency, document frequency, etc. are stored. If I understand correctly, the exact document length is not stored in the index, to reduce its size. Instead, a normalized length is stored for each document. However, for

Re: Current command line tools for Lucene?

2024-09-24 Thread Dwaipayan Roy
sire to point/click my way through them. > > Neal > > -- > Wire:nrauhauser > sms:202-642-1717 > mailto:nrauhau...@gmail.com// > -- Dwaipayan Roy.