from:"Dwaipayan Roy"

Query regarding Lucene

2016-03-09 Thread Dwaipayan Roy

the paper, I am setting these weights with those normalized probability values. Can any of you please help me with this problem? Thanks, Dwaipayan Roy. Research Scholar, Indian Statistical Institute, Kolkata, India

Problem with porter stemming

2016-03-14 Thread Dwaipayan Roy

I am using EnglishAnalyzer with my own stopword list. EnglishAnalyzer uses the Porter stemmer (snowball) to stem the words. But with EnglishAnalyzer, I am getting an erroneous result for 'news': it is stemmed to 'new'. Any help would be appreciated.
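The 'news' to 'new' result is consistent with how the classic Porter algorithm handles plural suffixes: its first step (step 1a) strips a trailing "s" by rule, without consulting a dictionary. Below is a minimal standalone sketch of that single step only. It is not Lucene's stemmer, and the class and method names are hypothetical.

```java
// Hypothetical illustration of Porter step 1a (plural suffix removal) only.
// The full Porter algorithm has several further steps that are not shown here.
final class PorterStep1aSketch {

    // Applies the four step 1a rules in order; the first matching rule wins.
    static String apply(String word) {
        if (word.endsWith("sses")) {
            // SSES -> SS (caresses -> caress)
            return word.substring(0, word.length() - 2);
        }
        if (word.endsWith("ies")) {
            // IES -> I (ponies -> poni)
            return word.substring(0, word.length() - 2);
        }
        if (word.endsWith("ss")) {
            // SS -> SS (caress stays caress)
            return word;
        }
        if (word.endsWith("s")) {
            // S -> removed (cats -> cat, and also news -> new)
            return word.substring(0, word.length() - 1);
        }
        return word;
    }
}
```

Because the rule is purely suffix-based, "news" is treated like any other plural. One possible workaround, assuming it fits the use case, is to pass a stem exclusion set containing "news" to the EnglishAnalyzer constructor that accepts one alongside the stopword set.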

Re: Problem with porter stemming

2016-07-19 Thread Dwaipayan Roy

Hello. I want to set LMJelinekMercer Similarity (with lambda set to, say, 0.6) for the Luke similarity calculation. Luke uses DefaultSimilarity by default. Can anyone help with this? I use Lucene 4.10.4 and the Luke build for that version of the Lucene index. Dwaipayan

Setting LMJelinekMercer Similarity in Luke

2016-07-20 Thread Dwaipayan Roy

Hello. I want to set LMJelinekMercer Similarity (with lambda set to, say, 0.6) for the Luke similarity calculation. Luke uses DefaultSimilarity by default. Can anyone help with this? I use Lucene 4.10.4 and the Luke build for that version of the Lucene index. Dwaipayan

Doc length normalization in Lucene LM

2016-07-21 Thread Dwaipayan Roy

Hello, In *SimilarityBase.java*, I can see that the length of the document is normalized using the function *decodeNormValue()*. But I can't understand how the normalization is done. Can you please help? Also, is there any way to avoid this doc-length normalization, to use the raw

Re: Doc length normalization in Lucene LM

2016-07-22 Thread Dwaipayan Roy

> return ModelBase.this.score(stats, freq,
>     norms == null ? 1L : norms.get(doc));
> }
>
> @Override
> public Explanation explain(int doc, Explanation freq) {
>   return ModelBase.this.explain(stats, doc, freq,
>       norms == null ? 1L : norms.get(doc));
> }

Explain Scoring function in LMJelinekMercerSimilarity Class

2016-12-20 Thread Dwaipayan Roy

getCollectionProbability() returns col_freq(t) / col_size. Am I right? Also, the boosting part (stats.getTotalBoost()) is not clear to me. I want to reproduce the scores produced by LM-JM, hence I need the details. Thanks. Dwaipayan Roy
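For reference, the Jelinek-Mercer score discussed in this thread can be reproduced outside Lucene with a few lines of arithmetic. The sketch below assumes the per-term formula is boost * log(1 + ((1 - lambda) * freq / docLen) / (lambda * collectionProbability)), and that the collection probability uses add-one smoothing, (totalTermFreq + 1) / (numberOfFieldTokens + 1), which is close to, but not exactly, col_freq(t) / col_size. Both are my understanding of the Lucene 4.x sources and should be checked against the version in use. The class and method names are hypothetical.

```java
// Hypothetical standalone sketch of LM Jelinek-Mercer term scoring.
// The formulas are assumptions to be verified against the Lucene version in use.
final class LmJmScoreSketch {

    // Add-one smoothed probability of the term in the whole collection (assumed default model).
    static double collectionProbability(long totalTermFreq, long numberOfFieldTokens) {
        return (totalTermFreq + 1.0) / (numberOfFieldTokens + 1.0);
    }

    // Score contribution of one query term for one document.
    // boost:  query-time boost (1.0 when the query is not boosted)
    // lambda: smoothing parameter, e.g. 0.6
    // freq:   term frequency in the document
    // docLen: document length in tokens (Lucene uses the decoded, lossy norm here)
    static double score(double boost, double lambda, double freq,
                        double docLen, double collectionProb) {
        double documentPart = (1.0 - lambda) * freq / docLen;
        double collectionPart = lambda * collectionProb;
        return boost * Math.log(1.0 + documentPart / collectionPart);
    }
}
```

With lambda = 0.6, freq = 2, docLen = 100 and a collection probability of 0.001, this gives log(1 + 0.008 / 0.0006), roughly 2.66. Scores reported by Lucene may still differ slightly because the document length it sees has been compressed into a norm.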

Re: Explain Scoring function in LMJelinekMercerSimilarity Class

2016-12-20 Thread Dwaipayan Roy

Waiting for an explanation for my query. Thank you very much. On Tue, Dec 20, 2016 at 10:51 PM, Dwaipayan Roy wrote: > Hello, > > Can anyone help me understand the scoring function in the > LMJelinekMercerSimilarity class? > > The scoring function in LMJelinekMercerSimilar

To get the term-freq

2017-11-16 Thread Dwaipayan Roy

Hi, I want to get the term frequency of a given term t in a given document with Lucene docid, say, d. Formally, I need a function, say f(), that takes two arguments: 1. Lucene docid d, 2. term t, and returns the number of times t occurs in d. I know of one solution, that is, traversing the whole docu

Custom Similarity

2018-01-16 Thread Dwaipayan Roy

I want to make a scoring function that will score the documents as follows: given Q = {q1, q2, ... }, score(D,Q) = SUM over all qi of LOG { weight_1(qi) + weight_2(qi) + weight_3(qi) }. I have stored weight_1, weight_2 and weight_3 for all terms of all docu
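The formula in this question is simple to state as plain code, independent of how it would be wired into a Lucene Similarity. The sketch below only evaluates the arithmetic for one document, assuming the three stored weights of each query term have already been looked up; the class name, method name and array layout are hypothetical.

```java
// Hypothetical sketch: score(D, Q) = sum over query terms of log(w1 + w2 + w3).
// It assumes the weights for document D were already fetched from wherever they are stored.
final class SumLogWeightsSketch {

    // weightsPerQueryTerm[i] holds {weight_1(qi), weight_2(qi), weight_3(qi)} for query term qi.
    static double score(double[][] weightsPerQueryTerm) {
        double total = 0.0;
        for (double[] w : weightsPerQueryTerm) {
            // Each query term contributes the log of the sum of its three weights.
            total += Math.log(w[0] + w[1] + w[2]);
        }
        return total;
    }
}
```

Note that a query term whose three weights sum to zero contributes negative infinity, so missing terms need an explicit policy (for example a smoothing floor) before taking the logarithm.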

Re: Custom Similarity

2018-01-27 Thread Dwaipayan Roy

Thanks for your replies. But I am still not sure how to do this. Can you please provide me with an example code snippet, or a link to a page where I can find one? Thanks. On Tue, Jan 16, 2018 at 3:28 PM, Dwaipayan Roy wrote: > I want to make a scoring function that w

getting Lucene Docid from inside score()

2018-03-09 Thread Dwaipayan Roy

While searching, I want to get the Lucene-assigned docid (ranging from 0 to the number of documents - 1) of a document containing a particular query term. From inside score(), printing 'doc' or calling docId() returns a docid which, I think, is the internal docid of a segment in which the

Re: getting Lucene Docid from inside score()

2018-03-09 Thread Dwaipayan Roy

example. > > But if you do, it sounds like maybe what you are seeing is the per segment > docid. To get a global one you have to add the segment offset, held by a > leaf reader. > > On Mar 9, 2018 5:06 AM, "Dwaipayan Roy" wrote: > > > While se

Re: getting Lucene Docid from inside score()

2018-03-09 Thread dwaipayan . roy

mple. > > But if you do, it sounds like maybe what you are seeing is the per segment > docid. To get a global one you have to add the segment offset, held by a > leaf reader. > > On Mar 9, 2018 5:06 AM, "Dwaipayan Roy" wrote: > > > While searching, I want to ge

Re: getting Lucene Docid from inside score()

2018-03-10 Thread dwaipayan . roy

xt. They can change over time for example. > >> > >> But if you do, it sounds like maybe what you are seeing is the per segment > >> docid. To get a global one you have to add the segment offset, held by a > >> leaf reader. > >> > >> On Mar 9, 2018
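The advice quoted in this thread, adding the segment offset to the per-segment docid, amounts to one addition. In Lucene the offset is exposed as the docBase field of LeafReaderContext; the sketch below models the offsets as a plain int array so that it runs without Lucene, and its class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of converting between per-segment and index-wide docids.
// docBases[i] is the index-wide docid of the first document in segment i, in ascending order.
final class DocIdSketch {

    // Index-wide docid = segment offset (docBase) + docid local to that segment.
    static int toGlobal(int docBase, int segmentDocId) {
        return docBase + segmentDocId;
    }

    // Finds which segment an index-wide docid falls into.
    static int leafIndex(int[] docBases, int globalDocId) {
        int idx = Arrays.binarySearch(docBases, globalDocId);
        // A negative result encodes the insertion point; the owning segment is the one before it.
        return idx >= 0 ? idx : -idx - 2;
    }
}
```

For example, with segment offsets {0, 100, 250}, local docid 7 in the second segment corresponds to index-wide docid 107. As the reply notes, such ids can change over time (for instance after merges), so they are only meaningful for the reader that produced them.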

How exactly the normalized length of the documents are stored in the index

2021-07-13 Thread Dwaipayan Roy

During indexing, an inverted index is built from the terms of the documents, and the term frequency, document frequency, etc. are stored. As far as I know, the exact document length is not stored in the index, to reduce its size. Instead, a normalized length is stored for each document. However, for

Re: Current command line tools for Lucene?

2024-09-24 Thread Dwaipayan Roy

sire to point/click my way through them. > > Neal > > -- > Wire:nrauhauser > sms:202-642-1717 > mailto:nrauhau...@gmail.com// > -- Dwaipayan Roy.

17 matches