Re: Current command line tools for Lucene?

2024-09-24 Thread Dwaipayan Roy
sire to point/click my way through them. > > Neal > > -- > Wire:nrauhauser > sms:202-642-1717 > mailto:nrauhau...@gmail.com// > -- Dwaipayan Roy.

How exactly the normalized length of the documents are stored in the index

2021-07-13 Thread Dwaipayan Roy
During indexing, an inverted index is built from the terms of the documents, and the term frequency, document frequency, etc. are stored. If I understand correctly, the exact document length is not stored in the index, to reduce its size. Instead, a normalized length is stored for each document. However, for

Re: getting Lucene Docid from inside score()

2018-03-10 Thread dwaipayan . roy
xt. They can change over time for example. > >> > >> But if you do, it sounds like maybe what you are seeing is the per segment > >> docid. To get a global one you have to add the segment offset, held by a > >> leaf reader. > >> > >> On Mar 9, 2018

Re: getting Lucene Docid from inside score()

2018-03-09 Thread dwaipayan . roy
mple. > > But if you do, it sounds like maybe what you are seeing is the per segment > docid. To get a global one you have to add the segment offset, held by a > leaf reader. > > On Mar 9, 2018 5:06 AM, "Dwaipayan Roy" wrote: > > > While searching, I want to ge

Re: getting Lucene Docid from inside score()

2018-03-09 Thread Dwaipayan Roy
example. > > > But if you do, it sounds like maybe what you are seeing is the per segment > > docid. To get a global one you have to add the segment offset, held by a > > leaf reader. > > > On Mar 9, 2018 5:06 AM, "Dwaipayan Roy" wrote: > > > > While se

getting Lucene Docid from inside score()

2018-03-09 Thread Dwaipayan Roy
While searching, I want to get the Lucene-assigned docid (which runs from 0 to the number of documents - 1) of a document containing a particular query term. From inside score(), printing 'doc' or calling docId() returns a docid which, I think, is the internal docid of a segment in which the
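The replies quoted in this thread describe the usual fix: the docid seen inside score() is relative to one segment, and the index-wide docid is that value plus the segment's offset. A minimal sketch of that arithmetic in plain Java, with no Lucene dependency. The class and method names are hypothetical; in Lucene itself the offset is the `docBase` field of the `LeafReaderContext` (`AtomicReaderContext` in 4.x).

```java
// Illustration only: hypothetical names, no Lucene classes involved.
final class DocIdSketch {

    /** Global docid = offset of the segment + docid local to that segment. */
    static int toGlobalDocId(int segmentDocBase, int segmentLocalDocId) {
        return segmentDocBase + segmentLocalDocId;
    }

    /** Offsets for segments laid out back to back: each starts where the previous one ends. */
    static int[] docBases(int[] segmentSizes) {
        int[] bases = new int[segmentSizes.length];
        int next = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            bases[i] = next;
            next += segmentSizes[i];
        }
        return bases;
    }
}
```

As one of the replies notes, docids can change over time (for example after merges), so a global docid obtained this way should not be treated as a stable identifier.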

Re: Custom Similarity

2018-01-27 Thread Dwaipayan Roy
Thanks for your replies. But I am still not sure how to do this. Can you please provide an example code snippet, or a link to a page where I can find one? Thanks. On Tue, Jan 16, 2018 at 3:28 PM, Dwaipayan Roy wrote: > I want to make a scoring function that w

Custom Similarity

2018-01-16 Thread Dwaipayan Roy
I want to make a scoring function that will score the documents by the following function: given Q = {q1, q2, ... }, score(D,Q) = SUM over all qi of LOG { weight_1(qi) + weight_2(qi) + weight_3(qi) }. I have stored weight_1, weight_2 and weight_3 for all terms of all docu
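The formula itself can be sketched in plain Java as follows. This only shows the arithmetic, under the assumption that the three weights for each query term have already been fetched for the document; how they are stored in and read from the index is not covered, and the class and method names are hypothetical.

```java
// Illustration only: hypothetical names, weights assumed to be already loaded.
final class WeightSumScoreSketch {

    /**
     * weights[i] holds {weight_1(qi), weight_2(qi), weight_3(qi)} for query term qi
     * in the document being scored.
     */
    static double score(double[][] weights) {
        double total = 0.0;
        for (double[] termWeights : weights) {
            double sum = 0.0;
            for (double w : termWeights) {
                sum += w;
            }
            // Natural logarithm; the post does not say which base is intended.
            total += Math.log(sum);
        }
        return total;
    }
}
```

Note that a query term whose three weights sum to zero contributes negative infinity, so some smoothing or a skip rule would be needed for terms absent from the document.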

To get the term-freq

2017-11-16 Thread Dwaipayan Roy
Hi, I want to get the term frequency of a given term t in a given document with Lucene docid, say, d. Formally, I need a function, say f(), that takes two arguments: 1. lucene-docid d, 2. term t, and returns the number of times t occurs in d. I know of one solution, that is, traversing the whole docu

Re: Explain Scoring function in LMJelinekMercerSimilarity Class

2016-12-20 Thread Dwaipayan Roy
Waiting for an explanation for my query. Thank you very much. On Tue, Dec 20, 2016 at 10:51 PM, Dwaipayan Roy wrote: > Hello, > > Can anyone help me understand the scoring function in the > LMJelinekMercerSimilarity class? > > The scoring function in LMJelinekMercerSimilar

Explain Scoring function in LMJelinekMercerSimilarity Class

2016-12-20 Thread Dwaipayan Roy
getCollectionProbability() returns col_freq(t) / col_size. Am I right? Also the boosting part is not clear to me (stats.getTotalBoost()). I want to reproduce the result of the scoring using LM-JM. Hence I want the details. Thanks. Dwaipayan Roy..
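For anyone trying to reproduce the numbers by hand, here is a plain-Java sketch of the Jelinek-Mercer score as best recalled from the Lucene 4.x source. Treat every detail as an assumption to check against the version in use: in particular, the default collection model there appears to add one to both counts, so the collection probability is col_freq(t) / col_size only up to that smoothing. The class and method names are hypothetical.

```java
// Illustration only: a from-memory reading of the 4.x formula, not the library code.
final class JelinekMercerSketch {

    /** Assumed default collection model: (col_freq(t) + 1) / (col_size + 1). */
    static double collectionProbability(long totalTermFreq, long numberOfFieldTokens) {
        return (totalTermFreq + 1.0) / (numberOfFieldTokens + 1.0);
    }

    /** boost * log(1 + ((1 - lambda) * tf / docLen) / (lambda * P(t|C))). */
    static double score(double boost, double lambda, double freq,
                        double docLen, double collectionProb) {
        return boost * Math.log(
            1.0 + ((1.0 - lambda) * freq / docLen) / (lambda * collectionProb));
    }
}
```

Two things usually explain mismatches with hand-computed values: Lucene works in single-precision floats, and docLen is the length decoded from the stored norm, which is lossy, not the exact token count.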

Re: Doc length nomalization in Lucene LM

2016-07-22 Thread Dwaipayan Roy
> return ModelBase.this.score(stats, freq,
>     norms == null ? 1L : norms.get(doc));
> }
>
> @Override
> public Explanation explain(int doc, Explanation freq) {
>   return ModelBase.this.explain(stats, doc, freq,
>       norms == null ? 1L : norms.get(doc));
> }

Doc length nomalization in Lucene LM

2016-07-21 Thread Dwaipayan Roy
Hello, In *SimilarityBase.java*, I can see that the length of the document is getting normalized by using the function *decodeNormValue()*. But I can't understand how the normalization is done. Can you please help? Also, is there any way to avoid this doc-length normalization, to use the raw
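A sketch of what the encode/decode round trip appears to do in Lucene 4.x, written in plain Java. It mirrors, from memory, `SmallFloat.floatToByte315` / `byte315ToFloat` and the way `SimilarityBase` stores 1/sqrt(length) in one byte and decodes it as 1/f². This is an assumption to verify against the source of the version in use (later releases changed the encoding), the index-time boost is taken to be 1, and the `NormSketch` class and its `encodeLength` / `decodeLength` helpers are hypothetical names.

```java
// Illustration only: from-memory reimplementation, not the library code.
final class NormSketch {

    /** Float to 8 bits: 3 mantissa bits, 5 exponent bits, zero-exponent point 15. */
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative, or underflow
        }
        if (small >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: largest representable value
        }
        return (byte) (small - ((63 - 15) << 3));
    }

    /** Inverse of floatToByte315 (up to the precision lost when encoding). */
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** Assumed index-time step with boost 1: store 1 / sqrt(length) in one byte. */
    static byte encodeLength(int length) {
        return floatToByte315(1.0f / (float) Math.sqrt(length));
    }

    /** Assumed search-time step: recover an approximate length as 1 / f^2. */
    static float decodeLength(byte norm) {
        float f = byte315ToFloat(norm);
        return 1.0f / (f * f);
    }
}
```

With only three mantissa bits the round trip is lossy, so the length seen at scoring time is generally not the true token count; many distinct lengths collapse onto the same byte.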

Setting LMJelinekMercer Similarity in Luke

2016-07-20 Thread Dwaipayan Roy
Hello. I want to set LMJelinekMercer Similarity (with lambda set to, say, 0.6) for the Luke similarity calculation. By default, Luke uses the DefaultSimilarity. Can anyone help with this? I use Lucene 4.10.4 and the matching version of Luke for that index. Dwaipayan

Re: Problem with porter stemming

2016-07-19 Thread Dwaipayan Roy
Hello. I want to set LMJelinekMercer Similarity (with lambda set to, say, 0.6) for the Luke similarity calculation. By default, Luke uses the DefaultSimilarity. Can anyone help with this? I use Lucene 4.10.4 and the matching version of Luke for that index. Dwaipayan

Problem with porter stemming

2016-03-14 Thread Dwaipayan Roy
I am using EnglishAnalyzer with my own stopword list. EnglishAnalyzer uses the porter stemmer (snowball) to stem the words. But with the EnglishAnalyzer, I am getting an erroneous result for 'news': it is getting stemmed to 'new'. Any help would be appreciated.

Query regarding Lucene

2016-03-09 Thread Dwaipayan Roy
the paper, I am setting these weights with those normalized probability values. Can any of you please help me out with this problem? Thanks, Dwaipayan Roy. Research Scholar Indian Statistical Institute Kolkata, India