--
Dwaipayan Roy.
During indexing, an inverted index is built from the terms of the documents,
and statistics such as term frequency and document frequency are stored. If
I understand correctly, the exact document length is not stored in the
index, to reduce its size; instead, a normalized length is stored for each
document.
However, for
xt. They can change over time for example.
>
> But if you do, it sounds like maybe what you are seeing is the per-segment
> docid. To get a global one you have to add the segment offset, held by a
> leaf reader.
>
> On Mar 9, 2018 5:06 AM, "Dwaipayan Roy" wrote:
While searching, I want to get the Lucene-assigned docid (ranging from 0 to
the number of documents - 1) of a document containing a particular query
term.
From inside score(), printing 'doc' or calling docID() returns a
docid which, I think, is the internal docid of a segment in which the
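The advice quoted above (per-segment docid plus the segment offset) can be sketched as a collector. This is a minimal sketch under my own assumptions: a Lucene 7.x-era API, and the class name is mine; it is not from the thread.

```java
// Sketch, assuming a Lucene 7.x-era API; the class name is illustrative.
// Inside a segment, docids restart at 0; adding the segment's docBase
// (held by the LeafReaderContext) yields the index-wide docid.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.SimpleCollector;

public class GlobalDocIdCollector extends SimpleCollector {
    private int docBase;                                  // offset of the current segment
    public final List<Integer> globalIds = new ArrayList<>();

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException {
        this.docBase = context.docBase;                   // segment offset
    }

    @Override
    public void collect(int doc) {                        // 'doc' is per-segment
        globalIds.add(docBase + doc);                     // index-wide docid
    }

    @Override
    public boolean needsScores() {
        return false;
    }
}
```

Used as `searcher.search(query, collector)`, the collector then holds index-wide docids regardless of how many segments the index has.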
Thanks for your replies, but I am still not sure how to do this. Can you
please provide me with an example code snippet, or a link to a page where I
can find one?
Thanks..
On Tue, Jan 16, 2018 at 3:28 PM, Dwaipayan Roy
wrote:
> I want to make a scoring function that w
I want to make a scoring function that will score the documents by the
following function:
given Q = {q1, q2, ...}

score(D, Q) = SUM over all qi in Q of:
                LOG( weight_1(qi) + weight_2(qi) + weight_3(qi) )

I have stored weight_1, weight_2 and weight_3 for all terms of all docu
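One way to sketch this is through the per-term scoring hook of SimilarityBase (assuming a Lucene 6.x/7.x-style API). The outer SUM over query terms is what Lucene itself performs when it adds up per-term scores, so only the LOG part needs implementing. The thread does not say where weight_1..weight_3 come from, so lookupWeight() below is a hypothetical placeholder:

```java
// Minimal sketch, assuming SimilarityBase as in Lucene 6.x/7.x.
// lookupWeight() is a hypothetical stand-in for however the three stored
// per-term weights are retrieved; the class name is illustrative.
import org.apache.lucene.search.similarities.BasicStats;
import org.apache.lucene.search.similarities.SimilarityBase;

public class LogSumSimilarity extends SimilarityBase {

    // Combine the three per-term weights: LOG(w1 + w2 + w3).
    public static double combine(double w1, double w2, double w3) {
        return Math.log(w1 + w2 + w3);
    }

    // Hypothetical lookup of the stored weights for the current term.
    protected double lookupWeight(BasicStats stats, int which) {
        return 1.0; // placeholder
    }

    @Override
    protected float score(BasicStats stats, float freq, float docLen) {
        double w1 = lookupWeight(stats, 1);
        double w2 = lookupWeight(stats, 2);
        double w3 = lookupWeight(stats, 3);
        return (float) combine(w1, w2, w3);
    }

    @Override
    public String toString() {
        return "LogSumSimilarity";
    }
}
```

Setting this on the IndexSearcher (`searcher.setSimilarity(new LogSumSimilarity())`) and issuing the query terms as a disjunction then yields the summed per-term scores.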
Hi,
I want to get the term frequency of a given term t in a given document with
lucene docid say d.
Formally, I need a function, say f(), that takes two arguments: 1. a
Lucene docid d, and 2. a term t, and returns the number of times t occurs
in d.
I know of one solution, that is, traversing the whole docu
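A cheaper alternative to traversing the document is to read the postings for the term and skip directly to the docid. A sketch, assuming a Lucene 5.x-7.x API; the field name "body" is my own assumption:

```java
// Sketch, assuming Lucene 5.x-7.x (MultiFields was removed in 8.x).
// MultiFields.getTermDocsEnum exposes postings over the whole index, so the
// docids it iterates are the global Lucene docids. Field name is illustrative.
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.util.BytesRef;

public final class TermFreq {
    // f(d, t): number of times term t occurs in field "body" of document d.
    public static int freq(IndexReader reader, int d, String t) throws IOException {
        PostingsEnum pe = MultiFields.getTermDocsEnum(reader, "body", new BytesRef(t));
        if (pe == null) {
            return 0;                   // term does not occur in the index
        }
        int docId = pe.advance(d);      // jump to the first docid >= d
        return (docId == d) ? pe.freq() : 0;
    }
}
```

Another option, if term vectors were enabled at indexing time, is `reader.getTermVector(d, field)`, which avoids walking postings for very frequent terms.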
Still waiting for an explanation of my query. Thank you very much.
On Tue, Dec 20, 2016 at 10:51 PM, Dwaipayan Roy
wrote:
> Hello,
>
> Can anyone help me understand the scoring function in the
> LMJelinekMercerSimilarity class?
>
> The scoring function in LMJelinekMercerSimilar
getCollectionProbability() returns col_freq(t) / col_size. Am I
right?
Also the boosting part is not clear to me (stats.getTotalBoost()).
I want to reproduce the result of the scoring using LM-JM. Hence I want the
details.
Thanks.
Dwaipayan Roy..
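To reproduce the LM-JM score outside Lucene, the per-term formula can be written down in plain Java. As far as I can tell from the Lucene source, getCollectionProbability() is indeed totalTermFreq(t) divided by the total number of field tokens, i.e. col_freq(t) / col_size, and getTotalBoost() is just a multiplicative factor applied to the whole term score; check the source of your own version before relying on this sketch:

```java
// Standalone reproduction of the per-term LM-JM score, following the formula
// in Lucene's LMJelinekMercerSimilarity (paraphrased; verify per version):
//   score = boost * log(1 + ((1 - lambda) * tf / docLen) / (lambda * P(t|C)))
// where P(t|C) = collectionFreq(t) / collectionSize (getCollectionProbability()).
public final class LmJm {
    public static double score(double boost, double lambda,
                               double tf, double docLen,
                               double colFreq, double colSize) {
        double pColl = colFreq / colSize;   // collection language model P(t|C)
        return boost * Math.log(1 + ((1 - lambda) * tf / docLen)
                                    / (lambda * pColl));
    }
}
```

For example, with boost = 1, lambda = 0.6, tf = 2, docLen = 10, and a term seen 100 times in a 10,000-token collection, the ratio inside the log is (0.4 * 0.2) / (0.6 * 0.01) = 13.33, giving log(14.33).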
> return ModelBase.this.score(stats, freq,
>     norms == null ? 1L : norms.get(doc));
> }
>
> @Override
> public Explanation explain(int doc, Explanation freq) {
>   return ModelBase.this.explain(stats, doc, freq,
>       norms == null ? 1L : norms.get(doc));
> }
>
Hello,
In *SimilarityBase.java*, I can see that the length of the document is
normalized using the function *decodeNormValue()*. But I can't understand
how the normalization is done. Can you please help? Also, is
there any way to avoid this doc-length normalization, to use the raw
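As far as I can tell from the 4.x source, encodeNormValue() stores SmallFloat.floatToByte315(boost / sqrt(length)) in a single byte, and decodeNormValue() recovers roughly 1/(f*f), so only an approximate length ever survives in the norm. One workaround (my suggestion, not from the thread) is to index the exact token count yourself in a doc-values field and read it back at search time. A sketch, using a Lucene 7.x-style API with illustrative field names:

```java
// Sketch (Lucene 7.x API; field names illustrative; suggestion mine, not
// from the thread): the norm keeps only a lossy one-byte encoding of the
// document length, so store the exact count in a NumericDocValuesField.
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;

public final class ExactLen {
    // At index time: store the exact length alongside the analyzed text.
    public static Document makeDoc(String text, long exactTokenCount) {
        Document doc = new Document();
        doc.add(new TextField("body", text, Field.Store.NO));
        doc.add(new NumericDocValuesField("body_len", exactTokenCount));
        return doc;
    }

    // At search time: resolve the exact length for a global docid.
    public static long length(IndexReader reader, int globalDoc) throws IOException {
        for (LeafReaderContext leaf : reader.leaves()) {
            int local = globalDoc - leaf.docBase;        // per-segment docid
            if (local >= 0 && local < leaf.reader().maxDoc()) {
                NumericDocValues dv = leaf.reader().getNumericDocValues("body_len");
                if (dv != null && dv.advanceExact(local)) {
                    return dv.longValue();
                }
            }
        }
        return -1; // no such doc, or no stored length
    }
}
```

In Lucene 4.10 the doc-values read API differs (random-access `get(doc)` instead of the iterator-style `advanceExact`), but the idea is the same.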
Hello.
I want to set LMJelinekMercerSimilarity (with lambda set to, say, 0.6) for
Luke's similarity calculation. Luke by default uses DefaultSimilarity.
Can anyone help with this? I use Lucene 4.10.4 and the Luke build for that
version of the Lucene index.
Dwaipayan..
I am using EnglishAnalyzer with my own stopword list. EnglishAnalyzer uses
the Porter stemmer (Snowball) to stem the words, but with it I am getting
an erroneous result for 'news': 'news' is being stemmed to 'new'.
Any help would be appreciated.
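One way to protect individual words from the stemmer is the stem-exclusion set that EnglishAnalyzer accepts alongside the stopword set: the analyzer applies a keyword-marker step before stemming, so excluded words pass through unchanged. A sketch, assuming Lucene 7.x (in 4.10 the constructors also take a Version argument and CharArraySet lives in `org.apache.lucene.analysis.util`); the field name and helper are mine:

```java
// Sketch, assuming Lucene 7.x. Words in the stem-exclusion set are marked
// as keywords before PorterStemFilter runs, so they are not stemmed.
import java.io.IOException;
import java.util.Arrays;

import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class NoStemNews {
    // Analyze one piece of text and return the first emitted token.
    public static String firstToken(String text) throws IOException {
        CharArraySet stops = EnglishAnalyzer.getDefaultStopSet(); // or your own list
        CharArraySet noStem = new CharArraySet(Arrays.asList("news"), true);
        EnglishAnalyzer analyzer = new EnglishAnalyzer(stops, noStem);
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            String tok = ts.incrementToken() ? term.toString() : null;
            ts.end();
            return tok;
        }
    }
}
```

Note that 'news' to 'new' is standard Porter behavior (the final 's' rule), not a bug, so the exclusion set is the intended escape hatch for such cases.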
the paper, I am setting these weights with those normalized probability
values.
Can any of you please help me with this problem?
Thanks,
Dwaipayan Roy.
Research Scholar
Indian Statistical Institute
Kolkata, India