... the paper, I am setting these weights with those normalized
probability values.
Can any of you please help me out with this problem?
Thanks,
Dwaipayan Roy.
Research Scholar
Indian Statistical Institute
Kolkata, India
I am using EnglishAnalyzer with my own stopword list. EnglishAnalyzer uses
the Porter stemmer (Snowball) to stem words. But using the
EnglishAnalyzer, I am getting an erroneous result for 'news': 'news' is
getting stemmed to 'new'.
Any help would be appreciated.
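A workaround I am considering is the stem-exclusion set that EnglishAnalyzer
accepts: terms in that set bypass the Porter stemmer entirely. A minimal
sketch (the word lists below are just placeholders for your own):

import java.util.Arrays;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

// terms in the stem-exclusion set are never stemmed
CharArraySet stemExclusions = new CharArraySet(Version.LUCENE_4_10_4,
        Arrays.asList("news"), true);

// your own stopword list goes here; these three are placeholders
CharArraySet myStopwords = new CharArraySet(Version.LUCENE_4_10_4,
        Arrays.asList("the", "a", "of"), true);

Analyzer analyzer = new EnglishAnalyzer(Version.LUCENE_4_10_4,
        myStopwords, stemExclusions);

Note that 'news' -> 'new' is in fact the Porter algorithm's normal output
(the trailing 's' is stripped whenever the remaining stem contains a vowel),
so the exclusion set is the usual remedy rather than a stemmer fix.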
Hello.
I want to set LMJelinekMercerSimilarity (with lambda set to, say, 0.6) for
Luke's similarity calculation. Luke by default uses DefaultSimilarity.
Can anyone help with this? I use Lucene 4.10.4, and Luke for that version
of the Lucene index.
Dwaipayan
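Meanwhile, the programmatic route works outside of Luke; a minimal sketch
against the 4.10 API (the index path is a placeholder):

import java.io.File;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.similarities.LMJelinekMercerSimilarity;
import org.apache.lucene.store.FSDirectory;

DirectoryReader reader = DirectoryReader.open(
        FSDirectory.open(new File("/path/to/index")));  // placeholder path
IndexSearcher searcher = new IndexSearcher(reader);

// replace the default TF-IDF similarity with LM-JM, lambda = 0.6
searcher.setSimilarity(new LMJelinekMercerSimilarity(0.6f));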
Hello,
In *SimilarityBase.java*, I can see that the length of the document is
getting normalized using the function *decodeNormValue()*. But I can't
understand how the normalization is done. Can you please help? Also, is
there any way to avoid this doc-length normalization and use the raw
document length instead?
>   return ModelBase.this.score(stats, freq,
>       norms == null ? 1L : norms.get(doc));
> }
>
> @Override
> public Explanation explain(int doc, Explanation freq) {
>   return ModelBase.this.explain(stats, doc, freq,
>       norms == null ? 1L : norms.get(doc));
> }
getCollectionProbability() returns col_freq(t) / col_size. Am I
right?
Also, the boosting part (stats.getTotalBoost()) is not clear to me.
I want to reproduce the LM-JM scoring results exactly; hence I need these
details.
Thanks.
Dwaipayan Roy..
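For context, here is my current reading of the per-term score in
LMJelinekMercerSimilarity, written out as a standalone sketch; please
correct me if the collection probability or the boost handling is wrong.
As far as I can tell, the collection probability is essentially
totalTermFreq(t) divided by the total number of tokens in the field, and
getTotalBoost() is just the product of query-time boosts (1.0 if none were
set). All arguments below are placeholders for the statistics Lucene
passes in:

// hand-written restatement of the LM-JM per-term score, not library code
static float lmjmScore(float totalBoost, float lambda, float freq,
                       float docLen, float collectionProb) {
    // (1 - lambda) * tf/|D|, smoothed by lambda * P(t|C), inside log(1 + ...)
    return totalBoost * (float) Math.log(
            1 + ((1 - lambda) * freq / docLen) / (lambda * collectionProb));
}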
Still waiting for an explanation of my query. Thank you very much.
On Tue, Dec 20, 2016 at 10:51 PM, Dwaipayan Roy wrote:
> Hello,
>
> Can anyone help me understand the scoring function in the
> LMJelinekMercerSimilarity class?
>
> The scoring function in LMJelinekMercerSimilarity ...
Hi,
I want to get the term frequency of a given term t in a given document with
Lucene docid, say, d.
Formally, I need a function, say f(), that takes two arguments: 1. a Lucene
docid d, and 2. a term t, and returns the number of times t occurs in d.
I know of one solution, that is, traversing the whole document ...
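One approach I am experimenting with avoids the full scan by advancing the
postings for t directly to d; a sketch against the 4.x API (the field name
"contents" is an assumption):

import java.io.IOException;

import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.util.BytesRef;

// f(d, t): number of times term t occurs in the doc with Lucene docid d
static int termFreq(IndexReader reader, int d, String t) throws IOException {
    DocsEnum de = MultiFields.getTermDocsEnum(reader,
            MultiFields.getLiveDocs(reader), "contents", new BytesRef(t));
    if (de == null) return 0;                  // t never occurs in the field
    if (de.advance(d) == d) return de.freq();  // jump directly to docid d
    return 0;                                  // t does not occur in doc d
}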
I want to make a scoring function that will score the documents by the
following function:
given Q = {q1, q2, ...}
score(D, Q) =
    SUM over all qi in Q of {
        LOG { weight_1(qi) + weight_2(qi) + weight_3(qi) }
    }
I have stored weight_1, weight_2 and weight_3 for all terms of all
documents ...
Thanks for your replies. But I am still not sure how to do this. Can you
please provide an example code snippet, or a link to a page where I can
find one?
Thanks..
On Tue, Jan 16, 2018 at 3:28 PM, Dwaipayan Roy wrote:
> I want to make a scoring function that will score the documents by ...
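To make my goal concrete, here is the kind of wiring I have in mind,
assuming the three weights can be pre-summed at index time and attached to
each posting as a float payload (the field name, the class name, and that
indexing scheme are all assumptions, not a definitive recipe):

import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.payloads.AveragePayloadFunction;
import org.apache.lucene.search.payloads.PayloadTermQuery;
import org.apache.lucene.search.similarities.DefaultSimilarity;
import org.apache.lucene.util.BytesRef;

// turns the stored payload w = w1 + w2 + w3 into log(w) for each clause
class LogPayloadSimilarity extends DefaultSimilarity {
    @Override
    public float scorePayload(int doc, int start, int end, BytesRef payload) {
        float w = PayloadHelper.decodeFloat(payload.bytes, payload.offset);
        return (float) Math.log(w);
    }
    // neutralize coord/queryNorm so the BooleanQuery is a plain sum
    @Override
    public float coord(int overlap, int maxOverlap) { return 1f; }
    @Override
    public float queryNorm(float sumOfSquaredWeights) { return 1f; }
}

// in the search code: score(D, Q) = SUM over qi of log(w1 + w2 + w3)
BooleanQuery q = new BooleanQuery();
for (String qi : queryTerms) {             // queryTerms: the terms of Q
    // 'false' = ignore the span score, use only the payload value
    q.add(new PayloadTermQuery(new Term("contents", qi),
            new AveragePayloadFunction(), false), Occur.SHOULD);
}
searcher.setSimilarity(new LogPayloadSimilarity());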
While searching, I want to get the Lucene-assigned docid (which runs from
0 to the number of documents - 1) of a document containing a particular
query term.
From inside score(), printing 'doc' or calling docId() returns a docid
which, I think, is the internal docid of the segment in which the ...
> ... They can change over time for example.
>
> But if you do, it sounds like maybe what you are seeing is the per segment
> docid. To get a global one you have to add the segment offset, held by a
> leaf reader.
>
> On Mar 9, 2018 5:06 AM, "Dwaipayan Roy" wrote:
>
> > While searching, I want to get the lucene assigned docid ...
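To make the "segment offset" remark concrete, a minimal sketch of a 4.x
Collector that recovers the global docid (the printing is just for
illustration):

import java.io.IOException;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

class GlobalDocIdCollector extends Collector {
    private int docBase;  // offset of the current segment in the whole index

    @Override
    public void setNextReader(AtomicReaderContext context) throws IOException {
        docBase = context.docBase;
    }

    @Override
    public void collect(int doc) throws IOException {
        int globalDoc = docBase + doc;  // per-segment docid + segment offset
        System.out.println("global docid: " + globalDoc);
    }

    @Override
    public void setScorer(Scorer scorer) throws IOException {}

    @Override
    public boolean acceptsDocsOutOfOrder() { return true; }
}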
During indexing, an inverted index is built from the terms of the documents,
and statistics such as term frequency and document frequency are stored. If
I understand correctly, the exact document length is not stored in the
index, to reduce its size; instead, a normalized length is stored for each
document. However, for ...
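For what it's worth, this is also my current understanding of the earlier
decodeNormValue() question: the length is squashed into a single byte at
index time, so only an approximate length comes back out. A sketch of the
round trip as I read SimilarityBase (SmallFloat is the actual utility
class; the number is illustrative):

import org.apache.lucene.util.SmallFloat;

int docLen = 1234;                          // example raw field length

// index time: encodeNormValue(boost = 1.0f, docLen) stores one lossy byte
byte encoded = SmallFloat.floatToByte315((float) (1.0 / Math.sqrt(docLen)));

// search time: the norm table inverts this back to an approximate length
float f = SmallFloat.byte315ToFloat(encoded);
float approxLen = 1.0f / (f * f);           // what decodeNormValue yields

// approxLen != docLen in general; to keep the exact length one would have
// to store it separately (e.g. a NumericDocValuesField added at index time)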
--
Dwaipayan Roy.