Compared with classical VSM, Lucene simply ignores the denominator (|Q|*|D|) of
the similarity formula,
but it adds norm(t,d) and coord(q,d) to account for the fraction of terms in
Query and Doc,
so in practice it is a modified implementation of VSM.
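To make the difference concrete, here is a minimal sketch in plain Python (not Lucene code). It assumes queries and documents are dicts mapping terms to weights. The function names, the overlap-based coord factor and the 1/sqrt(length) norm are simplifications chosen for illustration; they are not Lucene's exact scoring formula.

```python
import math


def cosine(q, d):
    """Classical VSM: dot product divided by |Q| * |D|."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    q_len = math.sqrt(sum(w * w for w in q.values()))
    d_len = math.sqrt(sum(w * w for w in d.values()))
    if q_len == 0.0 or d_len == 0.0:
        return 0.0
    return dot / (q_len * d_len)


def modified_score(q, d, doc_token_count):
    """Hypothetical simplified variant: the |Q|*|D| denominator is dropped,
    and a coord factor plus a document length norm are applied instead."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    # coord: fraction of query terms that also occur in the document
    overlap = sum(1 for t in q if t in d)
    coord = overlap / len(q) if q else 0.0
    # length norm (assumed form): shorter documents get a larger factor
    length_norm = 1.0 / math.sqrt(doc_token_count) if doc_token_count > 0 else 0.0
    return coord * length_norm * dot


query = {"lucene": 1.0, "vsm": 1.0}
doc = {"lucene": 2.0, "search": 1.0}

print(cosine(query, doc))
print(modified_score(query, doc, doc_token_count=3))
```

The two functions share the same dot product; only the normalization differs, which is why the resulting scores are not directly comparable across the two schemes.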
Do you just want to verify which implementation of VSM in "
You can find those variants of the vector space model in this interesting
article:
http://ieeexplore.ieee.org/iel1/52/12658/00582976.pdf?tp=&isnumber=&arnumber=582976
Now you have confirmed for me that, given the current nature of the Similarity
APIs, it will not be easy to realize these variants quickly.
Actually
FYI: The mailing list handler strips attachments.
At any rate, sounds like an interesting project. I don't know how
easy it will be for you to implement 7 variants of VSM in Lucene given
the nature of the APIs, but if you do, it might be handy to see your
changes as a patch. :-) Also not
Thanks for your tips. My overall goal is to quickly implement 7 variants of
vector space model using Lucene. You can find these variants in the
uploaded file.
I am doing all of this for a much broader goal: I am trying to recover
traceability links from requirements to source code files. I
On Feb 28, 2008, at 9:00 AM, Dharmalingam wrote:
Thanks for the reply. Sorry if my explanation is not clear. Yes, you are
correct that the model is based on Salton's VSM. However, the calculation of the
term weight and the doc norm is, in my opinion, different from Lucene's. If
you look at th
Thanks for the reply. Sorry if my explanation is not clear. Yes, you are
correct that the model is based on Salton's VSM. However, the calculation of the
term weight and the doc norm is, in my opinion, different from Lucene's. If
you look at the table given in
http://www.miislita.com/term-vector/term-ve
Not sure I am understanding what you are asking, but I will give it a
shot. See below
On Feb 26, 2008, at 3:45 PM, Dharmalingam wrote:
Hi List,
I am pretty new to Lucene. Certainly, it is very exciting. I need to
implement a new Similarity class based on the Term Vector Space
Model giv
Hi List,
I am pretty new to Lucene. Certainly, it is very exciting. I need to
implement a new Similarity class based on the Term Vector Space Model given
in http://www.miislita.com/term-vector/term-vector-3.html
Although that model is similar to Lucene’s model
(http://hudson.zones.apache.org/hud