Re: Vector Space Model: New Similarity Implementation Issues

2008-02-28 Thread h t
Compared with the classical VSM, Lucene simply ignores the denominator (|Q|*|D|) of the similarity formula, but it adds norm(t,d) and coord(q,d) to take into account the fraction of terms in the query and the document, so in practice it is a modified implementation of the VSM. Do you just want to verify which implementation of VSM in "
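To see the difference described above side by side, here is a minimal sketch that scores one query/document pair both ways. The Lucene-style function is a simplified rendering of the classic (2.x era) DefaultSimilarity factors as I understand them: tf = sqrt(freq), idf = 1 + ln(numDocs/(docFreq+1)), norm(t,d) reduced to lengthNorm = 1/sqrt(numTerms), coord = matched/total query terms, queryNorm = 1/sqrt(sum of squared query weights). Boosts are assumed to be 1, norms are not byte-encoded, and all function names are hypothetical. This is not Lucene's actual code.

```python
import math
from collections import Counter


def lucene_idf(num_docs: int, doc_freq: int) -> float:
    # Assumed classic DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def cosine_score(query_terms, doc_terms, idf):
    """Classical VSM: dot(Q, D) / (|Q| * |D|) with raw tf * idf weights."""
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query_terms).items()}
    d = {t: c * idf.get(t, 0.0) for t, c in Counter(doc_terms).items()}
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    q_len = math.sqrt(sum(w * w for w in q.values()))
    d_len = math.sqrt(sum(w * w for w in d.values()))
    return dot / (q_len * d_len) if q_len and d_len else 0.0


def lucene_style_score(query_terms, doc_terms, idf):
    """Simplified classic Lucene score: coord * queryNorm * sum(tf * idf^2 * norm).

    Assumptions: boosts are 1, duplicate query terms are collapsed,
    and norm(t,d) is the unencoded lengthNorm only.
    """
    tf_doc = Counter(doc_terms)
    q_unique = set(query_terms)
    # queryNorm: 1 / sqrt(sum of squared query-term weights), constant for a given query
    sum_sq = sum(idf.get(t, 0.0) ** 2 for t in q_unique)
    query_norm = 1.0 / math.sqrt(sum_sq) if sum_sq else 0.0
    # norm(t,d) reduced to lengthNorm = 1 / sqrt(number of terms in the field)
    length_norm = 1.0 / math.sqrt(len(doc_terms)) if doc_terms else 0.0
    # coord(q,d): fraction of the query terms that occur in the document
    matched = [t for t in q_unique if tf_doc[t] > 0]
    coord = len(matched) / len(q_unique) if q_unique else 0.0
    total = sum(math.sqrt(tf_doc[t]) * idf.get(t, 0.0) ** 2 * length_norm
                for t in matched)
    return coord * query_norm * total


if __name__ == "__main__":
    idf = {"a": 1.0, "b": 1.0}
    print(cosine_score(["a", "b"], ["a", "c", "d", "e"], idf))        # about 0.7071
    print(lucene_style_score(["a", "b"], ["a", "c", "d", "e"], idf))  # about 0.1768
```

queryNorm is identical for every document scored against the same query, so it does not change the ranking. That matches the point that |Q| is effectively dropped. |D| is replaced by a length normalization that depends only on the number of terms in the field, not on the weighted vector length. The two scores are on different scales, so only the rankings they produce are comparable.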

Re: Vector Space Model: New Similarity Implementation Issues

2008-02-28 Thread Dharmalingam
You can find those variants of the vector space model in this interesting article: <http://ieeexplore.ieee.org/iel1/52/12658/00582976.pdf?tp=&isnumber=&arnumber=582976>. Now you have confirmed that, given the current nature of the Similarity API, it will not be easy to quickly realize these variants. Actually

Re: Vector Space Model: New Similarity Implementation Issues

2008-02-28 Thread Grant Ingersoll
FYI: the mailing list handler strips attachments. At any rate, it sounds like an interesting project. I don't know how easy it will be for you to implement 7 variants of VSM in Lucene, given the nature of the APIs, but if you do, it might be handy to see your changes as a patch. :-) Also not

Re: Vector Space Model: New Similarity Implementation Issues

2008-02-28 Thread Dharmalingam
Thanks for your tips. My overall goal is to quickly implement 7 variants of the vector space model using Lucene. You can find these variants in the uploaded file. I am doing all of this for a much broader goal: I am trying to recover traceability links from requirements to source code files. I
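One way to keep several VSM variants cheap to try, outside Lucene's Similarity API, is to make the weighting scheme a pluggable function and rank source files against each requirement by cosine similarity. The sketch below assumes this design. The three tf schemes, the ln(N/df) idf, the 0.1 cut-off, and every name in it (TF_SCHEMES, candidate_links, and so on) are hypothetical placeholders. They are not the seven variants from the article, which are not reproduced here.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical tf variants; NOT the seven variants from the referenced article.
TF_SCHEMES: Dict[str, Callable[[int], float]] = {
    "raw": lambda f: float(f),
    "log": lambda f: 1.0 + math.log(f) if f > 0 else 0.0,
    "binary": lambda f: 1.0 if f > 0 else 0.0,
}


def idf_table(docs: Dict[str, List[str]]) -> Dict[str, float]:
    """Assumed idf = ln(N / df), computed over the source-file corpus."""
    n = len(docs)
    df = Counter(t for terms in docs.values() for t in set(terms))
    return {t: math.log(n / c) for t, c in df.items()}


def vectorize(terms: List[str], idf: Dict[str, float], tf_scheme: str) -> Dict[str, float]:
    tf = TF_SCHEMES[tf_scheme]
    # Terms unseen in the corpus get weight 0
    return {t: tf(c) * idf.get(t, 0.0) for t, c in Counter(terms).items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def candidate_links(requirement: List[str], files: Dict[str, List[str]],
                    tf_scheme: str = "raw", threshold: float = 0.1) -> List[Tuple[str, float]]:
    """Rank files by cosine similarity to the requirement; keep those above the threshold."""
    idf = idf_table(files)
    q = vectorize(requirement, idf, tf_scheme)
    scored = [(name, cosine(q, vectorize(terms, idf, tf_scheme)))
              for name, terms in files.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(name, s) for name, s in scored if s >= threshold]


if __name__ == "__main__":
    files = {
        "Login.java": ["login", "user", "password"],
        "Report.java": ["report", "print", "page"],
        "User.java": ["user", "profile"],
    }
    for scheme in TF_SCHEMES:
        print(scheme, candidate_links(["user", "login"], files, tf_scheme=scheme))
```

Each additional variant would then be one more entry in the scheme table rather than a new Similarity subclass. The threshold trades recall against precision in the recovered links.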

Re: Vector Space Model: New Similarity Implementation Issues

2008-02-28 Thread Grant Ingersoll
On Feb 28, 2008, at 9:00 AM, Dharmalingam wrote: Thanks for the reply. Sorry if my explanation is not clear. Yes, you are correct the model is based on Salton's VSM. However, the calculation of the term weight and the doc norm is, in my opinion, different from Lucene. If you look at th

Re: Vector Space Model: New Similarity Implementation Issues

2008-02-28 Thread Dharmalingam
Thanks for the reply. Sorry if my explanation was not clear. Yes, you are correct: the model is based on Salton's VSM. However, the calculation of the term weight and the document norm is, in my opinion, different from Lucene's. If you look at the table given in http://www.miislita.com/term-vector/term-ve
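The contrast between the two term weights and document norms can be written out as follows. The classical form assumes the tutorial's IDF is log(D/d_i), where D is the number of documents and d_i is the number of documents containing term i. The Lucene form assumes the classic DefaultSimilarity factors. Both should be checked against the linked page and the Similarity javadoc.

```latex
% Classical VSM (assumed tutorial form)
w_{i,j} = tf_{i,j}\,\log\frac{D}{d_i}, \qquad
|d_j| = \sqrt{\sum_i w_{i,j}^2}, \qquad
\operatorname{sim}(q, d_j) = \frac{\sum_i w_{i,q}\, w_{i,j}}{|q|\,|d_j|}

% Lucene classic scoring (assumed DefaultSimilarity factors)
\operatorname{score}(q, d) = \operatorname{coord}(q, d)\cdot\operatorname{queryNorm}(q)
  \sum_{t \in q} \sqrt{tf_{t,d}}\cdot\operatorname{idf}(t)^2\cdot\operatorname{norm}(t, d),
\qquad
\operatorname{norm}(t, d) \approx \frac{1}{\sqrt{\mathrm{numTerms}(d)}}
```

The two forms differ in three ways:

- The classical model divides by the Euclidean length of the tf·idf document vector. Lucene's norm depends only on the number of terms in the field.
- Lucene dampens tf with a square root.
- Lucene applies idf twice, once in the query weight and once in the document weight.

Changing the log base in the classical form scales every weight by the same constant, so the cosine itself is unaffected.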

Re: Vector Space Model: New Similarity Implementation Issues

2008-02-28 Thread Grant Ingersoll
I'm not sure I understand what you are asking, but I will give it a shot. See below. On Feb 26, 2008, at 3:45 PM, Dharmalingam wrote: Hi List, I am pretty new to Lucene. Certainly, it is very exciting. I need to implement a new Similarity class based on the Term Vector Space Model giv

Vector Space Model: New Similarity Implementation Issues

2008-02-26 Thread Dharmalingam
Hi list, I am pretty new to Lucene, and it is certainly very exciting. I need to implement a new Similarity class based on the Term Vector Space Model given in <http://www.miislita.com/term-vector/term-vector-3.html>. Although that model is similar to Lucene’s model (http://hudson.zones.apache.org/hud