Computing the cosine between two documents requires that the vectors for
each document be the same length (same number of elements, same
dimensionality, not the norm). The length of the vector is the length
of the vocabulary for the whole set. The two sets will inevitably have
different nu
Hello Herb. Thank you very much for your reply. I want to have the cosine for
each a and each b. I'm using code for lucene I found online, which I will post
below.
Hello Uwe. Thank you very much for replying. I am using a class DocVector and
then a class in which I try to compute the similariti
Hi Stefy,
the stack trace you posted has nothing to do with Apache Lucene. It looks like
you are using some commons-lang3 classes here, but no Lucene code at all. So I
think your question might be better asked on the commons-math mailing list,
unless you have some Lucene code around, too. If th
If you want to compute the cosines between pairs of documents (each a compared
with each b), then the dimension is 100, the size of each document. If you want
to compare the whole index then you will need to make them the same length
(number of elements) by padding the shorter with zeroes. There
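
For reference, here is a minimal sketch of the zero-padding idea discussed in this thread, written in plain Java with no Lucene calls. The class name `CosineSketch` and its methods are hypothetical, and it assumes each document is already available as a term-to-frequency map. The shared vocabulary is the union of both term sets, and any term a document lacks counts as 0, so both vectors end up with the same number of elements.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: cosine over term-frequency maps.
final class CosineSketch {

    // Expand a sparse term->frequency map into a dense vector over the shared
    // vocabulary. Terms missing from the document become 0 (the zero padding).
    static double[] toVector(Map<String, Integer> termFreqs, SortedSet<String> vocabulary) {
        double[] v = new double[vocabulary.size()];
        int i = 0;
        for (String term : vocabulary) {
            v[i++] = termFreqs.getOrDefault(term, 0);
        }
        return v;
    }

    // Cosine of the angle between two documents given as term-frequency maps.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        // The vocabulary is the union of both term sets, in a fixed order,
        // so both vectors have the same dimensionality.
        SortedSet<String> vocabulary = new TreeSet<>(a.keySet());
        vocabulary.addAll(b.keySet());

        double[] va = toVector(a, vocabulary);
        double[] vb = toVector(b, vocabulary);

        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < va.length; i++) {
            dot += va[i] * vb[i];
            normA += va[i] * va[i];
            normB += vb[i] * vb[i];
        }
        // An empty document has no direction; report 0 rather than NaN.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        Map<String, Integer> a = Map.of("lucene", 1, "index", 1);
        Map<String, Integer> b = Map.of("lucene", 1, "search", 1);
        // The documents share one of three vocabulary terms.
        System.out.println(cosine(a, b));
    }
}
```

To compare each a with each b, one would call `cosine` once per pair. With real Lucene data, the term-frequency maps would have to be filled from the index first, which this sketch does not attempt.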