I understand your need. You can use k-means clustering in Mahout,
which may help in your case. It would be better to post this question on the
Mahout user list, where you will get different ideas. I also had a use case
like this, which I did as a POC. Still, my suggestion is that you post this
question
Thank you so much for the prompt reply.
I need to extract a document from the index that is similar to an HTML
document, and I need to use cosine similarity or latent semantic analysis, which
means that I need to generate a term vector for the HTML document. The link you
sent me doesn't contain any
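
For reference, the cosine similarity between two term vectors mentioned above can be sketched in plain Java. This is only an illustration under stated assumptions: the HTML markup has already been stripped, terms are lowercased and split on non-word characters, and weights are raw term counts. The class and method names (CosineSketch, termVector, cosine) are hypothetical and are not part of Lucene or Mahout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Lucene or Mahout class.
class CosineSketch {

    // Build a raw term-frequency vector. Assumes plain text (HTML tags
    // already removed) and a naive split on non-word characters.
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            double weight = e.getValue();
            normA += weight * weight;
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += weight * other;
            }
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Integer> query = termVector("lucene index search");
        Map<String, Integer> doc = termVector("search the lucene index quickly");
        System.out.println(cosine(query, doc));
    }
}
```

In practice the weights would more likely be TF-IDF values computed from the index statistics rather than raw counts; raw counts are used here only to keep the sketch self-contained.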
Hi ELshaimaa,
I couldn't understand what your need is. Can you please explain your
use case?
If the case is "I need to use Lucene to find the most similar documents
from the generated index",
then go for the MoreLikeThis[1] component.
Based on your use case, people can suggest some good wa
Dear list,
I'm considering using Lucene for indexing sequences of part-of-speech
(POS) tags instead of words; for those who don't know, POS tags are
linguistically motivated labels that are assigned to tokens (words) to
describe their morpho-syntactic function. Instead of sequences of words, I
would