Given two text documents, I want to measure quantitatively how similar they are to each other. For example, I want to compute the cosine similarity score between any two given documents. I am trying to use Lucene for this (is it a good fit for this purpose?).
This use case is different from querying against a set of documents, and I am not sure whether Lucene provides a direct API to evaluate this score.

Siddharth

Grant Ingersoll-6 wrote:
>
> You might also have a look at the MemoryIndex. Question, though, is
> what are you hoping to gain from doing a Query against a single
> String? Are you doing a FuzzyQuery? You might look at the
> SecondString project on SourceForge for doing string comparisons.
>
> I guess I am a bit confused by your problem statement. Perhaps you
> can explain more what you are trying to do at a higher level, as it
> sounds like to me you have str1 and str2, so why do you need to inject
> an index into the middle of it?
>
> -Grant
>
> On Jun 19, 2008, at 8:33 PM, Sangrish wrote:
>
>>
>> I have a use case for comparing two given strings (attached to a
>> specific field) using Lucene and get the similarity scores.
>>
>> I tried but could not find any built-in way to do so. Hence
>> assuming that Lucene only compares a Query against Indexed
>> documents, I came up with the following approach:
>> (Let the 2 strings be, str1 and str2 )
>>
>> 1) Create an IndexWriter using a RAMDirectory (I don't want to
>>    store those strings on the disk)
>> 2) Index str1 and store it
>> 3) Search str2 in the index. ( shall the indexWriter be closed
>>    before you search on the index? )
>> 4) Get the similarity score & publish it
>> 5) Delete str1 from the index and make the index available for a
>>    new comparison
>>
>> Any comments & suggestions on making the process optimal
>>
>> Siddharth
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Arbitrary-String-to-String-Similarity-Score-tp18020806p18020806.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>

--
View this message in context: http://www.nabble.com/Arbitrary-String-to-String-Similarity-Score-tp18020806p18022691.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
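
For reference, the cosine similarity score asked about at the top of this message can be computed for two strings without any index in between. What follows is only a minimal sketch in plain Java and is not Lucene's scoring: it assumes tokens are lowercased runs of letters and digits, and it weights terms by raw term frequency (no IDF, no length normalization beyond the cosine itself). The names CosineSketch, termFrequencies and cosine are hypothetical and chosen for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: cosine similarity over raw term counts.
final class CosineSketch {

    // Build a term -> count map. Assumption: a token is a lowercased run of
    // letters or digits; everything else is treated as a separator.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-count vector.
    private static double norm(Map<String, Integer> tf) {
        double sumOfSquares = 0.0;
        for (int count : tf.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    // cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either text has no tokens.
    static double cosine(String a, String b) {
        Map<String, Integer> ta = termFrequencies(a);
        Map<String, Integer> tb = termFrequencies(b);

        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : ta.entrySet()) {
            Integer other = tb.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }

        double normA = norm(ta);
        double normB = norm(tb);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        String str1 = "Lucene compares a query against indexed documents";
        String str2 = "Lucene scores indexed documents against a query";
        System.out.println(cosine(str1, str2));
    }
}
```

With non-negative term counts the score falls between 0 (no shared terms) and 1 (identical term distributions), and it is symmetric in str1 and str2, which a query-against-index score generally is not. Swapping in a Lucene Analyzer for the tokenization step, or adding IDF weights from a larger corpus, would be natural refinements but are outside this sketch.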