Given two text documents, I want to quantify how similar they are to
each other. Say I want to find the cosine similarity score between any
two given documents. I am trying to use Lucene for this (is it a good
fit for this purpose?).

This use case is different from querying against a set of documents.

I am not sure if Lucene provides a direct API to evaluate this score.
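
To make the score I am after concrete, here is a minimal sketch in plain
Java that does not use Lucene at all. It builds raw term-frequency vectors
for the two documents and computes cos(a, b) = (a . b) / (|a| * |b|). The
class name CosineSimilaritySketch, its method names, and the naive
tokenization (lowercase, split on anything that is not a letter or digit)
are my own assumptions for illustration; there is no IDF weighting,
stemming, or stop-word removal, so the numbers will likely differ from
whatever Lucene's own scoring would produce.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CosineSimilaritySketch {

    // Count how often each term occurs in the text. Tokenization is a naive
    // lowercase split on non-letter/non-digit characters (an assumption for
    // this sketch; in Lucene an Analyzer would normally do this job).
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    public static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine similarity over raw term frequencies.
    // Returns 0.0 when either document has no terms.
    public static double cosine(String docA, String docB) {
        Map<String, Integer> a = termFrequencies(docA);
        Map<String, Integer> b = termFrequencies(docB);

        // Dot product: only terms present in both documents contribute.
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }

        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        // Three of four terms are shared: 3 / (2 * 2) = 0.75
        System.out.println(cosine("the quick brown fox", "the quick red fox"));
    }
}
```

The score is symmetric in the two documents, which is the property I want
and which a query-against-index score does not necessarily have.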

Siddharth





Grant Ingersoll-6 wrote:
> 
> You might also have a look at the MemoryIndex.  Question, though, is  
> what are you hoping to gain from doing a Query against a single  
> String?  Are you doing a FuzzyQuery?  You might look at the  
> SecondString project on SourceForge for doing string comparisons.
> 
> I guess I am a bit confused by your problem statement.  Perhaps you  
> can explain more what you are trying to do at a higher level, as it  
> sounds like to me you have str1 and str2, so why do you need to inject  
> an index into the middle of it?
> 
> -Grant
> 
> On Jun 19, 2008, at 8:33 PM, Sangrish wrote:
> 
>>
>> I have a use case for comparing two given strings (attached to a
>> specific field) using Lucene and getting the similarity scores.
>>
>> I tried but could not find any built-in way to do so. Hence, assuming
>> that Lucene only compares a Query against indexed documents, I came up
>> with the following approach (let the 2 strings be str1 and str2):
>>
>> 1) Create an IndexWriter using a RAMDirectory (I don't want to store
>>    those strings on the disk)
>> 2) Index str1 and store it
>> 3) Search str2 in the index (should the IndexWriter be closed before
>>    you search on the index?)
>> 4) Get the similarity score & publish it
>> 5) Delete str1 from the index and make the index available for a new
>>    comparison
>>
>> Any comments & suggestions on making the process optimal?
>>
>> Siddharth
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/Arbitrary-String-to-String-Similarity-Score-tp18020806p18020806.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
> 
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
> 

-- 
View this message in context: 
http://www.nabble.com/Arbitrary-String-to-String-Similarity-Score-tp18020806p18022691.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

