Hi,

Have a look at MoreLikeThis:

[EMAIL PROTECTED] trunk]$ ff \*MoreLikeThis\*.java
./contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThisQuery.java
./contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis.java

I think that or something a lot like it is what you are after.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Sangrish <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Friday, June 20, 2008 12:20:02 AM
> Subject: Re: Arbitrary String to String Similarity Score
>
> Given 2 text documents I want to quantitatively find, how similar they are,
> with respect to each other. Say, I want to find Cosine Similarity score
> between any two given documents. I am trying to use Lucene for it (is it
> good for this purpose?)
>
> This use case is different from querying against a set of documents
>
> I am not sure if Lucene provides a direct API to evaluate this score.
>
> Siddharth
>
> Grant Ingersoll-6 wrote:
> >
> > You might also have a look at the MemoryIndex. Question, though, is
> > what are you hoping to gain from doing a Query against a single
> > String? Are you doing a FuzzyQuery? You might look at the
> > SecondString project on SourceForge for doing string comparisons.
> >
> > I guess I am a bit confused by your problem statement. Perhaps you
> > can explain more what you are trying to do at a higher level, as it
> > sounds like to me you have str1 and str2, so why do you need to inject
> > an index into the middle of it?
> >
> > -Grant
> >
> > On Jun 19, 2008, at 8:33 PM, Sangrish wrote:
> >
> >> I have a use case for comparing two given strings (attached to a
> >> specific field) using Lucene and get the similarity scores.
> >>
> >> I tried but could not find any built-in way to do so. Hence assuming
> >> that Lucene only compares a Query against Indexed documents, I came
> >> up with the following approach:
> >> (Let the 2 strings be, str1 and str2 )
> >>
> >> 1) Create an IndexWriter using a RAMDirectory (I don't want to store
> >>    those strings on the disk)
> >> 2) Index str1 and store it
> >> 3) Search str2 in the index. ( shall the indexWriter be closed
> >>    before you search on the index? )
> >> 4) Get the similarity score & publish it
> >> 5) Delete str1 from the index and make the index available for a new
> >>    comparison
> >>
> >> Any comments & suggestions on making the process optimal
> >>
> >> Siddharth
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/Arbitrary-String-to-String-Similarity-Score-tp18020806p18020806.html
> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> For additional commands, e-mail: [EMAIL PROTECTED]
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com
> >
> > Lucene Helpful Hints:
> > http://wiki.apache.org/lucene-java/BasicsOfPerformance
> > http://wiki.apache.org/lucene-java/LuceneFAQ
>
> --
> View this message in context:
> http://www.nabble.com/Arbitrary-String-to-String-Similarity-Score-tp18020806p18022691.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
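
For reference, the cosine similarity score asked about in the quoted message can be computed directly from the two strings, without an index in the middle. The following is only a minimal plain-Java sketch and makes several assumptions: it does not use any Lucene API, the class name CosineSketch and its methods are hypothetical, tokenization is a simple lowercase split on non-alphanumeric characters, and terms are weighted by raw counts only (no IDF or length normalization, so the numbers will not match Lucene's scoring).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, NOT part of Lucene: cosine similarity over raw term counts.
final class CosineSketch {

    // Assumed tokenization: lowercase, then split on runs of non-letter/non-digit characters.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 when either text has no terms.
    static double cosine(String a, String b) {
        Map<String, Integer> ta = termFrequencies(a);
        Map<String, Integer> tb = termFrequencies(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : ta.entrySet()) {
            Integer other = tb.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * (double) other;
            }
        }
        double na = norm(ta);
        double nb = norm(tb);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Euclidean length of the term-count vector.
    private static double norm(Map<String, Integer> tf) {
        double sum = 0.0;
        for (int count : tf.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }
}
```

Calling CosineSketch.cosine(str1, str2) gives a value between 0 and 1 for any two strings. If analyzer-based tokenization or IDF weighting matters, the MoreLikeThis and MemoryIndex classes mentioned in the thread are probably the better starting points.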