In that case, you should probably look into SecondString, as Grant suggested.
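
For the cosine similarity score asked about below, an index is not strictly required. Here is a minimal sketch in plain Java, not taken from Lucene or SecondString. The class name CosineSketch and its methods are hypothetical, and it assumes simple whitespace tokenization with raw term counts:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene or SecondString.
final class CosineSketch {

    // Count how often each lower-cased, whitespace-separated token occurs.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> tf) {
        double sum = 0.0;
        for (int count : tf.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between the two term-frequency vectors.
    // Returns 0.0 when either string contains no tokens.
    static double cosine(String a, String b) {
        Map<String, Integer> ta = termFrequencies(a);
        Map<String, Integer> tb = termFrequencies(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : ta.entrySet()) {
            Integer other = tb.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(ta);
        double nb = norm(tb);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }
}
```

Calling CosineSketch.cosine(str1, str2) gives a value between 0 and 1. This uses raw term counts only, with no analyzer, stemming, or IDF weighting, so the numbers will not match a Lucene search score.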

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Sangrish <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Friday, June 20, 2008 1:45:52 PM
> Subject: Re: Arbitrary String to String Similarity Score
> 
> 
> 
> Yes, "MoreLikeThis" is more like what I want.  
> 
> But there's one problem: even here, one has to run the query against an
> indexed set of documents.
> 
> What I would like instead is to create two queries through "MoreLikeThis"
> and get a score of how similar they are to each other.
> 
> Siddharth
> 
> 
> 
> 
> 
> 
> 
> Otis Gospodnetic wrote:
> > 
> > Hi,
> > 
> > Have a look at MoreLikeThis:
> > 
> > [EMAIL PROTECTED] trunk]$ ff \*MoreLikeThis\*.java
> > ./contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThisQuery.java
> > ./contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis.java
> > 
> > 
> > I think that or something a lot like it is what you are after.
> > 
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > 
> > 
> > ----- Original Message ----
> >> From: Sangrish 
> >> To: java-user@lucene.apache.org
> >> Sent: Friday, June 20, 2008 12:20:02 AM
> >> Subject: Re: Arbitrary String to String Similarity Score
> >> 
> >> 
> >> Given two text documents, I want to quantify how similar they are to
> >> each other. Say I want to find the cosine similarity score between any
> >> two given documents. I am trying to use Lucene for it (is it good for
> >> this purpose?)
> >> 
> >> This use case is different from querying against a set of documents.
> >> 
> >> I am not sure if Lucene provides a direct API to evaluate this score.
> >> 
> >> Siddharth
> >> 
> >> 
> >> 
> >> 
> >> 
> >> Grant Ingersoll-6 wrote:
> >> > 
> >> > You might also have a look at the MemoryIndex.  Question, though, is  
> >> > what are you hoping to gain from doing a Query against a single  
> >> > String?  Are you doing a FuzzyQuery?  You might look at the  
> >> > SecondString project on SourceForge for doing string comparisons.
> >> > 
> >> > I guess I am a bit confused by your problem statement.  Perhaps you  
> >> > can explain more what you are trying to do at a higher level, as it  
> >> > sounds like to me you have str1 and str2, so why do you need to inject  
> >> > an index into the middle of it?
> >> > 
> >> > -Grant
> >> > 
> >> > On Jun 19, 2008, at 8:33 PM, Sangrish wrote:
> >> > 
> >> >>
> >> >> I have a use case for comparing two given strings (attached to a
> >> >> specific field) using Lucene and getting the similarity score.
> >> >>
> >> >> I tried but could not find any built-in way to do so. Hence, assuming
> >> >> that Lucene only compares a Query against indexed documents, I came up
> >> >> with the following approach (let the two strings be str1 and str2):
> >> >>
> >> >> 1) Create an IndexWriter using a RAMDirectory (I don't want to store
> >> >> those strings on disk)
> >> >> 2) Index str1 and store it
> >> >> 3) Search for str2 in the index (should the IndexWriter be closed
> >> >> before searching the index?)
> >> >> 4) Get the similarity score and publish it
> >> >> 5) Delete str1 from the index and make the index available for a new
> >> >> comparison
> >> >>
> >> >> Any comments or suggestions on making the process optimal?
> >> >>
> >> >> Siddharth
> >> >>
> >> >> -- 
> >> >> View this message in context:
> >> >> http://www.nabble.com/Arbitrary-String-to-String-Similarity-Score-tp18020806p18020806.html
> >> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> >> For additional commands, e-mail: [EMAIL PROTECTED]
> >> >>
> >> > 
> >> > --------------------------
> >> > Grant Ingersoll
> >> > http://www.lucidimagination.com
> >> > 
> >> > Lucene Helpful Hints:
> >> > http://wiki.apache.org/lucene-java/BasicsOfPerformance
> >> > http://wiki.apache.org/lucene-java/LuceneFAQ
> >> > 
> >> > 
> >> > 
> >> > 
> >> > 
> >> > 
> >> > 
> >> > 
> >> > 
> >> > 
> >> > 
> >> 
> >> -- 
> >> View this message in context:
> >> http://www.nabble.com/Arbitrary-String-to-String-Similarity-Score-tp18020806p18022691.html
> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >> 
> >> 
> > 
> > 
> > 
> > 
> > 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Arbitrary-String-to-String-Similarity-Score-tp18020806p18034468.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> 


