Re: Similarity percentage between two Strings

2008-09-09 Thread Thiago Moreira
Googling for "java string similarity" throws up some stuff you might find useful. -- Ian. On Wed, Sep 3, 2008 at 11:58 PM, Thiago Moreira <[EMAIL PROTECTED]> wrote: Well, the similar definition that I'm looking for is the number 2, maybe the numbe

Re: Similarity percentage between two Strings

2008-09-04 Thread Karl Wettin
I would create shingles 1 to 5 n-grams in size and measure the distance using the Tanimoto coefficient. That would probably work out just fine. You might want to give larger shingles more weight. There are shingle filters in lucene/java/contrib/analyzers and there is a Tanimoto dist
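
A minimal sketch of the shingle idea above, in plain Java rather than with the Lucene contrib classes the message mentions. The class and method names (`ShingleSimilarity`, `shingles`, `tanimoto`) are made up for illustration. It assumes character n-grams of length 1 to 5, lower-casing before comparison, and a weight equal to the shingle length; none of those choices are stated in the thread, and Lucene's own shingle filters operate on tokens instead.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class ShingleSimilarity {

    // Collects the distinct character shingles of length minN..maxN.
    // Assumption: each shingle is weighted by its length, so longer matches count more.
    static Map<String, Integer> shingles(String s, int minN, int maxN) {
        Map<String, Integer> weights = new HashMap<>();
        String text = s.toLowerCase(Locale.ROOT);
        for (int n = minN; n <= maxN; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                weights.put(text.substring(i, i + n), n);
            }
        }
        return weights;
    }

    // Weighted Tanimoto (Jaccard) coefficient over the two shingle sets:
    // weight of shared shingles divided by weight of all distinct shingles.
    static double tanimoto(String a, String b) {
        Map<String, Integer> sa = shingles(a, 1, 5);
        Map<String, Integer> sb = shingles(b, 1, 5);
        double shared = 0;
        double total = 0;
        for (Map.Entry<String, Integer> e : sa.entrySet()) {
            total += e.getValue();
            if (sb.containsKey(e.getKey())) {
                shared += e.getValue();
            }
        }
        for (Map.Entry<String, Integer> e : sb.entrySet()) {
            if (!sa.containsKey(e.getKey())) {
                total += e.getValue();
            }
        }
        // Assumption: two empty strings are treated as identical.
        return total == 0 ? 1.0 : shared / total;
    }
}
```

Multiplying the result by 100 gives a percentage. Identical strings score 1.0 and strings with no characters in common score 0.0.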

Re: Similarity percentage between two Strings

2008-09-04 Thread Ian Lea
Googling for "java string similarity" throws up some stuff you might find useful. -- Ian. On Wed, Sep 3, 2008 at 11:58 PM, Thiago Moreira <[EMAIL PROTECTED]> wrote: > > Well, the similar definition that I'm looking for is the number 2, maybe > the number 3, but to start the number 2 is enou

Re: Similarity percentage between two Strings

2008-09-03 Thread N. Hira
More details may change my opinion (not quite sure how others feel yet), but with the way you've described it so far, it seems like all you need is a basic string matcher: For every message: - if message.subject is found in the pool, then this message is "similar to" the message in the poo
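
As a rough illustration of the basic matcher described above, here is a small Java sketch. The `SubjectPool` class and its method name are hypothetical. The message is cut off before it says what happens to a subject that is not found, so adding it to the pool is an assumption, as is ignoring case and surrounding whitespace.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class SubjectPool {

    private final Set<String> seen = new HashSet<>();

    // Returns true if an equal subject is already in the pool.
    // Assumption: comparison ignores case and leading/trailing whitespace.
    // Assumption: a subject that is not found is added to the pool.
    boolean isSimilarToPooled(String subject) {
        String key = subject.trim().toLowerCase(Locale.ROOT);
        // Set.add returns false when the key was already present.
        return !seen.add(key);
    }
}
```

Calling `isSimilarToPooled` once per incoming message subject flags every repeat of a subject that has been seen before.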

Re: Similarity percentage between two Strings

2008-09-03 Thread Thiago Moreira
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Similarity percentage between two Strings

2008-09-03 Thread N. Hira
I don't know how much of this is a Lucene problem, but -- as I'm sure you will inevitably hear from others on the list -- it depends on what your definition of "similar" is. By similar, do you mean: 1. Identical, except for variations in case (upper/lower) 2. Allow 1., but also allow prefix

Similarity percentage between two Strings

2008-09-03 Thread Thiago Moreira