On 17.01.2014 13:11, Benedikt Ritter wrote:
> 2014/1/15 Oliver Heger <oliver.he...@oliver-heger.de>
> 
>>
>>
>> On 15.01.2014 15:05, Benedikt Ritter wrote:
>>> 2014/1/15 Gary Gregory <garydgreg...@gmail.com>
>>>
>>>>  On Wed, Jan 15, 2014 at 8:06 AM, Benedikt Ritter <brit...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Gary,
>>>>>
>>>>> 2014/1/15 Gary Gregory <garydgreg...@gmail.com>
>>>>>
>>>>>> On Wed, Jan 15, 2014 at 7:00 AM, Benedikt Ritter <brit...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> we currently have StringUtils.getLevenshteinDistance. LANG-944 [1] is about
>>>>>>> introducing a new string algorithm called Jaro Winkler Distance [2]. Since
>>>>>>> StringUtils already does a lot of things, I'm wondering if it may make
>>>>>>> sense to introduce a new class that serves as a host for more string
>>>>>>> algorithms to come. It would look something like:
>>>>>>>
>>>>>>> StringAlgorithms.levenshteinDistance(str1, str2);
>>>>>>> StringAlgorithms.jaroWinklerDistance(str1, str2);
>>>>>>>
>>>>>>> We would deprecate StringUtils.getLevenshteinDistance and delegate to the
>>>>>>> new class. It could be removed from StringUtils in the next major release.
>>>>>>>
>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>
>>>>>> Yuck!
>>>>>>
>>>>>> I'd rather have one class per algo, which reminds me that [codec] might be
>>>>>> a better place for things like this that 'encode' strings into something
>>>>>> else.
>>>>>>
>>>>>
>>>>> Both methods return a double value modeling some kind of score. They do not
>>>>> encode. Maybe StringAlgorithms is the wrong name? How about StringScore or
>>>>> something like that?
>>>>>
>>>>
>>>> Still wrong IMO and not OO. A single class will become another
>>>> dumping-ground/kitchen-sink like StringUtils. I would not want to see one
>>>> algo be a one method one liner impl and another algo be a complex 20 method
>>>> job. I guess we could organize algos using nested classes like
>>>> StringFoo.BarAlgo but that's not ideal. All algo classes in a new pkg is
>>>> another way to go.
>>>>
>>>
>>> We already have o.a.c.lang3.text, maybe this would fit?
>>>
>>> What I want to avoid is something like:
>>>
>>> LevenshteinDistance algo = new LevenshteinDistance();
>>> double dist = algo.getDistance(str1, str2);
>>>
>>> If those algorithms don't have a state, it doesn't make sense to force
>>> creation of an object. I like the idea of internal classes.
>>
>> IIUC, both algorithms do the same thing - calculating the difference (or
>> similarity) of two strings - using different methods.
>>
>> So another option would be to extract a common interface
>> (StringDifferenceMetric?) and provide the algorithms as concrete
>> implementations.
>>
> 
> This is a possible, but very specific (= tied to distance measuring)
> approach. I think it is a good idea to create very specific utilities
> instead of generic ones like StringUtils, that can do a variety of things.
> 
> 
>>
>> A concrete use case could be a query engine which allows customizing its
>> string matching algorithm.
>>
> 
> Is this really a use case? It sounds very constructed to me. Have you ever
> thought "I'd like to query on google, but I'd like suggestions to be
> matched using Levenshtein Distance algorithm"?

The configuration is not done by a user, but the search engine may
decide internally which matching algorithm to use based on different use
cases or maybe the document type.

> 
> 
>>
>> If you want to avoid instantiating algorithm classes with no state, we
>> could have an enum with constants representing the available algorithms.
>>
> 
> I still favor specific methods over an additional parameter.

This is a misunderstanding. I meant an enum class holding instances of
algorithm implementations which can be shared, e.g.:

enum StringDifferences implements StringDifference {
    LEVENSHTEIN {
      ...
    },

    FOO {
      ...
    },
    ...
}
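
Spelled out a bit further, purely as a sketch: the interface name StringDifference, its difference() method, and the HAMMING constant are assumptions made up for illustration and are not existing [lang] API. The Levenshtein body is a plain two-row dynamic-programming version, not the implementation in StringUtils.

```java
/** Hypothetical common interface for string difference metrics. */
interface StringDifference {
    double difference(CharSequence left, CharSequence right);
}

/** Stateless, shareable metric instances exposed as enum constants. */
enum StringDifferences implements StringDifference {

    LEVENSHTEIN {
        @Override
        public double difference(CharSequence left, CharSequence right) {
            // previous[j] holds the distance between the first i-1 chars of
            // left and the first j chars of right; current[j] is row i.
            int[] previous = new int[right.length() + 1];
            int[] current = new int[right.length() + 1];
            for (int j = 0; j <= right.length(); j++) {
                previous[j] = j;
            }
            for (int i = 1; i <= left.length(); i++) {
                current[0] = i;
                for (int j = 1; j <= right.length(); j++) {
                    int substitution =
                            left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                    current[j] = Math.min(
                            Math.min(current[j - 1] + 1, previous[j] + 1),
                            previous[j - 1] + substitution);
                }
                int[] swap = previous;
                previous = current;
                current = swap;
            }
            return previous[right.length()];
        }
    },

    /** Illustrative second metric: counts differing positions. */
    HAMMING {
        @Override
        public double difference(CharSequence left, CharSequence right) {
            if (left.length() != right.length()) {
                throw new IllegalArgumentException(
                        "Hamming distance needs strings of equal length");
            }
            int mismatches = 0;
            for (int i = 0; i < left.length(); i++) {
                if (left.charAt(i) != right.charAt(i)) {
                    mismatches++;
                }
            }
            return mismatches;
        }
    }
}
```

A caller would hold a StringDifference reference and could be handed StringDifferences.LEVENSHTEIN or any other implementation, so no algorithm object has to be instantiated per call.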

But as Gary said elsewhere in this thread, I also doubt whether all these specific
algorithms are in scope for [lang]. This is not what you typically need
in your daily programming work.

Oliver

> 
> 
>>
>> Oliver
>>
>>>
>>>
>>>>
>>>> Gary
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Gary
>>>>>>
>>>>>>
>>>>>>> Benedikt
>>>>>>>
>>>>>>> [1] https://issues.apache.org/jira/i#browse/LANG-944
>>>>>>> [2] http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
>>>>>>>
>>>>>>> --
>>>>>>> http://people.apache.org/~britter/
>>>>>>> http://www.systemoutprintln.de/
>>>>>>> http://twitter.com/BenediktRitter
>>>>>>> http://github.com/britter
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
>>>>>> Java Persistence with Hibernate, Second Edition<
>>>>>> http://www.manning.com/bauer3/>
>>>>>> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
>>>>>> Spring Batch in Action <http://www.manning.com/templier/>
>>>>>> Blog: http://garygregory.wordpress.com
>>>>>> Home: http://garygregory.com/
>>>>>> Tweet! http://twitter.com/GaryGregory
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org
>>
>>
> 
> 
