On Fri, Jan 17, 2014 at 7:11 AM, Benedikt Ritter <brit...@apache.org> wrote:

> 2014/1/15 Oliver Heger <oliver.he...@oliver-heger.de>
>
> >
> >
> > On 15.01.2014 15:05, Benedikt Ritter wrote:
> > > 2014/1/15 Gary Gregory <garydgreg...@gmail.com>
> > >
> > >> On Wed, Jan 15, 2014 at 8:06 AM, Benedikt Ritter <brit...@apache.org>
> > >> wrote:
> > >>
> > >>> Hi Gary,
> > >>>
> > >>> 2014/1/15 Gary Gregory <garydgreg...@gmail.com>
> > >>>
> > >>>> On Wed, Jan 15, 2014 at 7:00 AM, Benedikt Ritter <brit...@apache.org>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi all,
> > >>>>>
> > >>>>> we currently have StringUtils.getLevenshteinDistance. LANG-944 [1] is
> > >>>>> about introducing a new string algorithm called Jaro-Winkler distance
> > >>>>> [2]. Since StringUtils already does a lot of things, I'm wondering
> > >>>>> whether it makes sense to introduce a new class that serves as a host
> > >>>>> for more string algorithms to come. It would look something like:
> > >>>>>
> > >>>>> StringAlgorithms.levenshteinDistance(str1, str2);
> > >>>>> StringAlgorithms.jaroWinklerDistance(str1, str2);
> > >>>>>
> > >>>>> We would deprecate StringUtils.getLevenshteinDistance and delegate to
> > >>>>> the new class. It could be removed from StringUtils in the next major
> > >>>>> release.
> > >>>>>
> > >>>>> Thoughts?
> > >>>>>
> > >>>>
> > >>>> Yuck!
> > >>>>
> > >>>> I'd rather have one class per algo, which reminds me that [codec]
> > >>>> might be a better place for things like this that 'encode' strings
> > >>>> into something else.
> > >>>>
> > >>>
> > >>> Both methods return a double value modeling some kind of score. They do
> > >>> not encode. Maybe StringAlgorithms is the wrong name? How about
> > >>> StringScore or something like that?
> > >>>
> > >>
> > >> Still wrong IMO and not OO. A single class will become another
> > >> dumping-ground/kitchen-sink like StringUtils. I would not want to see one
> > >> algo be a one-method, one-liner impl and another algo be a complex
> > >> 20-method job. I guess we could organize algos using nested classes like
> > >> StringFoo.BarAlgo, but that's not ideal. Putting all algo classes in a
> > >> new pkg is another way to go.
> > >>
> > >
> > > We already have o.a.c.lang3.text, maybe this would fit?
> > >
> > > What I want to avoid is something like:
> > >
> > > LevenshteinDistance algo = new LevenshteinDistance();
> > > double dist = algo.getDistance(str1, str2);
> > >
> > > If those algorithms don't have any state, it doesn't make sense to force
> > > creation of an object. I like the idea of nested classes.
> >
> > IIUC, both algorithms do the same thing - calculating the difference (or
> > similarity) of two strings - using different methods.
> >
> > So another option would be to extract a common interface
> > (StringDifferenceMetric?) and provide the algorithms as concrete
> > implementations.
> >
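
For illustration only (not code from the thread): a minimal Java sketch of the
common-interface idea described above. Only the name StringDifferenceMetric
comes from the message; the method name difference(), the classes
LevenshteinMetric and FuzzyMatcher, and the maxDifference threshold are
hypothetical. The stateless implementation is shared through one instance, so
callers never have to construct it.

```java
// Hypothetical sketch; only the interface name appears in the thread.
interface StringDifferenceMetric {
    /** Returns a score describing how different the two strings are. */
    double difference(CharSequence left, CharSequence right);
}

/** Stateless Levenshtein edit distance, exposed as a single shared instance. */
final class LevenshteinMetric implements StringDifferenceMetric {
    static final LevenshteinMetric INSTANCE = new LevenshteinMetric();

    private LevenshteinMetric() { }

    @Override
    public double difference(CharSequence left, CharSequence right) {
        // Two-row dynamic programming table: previous[j] holds the distance
        // between the first i-1 chars of left and the first j chars of right.
        int[] previous = new int[right.length() + 1];
        int[] current = new int[right.length() + 1];
        for (int j = 0; j <= right.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= left.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= right.length(); j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        // After the final swap, "previous" holds the last computed row.
        return previous[right.length()];
    }
}

/** Hypothetical consumer whose matching strategy can be swapped out. */
final class FuzzyMatcher {
    private final StringDifferenceMetric metric;
    private final double maxDifference;

    FuzzyMatcher(StringDifferenceMetric metric, double maxDifference) {
        this.metric = metric;
        this.maxDifference = maxDifference;
    }

    boolean matches(CharSequence query, CharSequence candidate) {
        return metric.difference(query, candidate) <= maxDifference;
    }
}
```

With this shape, new FuzzyMatcher(LevenshteinMetric.INSTANCE, 1.0) would
accept "colour" for the query "color", and a different metric could be passed
in without touching FuzzyMatcher.
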
>
> This is a possible but very specific approach (i.e. tied to distance
> measuring). I think it is a good idea to create very specific utilities
> instead of generic ones like StringUtils that can do a variety of things.
>
>
> >
> > A concrete use case could be a query engine which allows customizing its
> > string matching algorithm.
> >
>
> Is this really a use case? It sounds very contrived to me. Have you ever
> thought "I'd like to search on Google, but I'd like suggestions to be
> matched using the Levenshtein distance algorithm"?
>

All of this is starting to feel like it is drifting away from [lang], but
toward what I am not sure. Maybe [codec].

Gary



>
>
> >
> > If you want to avoid instantiating algorithm classes with no state, we
> > could have an enum with constants representing the available algorithms.
> >
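
For illustration only (not code from the thread): a rough Java sketch of the
enum idea, where each constant carries its own stateless implementation. The
names StringScoreAlgorithm and score() are hypothetical. The Jaro-Winkler
constant assumes the commonly used prefix scale of 0.1 and a maximum prefix
length of 4. Note that the two scores point in opposite directions:
Levenshtein is a distance (0 means identical), while Jaro-Winkler is a
similarity (1 means identical).

```java
// Hypothetical sketch; none of these names come from the thread.
enum StringScoreAlgorithm {
    /** Edit distance: 0 means identical, larger means more different. */
    LEVENSHTEIN {
        @Override
        double score(CharSequence a, CharSequence b) {
            int[] previous = new int[b.length() + 1];
            int[] current = new int[b.length() + 1];
            for (int j = 0; j <= b.length(); j++) {
                previous[j] = j;
            }
            for (int i = 1; i <= a.length(); i++) {
                current[0] = i;
                for (int j = 1; j <= b.length(); j++) {
                    int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                    current[j] = Math.min(
                            Math.min(current[j - 1] + 1, previous[j] + 1),
                            previous[j - 1] + cost);
                }
                int[] swap = previous;
                previous = current;
                current = swap;
            }
            return previous[b.length()];
        }
    },

    /** Similarity in [0, 1]: 1 means identical, 0 means nothing matched. */
    JARO_WINKLER {
        @Override
        double score(CharSequence a, CharSequence b) {
            int lenA = a.length();
            int lenB = b.length();
            if (lenA == 0 && lenB == 0) {
                return 1.0;
            }
            // Characters only match if they are at most "window" positions apart.
            int window = Math.max(0, Math.max(lenA, lenB) / 2 - 1);
            boolean[] matchedA = new boolean[lenA];
            boolean[] matchedB = new boolean[lenB];
            int matches = 0;
            for (int i = 0; i < lenA; i++) {
                int from = Math.max(0, i - window);
                int to = Math.min(lenB - 1, i + window);
                for (int j = from; j <= to; j++) {
                    if (!matchedB[j] && a.charAt(i) == b.charAt(j)) {
                        matchedA[i] = true;
                        matchedB[j] = true;
                        matches++;
                        break;
                    }
                }
            }
            if (matches == 0) {
                return 0.0;
            }
            // Walk both strings' matched characters in order and count
            // the positions where they disagree.
            int outOfOrder = 0;
            int k = 0;
            for (int i = 0; i < lenA; i++) {
                if (!matchedA[i]) {
                    continue;
                }
                while (!matchedB[k]) {
                    k++;
                }
                if (a.charAt(i) != b.charAt(k)) {
                    outOfOrder++;
                }
                k++;
            }
            int transpositions = outOfOrder / 2;
            double m = matches;
            double jaro = (m / lenA + m / lenB + (m - transpositions) / m) / 3.0;
            // Winkler adjustment: boost strings sharing a common prefix.
            int prefix = 0;
            int maxPrefix = Math.min(4, Math.min(lenA, lenB));
            while (prefix < maxPrefix && a.charAt(prefix) == b.charAt(prefix)) {
                prefix++;
            }
            return jaro + prefix * 0.1 * (1.0 - jaro);
        }
    };

    abstract double score(CharSequence a, CharSequence b);
}
```

A caller could then pick the algorithm by name, for example
StringScoreAlgorithm.valueOf("LEVENSHTEIN").score(str1, str2), which is the
"additional parameter" style as opposed to one dedicated method per algorithm.
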
>
> I still favor specific methods over an additional parameter.
>
>
> >
> > Oliver
> >
> > >
> > >
> > >>
> > >> Gary
> > >>
> > >>
> > >>>
> > >>>
> > >>>>
> > >>>> Gary
> > >>>>
> > >>>>
> > >>>>> Benedikt
> > >>>>>
> > >>>>> [1] https://issues.apache.org/jira/i#browse/LANG-944
> > >>>>> [2] http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
> > >>>>>
> > >>>>> --
> > >>>>> http://people.apache.org/~britter/
> > >>>>> http://www.systemoutprintln.de/
> > >>>>> http://twitter.com/BenediktRitter
> > >>>>> http://github.com/britter
> > >>>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
> > >>>> Java Persistence with Hibernate, Second Edition <http://www.manning.com/bauer3/>
> > >>>> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
> > >>>> Spring Batch in Action <http://www.manning.com/templier/>
> > >>>> Blog: http://garygregory.wordpress.com
> > >>>> Home: http://garygregory.com/
> > >>>> Tweet! http://twitter.com/GaryGregory
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> > >
> >
> >
> >
>
>
>



