On Fri, Jan 17, 2014 at 7:11 AM, Benedikt Ritter <brit...@apache.org> wrote:
> 2014/1/15 Oliver Heger <oliver.he...@oliver-heger.de>
>
> > On 15.01.2014 15:05, Benedikt Ritter wrote:
> > > 2014/1/15 Gary Gregory <garydgreg...@gmail.com>
> > >
> > >> On Wed, Jan 15, 2014 at 8:06 AM, Benedikt Ritter <brit...@apache.org>
> > >> wrote:
> > >>
> > >>> Hi Gary,
> > >>>
> > >>> 2014/1/15 Gary Gregory <garydgreg...@gmail.com>
> > >>>
> > >>>> On Wed, Jan 15, 2014 at 7:00 AM, Benedikt Ritter <brit...@apache.org>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi all,
> > >>>>>
> > >>>>> we currently have StringUtils.getLevenshteinDistance. LANG-944 [1] is
> > >>>>> about introducing a new string algorithm called Jaro Winkler Distance
> > >>>>> [2]. Since StringUtils already does a lot of things, I'm wondering if
> > >>>>> it may make sense to introduce a new class that serves as a host for
> > >>>>> more string algorithms to come. It would look something like:
> > >>>>>
> > >>>>> StringAlgorithms.levenshteinDistance(str1, str2);
> > >>>>> StringAlgorithms.jaroWinklerDistance(str1, str2);
> > >>>>>
> > >>>>> We would deprecate StringUtils.getLevenshteinDistance and delegate to
> > >>>>> the new class. It could be removed from StringUtils in the next major
> > >>>>> release.
> > >>>>>
> > >>>>> Thoughts?
> > >>>>
> > >>>> Yuck!
> > >>>>
> > >>>> I'd rather have one class per algo, which reminds me that [codec]
> > >>>> might be a better place for things like this that 'encode' strings
> > >>>> into something else.
> > >>>
> > >>> Both methods return a double value modeling some kind of score. They do
> > >>> not encode. Maybe StringAlgorithms is the wrong name? How about
> > >>> StringScore or something like that?
> > >>
> > >> Still wrong IMO and not OO. A single class will become another
> > >> dumping-ground/kitchen-sink like StringUtils. I would not want to see
> > >> one algo be a one method one liner impl and another algo be a complex
> > >> 20 method job. I guess we could organize algos using nested classes
> > >> like StringFoo.BarAlgo but that's not ideal. All algo classes in a new
> > >> pkg is another way to go.
> > >
> > > We already have o.a.c.lang3.text, maybe this would fit?
> > >
> > > What I want to avoid is something like:
> > >
> > > LevenshteinDistance algo = new LevenshteinDistance();
> > > double dist = algo.getDistance(str1, str2);
> > >
> > > If those algorithms don't have a state, it doesn't make sense to force
> > > creation of an object. I like the idea of internal classes.
> >
> > IIUC, both algorithms do the same thing - calculating the difference (or
> > similarity) of two strings - using different methods.
> >
> > So another option would be to extract a common interface
> > (StringDifferenceMetric?) and provide the algorithms as concrete
> > implementations.
>
> This is a possible, but very specific (= tied to distance measuring)
> approach. I think it is a good idea to create very specific utilities
> instead of generic ones like StringUtils, that can do a variety of things.
>
> > A concrete use case could be a query engine which allows customizing its
> > string matching algorithm.
>
> Is this really a use case? It sounds very constructed to me. Have you ever
> thought "I'd like to query on google, but I'd like suggestions to be
> matched using Levenshtein Distance algorithm"?

All of this is starting to feel like drifting away from [lang] but toward
what I am not sure, maybe [codec].

Gary

> > If you want to avoid instantiating algorithm classes with no state, we
> > could have an enum with constants representing the available algorithms.
>
> I still favor specific methods over an additional parameter.
> > Oliver
> >
> > >> Gary
> > >>
> > >>>> Gary
> > >>>>
> > >>>>> Benedikt
> > >>>>>
> > >>>>> [1] https://issues.apache.org/jira/i#browse/LANG-944
> > >>>>> [2] http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
> > >>>>>
> > >>>>> --
> > >>>>> http://people.apache.org/~britter/
> > >>>>> http://www.systemoutprintln.de/
> > >>>>> http://twitter.com/BenediktRitter
> > >>>>> http://github.com/britter
> > >>>>
> > >>>> --
> > >>>> E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
> > >>>> Java Persistence with Hibernate, Second Edition
> > >>>> <http://www.manning.com/bauer3/>
> > >>>> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
> > >>>> Spring Batch in Action <http://www.manning.com/templier/>
> > >>>> Blog: http://garygregory.wordpress.com
> > >>>> Home: http://garygregory.com/
> > >>>> Tweet! http://twitter.com/GaryGregory
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
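
For reference, the options raised in this thread (a common interface, an enum of stateless algorithms, and a specific static method) could be combined roughly as in the sketch below. This is only an illustration, not code from [lang] or from LANG-944: the class name StringMetricSketch, the enum Metric and the method names are hypothetical, and StringDifferenceMetric is just the tentative name floated above. Only Levenshtein is shown; Jaro Winkler would presumably be a second constant.

```java
public class StringMetricSketch {

    /** Hypothetical common interface, using the tentative name from the thread. */
    interface StringDifferenceMetric {
        double distance(CharSequence left, CharSequence right);
    }

    /** Stateless algorithms as enum constants, so callers never instantiate anything. */
    enum Metric implements StringDifferenceMetric {
        LEVENSHTEIN {
            @Override
            public double distance(CharSequence left, CharSequence right) {
                // Two-row dynamic programming: prev holds the previous row
                // of the edit-distance table, curr the row being filled in.
                int[] prev = new int[right.length() + 1];
                int[] curr = new int[right.length() + 1];
                for (int j = 0; j <= right.length(); j++) {
                    prev[j] = j;
                }
                for (int i = 1; i <= left.length(); i++) {
                    curr[0] = i;
                    for (int j = 1; j <= right.length(); j++) {
                        int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                        int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                        curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
                    }
                    // Swap rows; after the loop prev is the last completed row.
                    int[] tmp = prev;
                    prev = curr;
                    curr = tmp;
                }
                return prev[right.length()];
            }
        }
    }

    /** A specific static method can still delegate to the enum constant. */
    static double levenshteinDistance(CharSequence left, CharSequence right) {
        return Metric.LEVENSHTEIN.distance(left, right);
    }

    public static void main(String[] args) {
        // A caller that wants a pluggable metric depends on the interface only.
        StringDifferenceMetric metric = Metric.LEVENSHTEIN;
        System.out.println(metric.distance("kitten", "sitting"));
        System.out.println(levenshteinDistance("", "abc"));
    }
}
```

With this shape, a pluggable caller such as the query engine example takes a StringDifferenceMetric, while code that only ever needs one algorithm calls the static method directly and never sees the extra parameter.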