On 17.01.2014 13:11, Benedikt Ritter wrote:
> 2014/1/15 Oliver Heger <oliver.he...@oliver-heger.de>
>
>> On 15.01.2014 15:05, Benedikt Ritter wrote:
>>> 2014/1/15 Gary Gregory <garydgreg...@gmail.com>
>>>
>>>> On Wed, Jan 15, 2014 at 8:06 AM, Benedikt Ritter <brit...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Gary,
>>>>>
>>>>> 2014/1/15 Gary Gregory <garydgreg...@gmail.com>
>>>>>
>>>>>> On Wed, Jan 15, 2014 at 7:00 AM, Benedikt Ritter <brit...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> we currently have StringUtils.getLevenshteinDistance. LANG-944 [1] is
>>>>>>> about introducing a new string algorithm called Jaro Winkler
>>>>>>> Distance [2]. Since StringUtils already does a lot of things, I'm
>>>>>>> wondering if it may make sense to introduce a new class that serves
>>>>>>> as a host for more string algorithms to come. It would look
>>>>>>> something like:
>>>>>>>
>>>>>>> StringAlgorithms.levenshteinDistance(str1, str2);
>>>>>>> StringAlgorithms.jaroWinklerDistance(str1, str2);
>>>>>>>
>>>>>>> We would deprecate StringUtils.getLevenshteinDistance and delegate to
>>>>>>> the new class. It could be removed from StringUtils in the next major
>>>>>>> release.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>
>>>>>> Yuck!
>>>>>>
>>>>>> I'd rather have one class per algo, which reminds me that [codec]
>>>>>> might be a better place for things like this that 'encode' strings
>>>>>> into something else.
>>>>>
>>>>> Both methods return a double value modeling some kind of score. They do
>>>>> not encode. Maybe StringAlgorithms is the wrong name? How about
>>>>> StringScore or something like that?
>>>>
>>>> Still wrong IMO and not OO. A single class will become another
>>>> dumping-ground/kitchen-sink like StringUtils. I would not want to see
>>>> one algo be a one method one liner impl and another algo be a complex
>>>> 20 method job. I guess we could organize algos using nested classes
>>>> like StringFoo.BarAlgo but that's not ideal. All algo classes in a new
>>>> pkg is another way to go.
>>>
>>> We already have o.a.c.lang3.text, maybe this would fit?
>>>
>>> What I want to avoid is something like:
>>>
>>> LevenshteinDistance algo = new LevenshteinDistance();
>>> double dist = algo.getDistance(str1, str2);
>>>
>>> If those algorithms don't have a state, it doesn't make sense to force
>>> creation of an object. I like the idea of internal classes.
>>
>> IIUC, both algorithms do the same thing - calculating the difference (or
>> similarity) of two strings - using different methods.
>>
>> So another option would be to extract a common interface
>> (StringDifferenceMetric?) and provide the algorithms as concrete
>> implementations.
>
> This is a possible, but very specific (= tied to distance measuring)
> approach. I think it is a good idea to create very specific utilities
> instead of generic ones like StringUtils, that can do a variety of things.
>
>> A concrete use case could be a query engine which allows customizing its
>> string matching algorithm.
>
> Is this really a use case? It sounds very constructed to me. Have you ever
> thought "I'd like to query on google, but I'd like suggestions to be
> matched using Levenshtein Distance algorithm"?

The configuration is not done by a user, but the search engine may decide
internally which matching algorithm to use based on different use cases
or maybe the document type.

>> If you want to avoid instantiating algorithm classes with no state, we
>> could have an enum with constants representing the available algorithms.
>
> I still favor specific methods over an additional parameter.

This is a misunderstanding. I meant an enum class holding instances of
algorithm implementations which can be shared, e.g.:

enum StringDifferences implements StringDifference {
    LEVENSHTEIN { ... },
    FOO { ... },
    ...
}

But as Gary said elsewhere in this thread, I also doubt whether all these
specific algorithms are in scope for [lang]. This is not what you
typically need in your daily programming work.

Oliver

>> Oliver
>>
>>>> Gary
>>>>
>>>>>> Gary
>>>>>>
>>>>>>> Benedikt
>>>>>>>
>>>>>>> [1] https://issues.apache.org/jira/i#browse/LANG-944
>>>>>>> [2] http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
>>>>>>>
>>>>>>> --
>>>>>>> http://people.apache.org/~britter/
>>>>>>> http://www.systemoutprintln.de/
>>>>>>> http://twitter.com/BenediktRitter
>>>>>>> http://github.com/britter
>>>>>>
>>>>>> --
>>>>>> E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
>>>>>> Java Persistence with Hibernate, Second Edition <http://www.manning.com/bauer3/>
>>>>>> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
>>>>>> Spring Batch in Action <http://www.manning.com/templier/>
>>>>>> Blog: http://garygregory.wordpress.com
>>>>>> Home: http://garygregory.com/
>>>>>> Tweet! http://twitter.com/GaryGregory

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
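
For reference, the enum idea from Oliver's reply (an enum whose constants are shared, stateless algorithm instances behind a common interface) could be sketched in Java roughly as follows. This is only an illustration under assumptions, not code from [lang] or from LANG-944: the method name difference(...), the double return type, and the LENGTH_GAP constant (standing in for the "FOO" placeholder) are all hypothetical.

```java
// Hypothetical interface; the name comes from the enum sketch in the mail,
// the method signature is an assumption made for this example.
interface StringDifference {
    /** Returns a non-negative score; 0 means the inputs are identical. */
    double difference(CharSequence left, CharSequence right);
}

// Each constant is a shared, stateless instance, so callers never
// have to create an algorithm object themselves.
enum StringDifferences implements StringDifference {

    /** Classic edit distance: insertions, deletions, substitutions. */
    LEVENSHTEIN {
        @Override
        public double difference(CharSequence left, CharSequence right) {
            int[] previous = new int[right.length() + 1];
            int[] current = new int[right.length() + 1];
            // Distance from the empty prefix of left to each prefix of right.
            for (int j = 0; j <= right.length(); j++) {
                previous[j] = j;
            }
            for (int i = 1; i <= left.length(); i++) {
                current[0] = i;
                for (int j = 1; j <= right.length(); j++) {
                    int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                    current[j] = Math.min(
                            Math.min(current[j - 1] + 1, previous[j] + 1),
                            previous[j - 1] + cost);
                }
                // Reuse the two rows instead of allocating a full matrix.
                int[] swap = previous;
                previous = current;
                current = swap;
            }
            return previous[right.length()];
        }
    },

    /** Hypothetical placeholder metric (the "FOO" slot): absolute length gap. */
    LENGTH_GAP {
        @Override
        public double difference(CharSequence left, CharSequence right) {
            return Math.abs(left.length() - right.length());
        }
    }
}

class StringDifferenceDemo {
    public static void main(String[] args) {
        // A caller (e.g. a search component) can choose the metric internally.
        StringDifference metric = StringDifferences.LEVENSHTEIN;
        System.out.println(metric.difference("kitten", "sitting"));
    }
}
```

With this shape a caller depends only on the StringDifference interface, while the enum constants give the same "no object creation" convenience as static methods.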