+1

Cheers,
Siegfried Goeschl

> On 07 Nov 2014, at 09:47, Benedikt Ritter <brit...@apache.org> wrote:
>
> Hi all,
>
> as discussed, we'd like to create a new component which is focused on
> algorithms for string/text processing.
>
> We (= Bruno and I) would like to create this new component with git as
> primary vcs right away, which will make Commons Text the second Commons
> component to use git. Please let me know if you have objections against
> this. I'll open an INFRA ticket for the new git repo this weekend.
>
> Thanks!
> Benedikt
>
> 2014-10-27 12:57 GMT+01:00 Benedikt Ritter <brit...@apache.org>:
>
>> 2014-10-27 12:32 GMT+01:00 Bruno P. Kinoshita <brunodepau...@yahoo.com.br>:
>>
>>> Hi Benedikt!
>>>> Just let me know if you need help with the bootstrapping of the new
>>>> project.
>>> Yes, please :)
>>
>> I'll give folks some more time to share their thoughts about this and
>> create the new project then.
>>
>>>> Maybe we should even announce this on announce@. There may be other
>>>> projects interested in a library like this (for example Apache Tika [1])
>>> Good idea! Should we drop a note there once the project has been created
>>> or after we already have some code in there?
>>
>> The latter seems appropriate to me.
>>
>>> Thanks!
>>> Bruno
>>>
>>> From: Benedikt Ritter <brit...@apache.org>
>>> To: Commons Developers List <dev@commons.apache.org>; Bruno P.
>>> Kinoshita <brunodepau...@yahoo.com.br>
>>> Sent: Monday, October 27, 2014 5:45 AM
>>> Subject: Re: [sandbox] New sandbox component
>>>
>>> No objections from my side. I think this is a good idea. Just let me know
>>> if you need help with the bootstrapping of the new project. Maybe we should
>>> even announce this on announce@. There may be other projects interested
>>> in a library like this (for example Apache Tika [1])
>>>
>>> Benedikt
>>>
>>> [1] http://tika.apache.org/
>>>
>>> 2014-10-27 0:41 GMT+01:00 Bruno P.
>>> Kinoshita <brunodepau...@yahoo.com.br>:
>>>
>>> Hello all,
>>> At the moment I'm working with data matching and record linkage, and had
>>> to port some existing string comparison algorithms found in several open
>>> source projects (fuzzy-search-tools, simmetrics, lingpipe, [lang], [codec]).
>>> At that time I noticed LANG-591 [1], which suggests a more complex
>>> levenshtein distance algorithm. There are several other algorithms too
>>> (damerau-levenshtein, jaro, jaro-winkler, jaccard, bitap, q-gram, soundex,
>>> metaphone). Instead of trying to put them all in, say, [lang], I'd like to
>>> experiment with a new [text] component in the sandbox, if there are no
>>> objections.
>>> I will take a look at the existing code and its license, but most of
>>> these algorithms have good wiki pages with pseudocode available, as well
>>> as academic papers.
>>> Maybe this component could be useful for other projects like [lang],
>>> Lucene, larsga/Duke, and Talend Open Studio. And even though my initial use
>>> case for this would be string comparison, I think it could support other
>>> use cases too.
>>> Thoughts on this? Anyone else interested in such a component?
>>> Thanks!
>>> Bruno
>>>
>>> [1] https://issues.apache.org/jira/browse/LANG-591
>>>
>>> --
>>> http://people.apache.org/~britter/
>>> http://www.systemoutprintln.de/
>>> http://twitter.com/BenediktRitter
>>> http://github.com/britter
>
> --
> http://people.apache.org/~britter/
> http://www.systemoutprintln.de/
> http://twitter.com/BenediktRitter
> http://github.com/britter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org