INFRA issue is https://issues.apache.org/jira/browse/INFRA-8595

2014-11-07 16:17 GMT+01:00 Siegfried Goeschl <siegfried.goes...@it20one.com>:

> +1
>
> Cheers,
>
> Siegfried Goeschl
>
>
> On 07 Nov 2014, at 09:47, Benedikt Ritter <brit...@apache.org> wrote:
> >
> > Hi all,
> >
> > as discussed, we'd like to create a new component which is focused on
> > algorithms for string/text processing.
> >
> > We (= Bruno and I) would like to create this new component with git as
> > primary vcs right away, which will make Commons Text the second Commons
> > component to use git. Please let me know if you have objections to
> > this. I'll open an INFRA ticket for the new git repo this weekend.
> >
> > Thanks!
> > Benedikt
> >
> > 2014-10-27 12:57 GMT+01:00 Benedikt Ritter <brit...@apache.org>:
> >
> >> 2014-10-27 12:32 GMT+01:00 Bruno P. Kinoshita <brunodepau...@yahoo.com.br>:
> >>
> >>> Hi Benedikt!
> >>>
> >>>> Just let me know if you need help with the bootstrapping of the new
> >>>> project.
> >>>
> >>> Yes, please :)
> >>
> >> I'll give folks some more time to share their thoughts about this and
> >> create the new project then.
> >>
> >>>> Maybe we should even announce this on announce@. There may be other
> >>>> projects interested in a library like this (for example Apache Tika [1])
> >>>
> >>> Good idea! Should we drop a note there once the project has been created
> >>> or after we already have some code in there?
> >>
> >> The latter seems appropriate to me.
> >>
> >>> Thanks!
> >>> Bruno
> >>>
> >>> From: Benedikt Ritter <brit...@apache.org>
> >>> To: Commons Developers List <dev@commons.apache.org>; Bruno P.
> >>> Kinoshita <brunodepau...@yahoo.com.br>
> >>> Sent: Monday, October 27, 2014 5:45 AM
> >>> Subject: Re: [sandbox] New sandbox component
> >>>
> >>> No objections from my side. I think this is a good idea. Just let me know
> >>> if you need help with the bootstrapping of the new project. Maybe we should
> >>> even announce this on announce@. There may be other projects interested
> >>> in a library like this (for example Apache Tika [1])
> >>>
> >>> Benedikt
> >>>
> >>> [1] http://tika.apache.org/
> >>>
> >>> 2014-10-27 0:41 GMT+01:00 Bruno P. Kinoshita <brunodepau...@yahoo.com.br>:
> >>>
> >>> Hello all,
> >>>
> >>> At the moment I'm working with data matching and record linkage, and had
> >>> to port some existing string comparison algorithms found in several open
> >>> source projects (fuzzy-search-tools, simmetrics, lingpipe, [lang], [codec]).
> >>>
> >>> At that time I noticed LANG-591 [1], which suggests a more complex
> >>> Levenshtein distance algorithm. There are several other algorithms too
> >>> (Damerau-Levenshtein, Jaro, Jaro-Winkler, Jaccard, Bitap, q-gram, Soundex,
> >>> Metaphone). Instead of trying to put them all in, say, [lang], I'd like to
> >>> experiment with a new [text] component in the sandbox, if there are no
> >>> objections.
> >>>
> >>> I will take a look at the existing code and its license, but most of
> >>> these algorithms have good Wiki pages with pseudocode available, as well
> >>> as academic papers.
> >>>
> >>> Maybe this component could be useful for other projects like [lang],
> >>> Lucene, larsga/Duke, and Talend Open Studio. And even though my initial use
> >>> case for this would be string comparison, I think it could support other
> >>> use cases too.
> >>>
> >>> Thoughts on this? Anyone else interested in such a component?
> >>>
> >>> Thanks!
> >>> Bruno
> >>>
> >>> [1] https://issues.apache.org/jira/browse/LANG-591
> >>>
> >>> --
> >>> http://people.apache.org/~britter/
> >>> http://www.systemoutprintln.de/
> >>> http://twitter.com/BenediktRitter
> >>> http://github.com/britter

--
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter