Hi Rob,

First of all, kudos for the great work moving things from [lang] into [text].
I got a copy of the Lothaire book last weekend, but haven't had a chance to read it yet.

There was also some discussion around the name parser, and since we couldn't reach a consensus, I think we could either start another discussion thread, or stash it somewhere so that it doesn't block a release.

I would also like to implement more edit distances and string similarity measures, as well as look into the duration unit parser, probably adapting code from github.com/jchampemont/gunip.

But I'd vote for (4): first moving the human name parser elsewhere, reviewing the edit distances, and checking if there's anything else from [lang] that we could put into this initial release. Once it has been released, we will be able to add things from the Lothaire book, more edit distances, maybe bring back the name parser, as well as any enhancements and bug fixes.

Bruno

>________________________________
> From: Rob Tompkins <chtom...@gmail.com>
>To: Commons Developers List <dev@commons.apache.org>
>Sent: Tuesday, 29 November 2016 11:45 AM
>Subject: [text] Next steps.
>
>
>Hello,
>
>I'm a tad curious what folks (along with Gary, Benedikt, and Bruno) think
>the next steps are for text in the hopeful thought that we are eventually
>heading towards a 1.0 release. Some thoughts that come to mind are:
>
>(1) Go over lang with fine tooth comb and see what we think should move,
>(2) Go through the Lothaire "Applied Combinatorics on Words" book (
>http://lipn.univ-paris13.fr/~duchamp/Books&more/Lothaire/(Encyclopedia_of_Mathematics_and_its_Applications_)M._Lothaire-Applied_Combinatorics_On_Words-Cambridge_University_Press(2005).pdf)
>and minimally implement some of the standard algorithms.
>(3) Implement, from the Lothaire book, some of the more complex stuff:
>heavier pattern matching, and/or natural language processing,
>and/or
>(4) Go straight for a release.
>
>I'm less for (4) because I think there's probably some smaller bits of code
>in lang that probably come over. I like the idea of (2) before heading out
>the door. Regarding (3), I would have to do considerable reading to make
>considerable headway here, which I'm not opposed to doing it would just
>merely prolong getting to a 1.0 release if we predicated the release upon
>my getting that done.
>
>So, what do you guys think?
>
>Cheers,
>-Rob