Hello Phil, 2015-04-27 4:56 GMT+02:00 Phil Steitz <phil.ste...@gmail.com>:
> On 4/26/15 3:14 AM, Benedikt Ritter wrote: > > 2015-04-26 11:52 GMT+02:00 Benedikt Ritter <brit...@apache.org>: > > > >> > >> 2015-04-19 17:29 GMT+02:00 Benedikt Ritter <brit...@apache.org>: > >> > >>> > >>> 2015-04-19 17:24 GMT+02:00 Benedikt Ritter <brit...@apache.org>: > >>> > >>>> Hello Bruno, > >>>> > >>>> 2015-04-19 12:15 GMT+02:00 Bruno P. Kinoshita < > >>>> brunodepau...@yahoo.com.br>: > >>>> > >>>>> Thanks for the thorough code review Benedikt! Comments inline. > >>>>> > >>>>>> 1. IP clearance! There are several comments talking about code > >>>>> adapted from > >>>>>> PHP libraries. This needs to be listed in NOTICE.txt. For an example > >>>>> see > >>>>>> the NOTICE.txt in [lang]. > >>>>> Filed https://issues.apache.org/jira/browse/SANDBOX-497 > >>>>> And first try to fix it: > >>>>> > >>>>> > >>>>> > https://git1-us-west.apache.org/repos/asf?p=commons-text.git;a=commitdiff;h=refs/heads/SANDBOX-497 > >>>>> Does that look correct? I also checked with the author before porting > >>>>> it to the initial Java version, and told him about the [text] > component - > >>>>> https://github.com/jasonpriem/HumanNameParser.php/issues/11 > >>>> > >>>> I've adjusted the changes a bit and merged it into master. > >>>> > >>>> > >>>>>> 2. The names package needs work. > >>>>> Completely agree. We can try to make it thread safe too. This will > take > >>>>> a little longer. > >>>>> File new issue for tracking it > >>>>> https://issues.apache.org/jira/browse/SANDBOX-498 > >>>> > >>>> I've pushed a new branch containing a proposal how to fix this. Please > >>>> review. > >>>> > >>>> > >>>>> > >>>>>> 3. There are a some useless JavaDocs, which can be dropped. For > >>>>> example: > >>>>>> (...)> Better rename the variable to middleName and drop the > comment. > >>>>> What about renaming the variables but leaving the comments? > >>>>> > >>>> For I've corrected this for the names package in the SANDBOX-498 > branch. > >>>> > >>>> > >>>>>> 4. Several classes have no tests. The overall test coverage looks > >>>>> good, so > >>>>>> this may not be a problem. > >>>>> Before the initial release I can start a development cycle just to > add > >>>>> new tests for parts that seem important - like code with somewhat > high [+5] > >>>>> cyclomatic complexity and uncovered branches. Just to make sure that > the > >>>>> initial release will have an acceptable quality and less hidden bugs > :-) > >>>>> > >>>> Nice! > >>>> > >>>> > >>>>>> 5. I don't think the shade configuration is correct. To me it looks > >>>>> like > >>>>>> commons-text depends on [lang] and also shades some classes. Do we > >>>>> really > >>>>>> need the explicit dependency? Furthermore, why do we need to shade > so > >>>>> many > >>>>>> classes? Why do we need anything beside StringUtils? > >>>>> My mistake. I copied the [functor] configuration without much > thinking. > >>>>> IIRC, this dependency was added because of the names package. But it > >>>>> shouldn't be too complicate to replace it. It is mainly used for > checking > >>>>> empty/blank values. > >>>>> > >>>>> So let's drop the [lang] dependency and use something like `$VAR != > >>>>> null && ! "".equals($VAR)` instead? > >>>>> > >>>> I think we also need StringUtils in the similarity package. We should > >>>> handle this after we have merged the changes made to the names > package. > >>>> > >>>> > >>>>> Thanks again for reviewing the code! > >>>>> > >>>> No problem, thanks for pushing this forward. > >>>> > >>>> Here are some more things we need to take care of before 1.0: > >>>> - JIRA project > >>>> > >>> Requested a JIRA project: > >>> > https://issues.apache.org/jira/servicedesk/customer/portal/1/INFRA-9477 > >>> > >> Here is the JIRA project [1]. I'll find out, how we can move the Commons > >> Text issues from the Sandbox project to the new project. > >> > > I've migrated all issues from the Sandbox project to the new Commons Text > > project. I've also updated the project documentation. > > Did I miss the promotion VOTE? > No, but I think the component should be in a state to be released before starting a promotion vote. We have examples of components which have been promoted to early and I don't want this to happen with [text]. Benedikt > > Phil > > > > > >> Benedikt > >> > >> [1] https://issues.apache.org/jira/browse/TEXT > >> > >> > >>> > >>>> - Rename similarity package to distance package? We have five Distance > >>>> implementations but only one Similarity. Furthermore we have > FuzzyScore. Is > >>>> it a distance or a similarity? It should be renamed accordingly. > >>>> - ConsineSimilariy uses Vectors modeled as Map<CharSequence, Integer>. > >>>> Does it make sense to introduce a new class called Vector? > >>>> > >>>> Benedikt > >>>> > >>>> > >>>>> All the best,Bruno > >>>>> > >>>>> > >>>>> From: Benedikt Ritter <brit...@apache.org> > >>>>> To: Commons Developers List <dev@commons.apache.org> > >>>>> Sent: Sunday, April 19, 2015 9:01 PM > >>>>> Subject: [TEXT] Review of the code base > >>>>> > >>>>> Hi all, > >>>>> > >>>>> I've looked through the code base of [text] and I already did some > clean > >>>>> up. However there are a few things, that need more work IMHO: > >>>>> > >>>>> 1. IP clearance! There are several comments talking about code > adapted > >>>>> from > >>>>> PHP libraries. This needs to be listed in NOTICE.txt. For an example > see > >>>>> the NOTICE.txt in [lang]. > >>>>> > >>>>> 2. The names package needs work. Currently the HumanNameParser takes > a > >>>>> Name > >>>>> object (which is just a wrapper around a String) as parameter. The > parse > >>>>> result is then saved in fields (e.g. firstName). This makes the > parser > >>>>> unusable after it has parsed a name. I think the API should be > reworked > >>>>> such as: > >>>>> - The constructor of the parser takes configuration options which > can be > >>>>> reused for several names to parse > >>>>> - the parse method takes a string as parameter, containing a name > >>>>> - the parse method returns an immutable Name objects which has > getters > >>>>> for > >>>>> firstName, lastName etc. > >>>>> > >>>>> 3. There are a some useless JavaDocs, which can be dropped. For > example: > >>>>> /** > >>>>> * Middle name. > >>>>> */ > >>>>> private String middle; > >>>>> > >>>>> Better rename the variable to middleName and drop the comment. > >>>>> > >>>>> 4. Several classes have no tests. The overall test coverage looks > good, > >>>>> so > >>>>> this may not be a problem. > >>>>> > >>>>> 5. I don't think the shade configuration is correct. To me it looks > like > >>>>> commons-text depends on [lang] and also shades some classes. Do we > >>>>> really > >>>>> need the explicit dependency? Furthermore, why do we need to shade so > >>>>> many > >>>>> classes? Why do we need anything beside StringUtils? > >>>>> > >>>>> Nevertheless, this library looks very good. Kudos to Bruno for > pushing > >>>>> this > >>>>> forward! > >>>>> > >>>>> Regards, > >>>>> Benedikt > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> http://people.apache.org/~britter/ > >>>>> http://www.systemoutprintln.de/ > >>>>> http://twitter.com/BenediktRitter > >>>>> http://github.com/britter > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>> > >>>> > >>>> -- > >>>> http://people.apache.org/~britter/ > >>>> http://www.systemoutprintln.de/ > >>>> http://twitter.com/BenediktRitter > >>>> http://github.com/britter > >>>> > >>> > >>> > >>> -- > >>> http://people.apache.org/~britter/ > >>> http://www.systemoutprintln.de/ > >>> http://twitter.com/BenediktRitter > >>> http://github.com/britter > >>> > >> > >> > >> -- > >> http://people.apache.org/~britter/ > >> http://www.systemoutprintln.de/ > >> http://twitter.com/BenediktRitter > >> http://github.com/britter > >> > > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > -- http://people.apache.org/~britter/ http://www.systemoutprintln.de/ http://twitter.com/BenediktRitter http://github.com/britter