2015-04-26 11:52 GMT+02:00 Benedikt Ritter <brit...@apache.org>: > > > 2015-04-19 17:29 GMT+02:00 Benedikt Ritter <brit...@apache.org>: > >> >> >> 2015-04-19 17:24 GMT+02:00 Benedikt Ritter <brit...@apache.org>: >> >>> Hello Bruno, >>> >>> 2015-04-19 12:15 GMT+02:00 Bruno P. Kinoshita < >>> brunodepau...@yahoo.com.br>: >>> >>>> Thanks for the thorough code review Benedikt! Comments inline. >>>> >>>> > 1. IP clearance! There are several comments talking about code >>>> adapted from >>>> >PHP libraries. This needs to be listed in NOTICE.txt. For an example >>>> see >>>> >the NOTICE.txt in [lang]. >>>> Filed https://issues.apache.org/jira/browse/SANDBOX-497 >>>> And first try to fix it: >>>> >>>> >>>> https://git1-us-west.apache.org/repos/asf?p=commons-text.git;a=commitdiff;h=refs/heads/SANDBOX-497 >>>> Does that look correct? I also checked with the author before porting >>>> it to the initial Java version, and told him about the [text] component - >>>> https://github.com/jasonpriem/HumanNameParser.php/issues/11 >>> >>> >>> I've adjusted the changes a bit and merged it into master. >>> >>> >>>> >>>> > 2. The names package needs work. >>>> Completely agree. We can try to make it thread safe too. This will take >>>> a little longer. >>>> File new issue for tracking it >>>> https://issues.apache.org/jira/browse/SANDBOX-498 >>> >>> >>> I've pushed a new branch containing a proposal how to fix this. Please >>> review. >>> >>> >>>> >>>> >>>> > 3. There are a some useless JavaDocs, which can be dropped. For >>>> example: >>>> > (...)> Better rename the variable to middleName and drop the comment. >>>> What about renaming the variables but leaving the comments? >>>> >>> >>> For I've corrected this for the names package in the SANDBOX-498 branch. >>> >>> >>>> >>>> > 4. Several classes have no tests. The overall test coverage looks >>>> good, so >>>> >this may not be a problem. >>>> Before the initial release I can start a development cycle just to add >>>> new tests for parts that seem important - like code with somewhat high [+5] >>>> cyclomatic complexity and uncovered branches. Just to make sure that the >>>> initial release will have an acceptable quality and less hidden bugs :-) >>>> >>> >>> Nice! >>> >>> >>>> > 5. I don't think the shade configuration is correct. To me it looks >>>> like >>>> >commons-text depends on [lang] and also shades some classes. Do we >>>> really >>>> >need the explicit dependency? Furthermore, why do we need to shade so >>>> many >>>> >classes? Why do we need anything beside StringUtils? >>>> My mistake. I copied the [functor] configuration without much thinking. >>>> IIRC, this dependency was added because of the names package. But it >>>> shouldn't be too complicate to replace it. It is mainly used for checking >>>> empty/blank values. >>>> >>>> So let's drop the [lang] dependency and use something like `$VAR != >>>> null && ! "".equals($VAR)` instead? >>>> >>> >>> I think we also need StringUtils in the similarity package. We should >>> handle this after we have merged the changes made to the names package. >>> >>> >>>> Thanks again for reviewing the code! >>>> >>> >>> No problem, thanks for pushing this forward. >>> >>> Here are some more things we need to take care of before 1.0: >>> - JIRA project >>> >> >> Requested a JIRA project: >> https://issues.apache.org/jira/servicedesk/customer/portal/1/INFRA-9477 >> > > Here is the JIRA project [1]. I'll find out, how we can move the Commons > Text issues from the Sandbox project to the new project. >
I've migrated all issues from the Sandbox project to the new Commons Text project. I've also updated the project documentation. > > Benedikt > > [1] https://issues.apache.org/jira/browse/TEXT > > >> >> >>> - Rename similarity package to distance package? We have five Distance >>> implementations but only one Similarity. Furthermore we have FuzzyScore. Is >>> it a distance or a similarity? It should be renamed accordingly. >>> - ConsineSimilariy uses Vectors modeled as Map<CharSequence, Integer>. >>> Does it make sense to introduce a new class called Vector? >>> >>> Benedikt >>> >>> >>>> >>>> All the best,Bruno >>>> >>>> >>>> From: Benedikt Ritter <brit...@apache.org> >>>> To: Commons Developers List <dev@commons.apache.org> >>>> Sent: Sunday, April 19, 2015 9:01 PM >>>> Subject: [TEXT] Review of the code base >>>> >>>> Hi all, >>>> >>>> I've looked through the code base of [text] and I already did some clean >>>> up. However there are a few things, that need more work IMHO: >>>> >>>> 1. IP clearance! There are several comments talking about code adapted >>>> from >>>> PHP libraries. This needs to be listed in NOTICE.txt. For an example see >>>> the NOTICE.txt in [lang]. >>>> >>>> 2. The names package needs work. Currently the HumanNameParser takes a >>>> Name >>>> object (which is just a wrapper around a String) as parameter. The parse >>>> result is then saved in fields (e.g. firstName). This makes the parser >>>> unusable after it has parsed a name. I think the API should be reworked >>>> such as: >>>> - The constructor of the parser takes configuration options which can be >>>> reused for several names to parse >>>> - the parse method takes a string as parameter, containing a name >>>> - the parse method returns an immutable Name objects which has getters >>>> for >>>> firstName, lastName etc. >>>> >>>> 3. There are a some useless JavaDocs, which can be dropped. For example: >>>> /** >>>> * Middle name. >>>> */ >>>> private String middle; >>>> >>>> Better rename the variable to middleName and drop the comment. >>>> >>>> 4. Several classes have no tests. The overall test coverage looks good, >>>> so >>>> this may not be a problem. >>>> >>>> 5. I don't think the shade configuration is correct. To me it looks like >>>> commons-text depends on [lang] and also shades some classes. Do we >>>> really >>>> need the explicit dependency? Furthermore, why do we need to shade so >>>> many >>>> classes? Why do we need anything beside StringUtils? >>>> >>>> Nevertheless, this library looks very good. Kudos to Bruno for pushing >>>> this >>>> forward! >>>> >>>> Regards, >>>> Benedikt >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> http://people.apache.org/~britter/ >>>> http://www.systemoutprintln.de/ >>>> http://twitter.com/BenediktRitter >>>> http://github.com/britter >>>> >>>> >>>> >>>> >>>> >>> >>> >>> >>> -- >>> http://people.apache.org/~britter/ >>> http://www.systemoutprintln.de/ >>> http://twitter.com/BenediktRitter >>> http://github.com/britter >>> >> >> >> >> -- >> http://people.apache.org/~britter/ >> http://www.systemoutprintln.de/ >> http://twitter.com/BenediktRitter >> http://github.com/britter >> > > > > -- > http://people.apache.org/~britter/ > http://www.systemoutprintln.de/ > http://twitter.com/BenediktRitter > http://github.com/britter > -- http://people.apache.org/~britter/ http://www.systemoutprintln.de/ http://twitter.com/BenediktRitter http://github.com/britter