2015-04-19 17:24 GMT+02:00 Benedikt Ritter <brit...@apache.org>: > Hello Bruno, > > 2015-04-19 12:15 GMT+02:00 Bruno P. Kinoshita <brunodepau...@yahoo.com.br> > : > >> Thanks for the thorough code review Benedikt! Comments inline. >> >> > 1. IP clearance! There are several comments talking about code adapted >> from >> >PHP libraries. This needs to be listed in NOTICE.txt. For an example see >> >the NOTICE.txt in [lang]. >> Filed https://issues.apache.org/jira/browse/SANDBOX-497 >> And first try to fix it: >> >> >> https://git1-us-west.apache.org/repos/asf?p=commons-text.git;a=commitdiff;h=refs/heads/SANDBOX-497 >> Does that look correct? I also checked with the author before porting it >> to the initial Java version, and told him about the [text] component - >> https://github.com/jasonpriem/HumanNameParser.php/issues/11 > > > I've adjusted the changes a bit and merged it into master. > > >> >> > 2. The names package needs work. >> Completely agree. We can try to make it thread safe too. This will take a >> little longer. >> File new issue for tracking it >> https://issues.apache.org/jira/browse/SANDBOX-498 > > > I've pushed a new branch containing a proposal how to fix this. Please > review. > > >> >> >> > 3. There are a some useless JavaDocs, which can be dropped. For example: >> > (...)> Better rename the variable to middleName and drop the comment. >> What about renaming the variables but leaving the comments? >> > > For I've corrected this for the names package in the SANDBOX-498 branch. > > >> >> > 4. Several classes have no tests. The overall test coverage looks good, >> so >> >this may not be a problem. >> Before the initial release I can start a development cycle just to add >> new tests for parts that seem important - like code with somewhat high [+5] >> cyclomatic complexity and uncovered branches. Just to make sure that the >> initial release will have an acceptable quality and less hidden bugs :-) >> > > Nice! > > >> > 5. I don't think the shade configuration is correct. To me it looks like >> >commons-text depends on [lang] and also shades some classes. Do we really >> >need the explicit dependency? Furthermore, why do we need to shade so >> many >> >classes? Why do we need anything beside StringUtils? >> My mistake. I copied the [functor] configuration without much thinking. >> IIRC, this dependency was added because of the names package. But it >> shouldn't be too complicate to replace it. It is mainly used for checking >> empty/blank values. >> >> So let's drop the [lang] dependency and use something like `$VAR != null >> && ! "".equals($VAR)` instead? >> > > I think we also need StringUtils in the similarity package. We should > handle this after we have merged the changes made to the names package. > > >> Thanks again for reviewing the code! >> > > No problem, thanks for pushing this forward. > > Here are some more things we need to take care of before 1.0: > - JIRA project >
Requested a JIRA project: https://issues.apache.org/jira/servicedesk/customer/portal/1/INFRA-9477 > - Rename similarity package to distance package? We have five Distance > implementations but only one Similarity. Furthermore we have FuzzyScore. Is > it a distance or a similarity? It should be renamed accordingly. > - ConsineSimilariy uses Vectors modeled as Map<CharSequence, Integer>. > Does it make sense to introduce a new class called Vector? > > Benedikt > > >> >> All the best,Bruno >> >> >> From: Benedikt Ritter <brit...@apache.org> >> To: Commons Developers List <dev@commons.apache.org> >> Sent: Sunday, April 19, 2015 9:01 PM >> Subject: [TEXT] Review of the code base >> >> Hi all, >> >> I've looked through the code base of [text] and I already did some clean >> up. However there are a few things, that need more work IMHO: >> >> 1. IP clearance! There are several comments talking about code adapted >> from >> PHP libraries. This needs to be listed in NOTICE.txt. For an example see >> the NOTICE.txt in [lang]. >> >> 2. The names package needs work. Currently the HumanNameParser takes a >> Name >> object (which is just a wrapper around a String) as parameter. The parse >> result is then saved in fields (e.g. firstName). This makes the parser >> unusable after it has parsed a name. I think the API should be reworked >> such as: >> - The constructor of the parser takes configuration options which can be >> reused for several names to parse >> - the parse method takes a string as parameter, containing a name >> - the parse method returns an immutable Name objects which has getters for >> firstName, lastName etc. >> >> 3. There are a some useless JavaDocs, which can be dropped. For example: >> /** >> * Middle name. >> */ >> private String middle; >> >> Better rename the variable to middleName and drop the comment. >> >> 4. Several classes have no tests. The overall test coverage looks good, so >> this may not be a problem. >> >> 5. I don't think the shade configuration is correct. To me it looks like >> commons-text depends on [lang] and also shades some classes. Do we really >> need the explicit dependency? Furthermore, why do we need to shade so many >> classes? Why do we need anything beside StringUtils? >> >> Nevertheless, this library looks very good. Kudos to Bruno for pushing >> this >> forward! >> >> Regards, >> Benedikt >> >> >> >> >> >> >> -- >> http://people.apache.org/~britter/ >> http://www.systemoutprintln.de/ >> http://twitter.com/BenediktRitter >> http://github.com/britter >> >> >> >> >> > > > > -- > http://people.apache.org/~britter/ > http://www.systemoutprintln.de/ > http://twitter.com/BenediktRitter > http://github.com/britter > -- http://people.apache.org/~britter/ http://www.systemoutprintln.de/ http://twitter.com/BenediktRitter http://github.com/britter