Re: [TEXT] Review of the code base

Benedikt Ritter Sun, 19 Apr 2015 08:29:42 -0700

2015-04-19 17:24 GMT+02:00 Benedikt Ritter <brit...@apache.org>:

> Hello Bruno,
>
> 2015-04-19 12:15 GMT+02:00 Bruno P. Kinoshita <brunodepau...@yahoo.com.br>
> :
>
>> Thanks for the thorough code review Benedikt! Comments inline.
>>
>> > 1. IP clearance! There are several comments talking about code adapted
>> from
>> >PHP libraries. This needs to be listed in NOTICE.txt. For an example see
>> >the NOTICE.txt in [lang].
>> Filed https://issues.apache.org/jira/browse/SANDBOX-497
>> And first try to fix it:
>>
>>
>> https://git1-us-west.apache.org/repos/asf?p=commons-text.git;a=commitdiff;h=refs/heads/SANDBOX-497
>> Does that look correct? I also checked with the author before porting it
>> to the initial Java version, and told him about the [text] component -
>> https://github.com/jasonpriem/HumanNameParser.php/issues/11
>
>
> I've adjusted the changes a bit and merged it into master.
>
>
>>
>> > 2. The names package needs work.
>> Completely agree. We can try to make it thread safe too. This will take a
>> little longer.
>> File new issue for tracking it
>> https://issues.apache.org/jira/browse/SANDBOX-498
>
>
> I've pushed a new branch containing a proposal how to fix this. Please
> review.
>
>
>>
>>
>> > 3. There are a some useless JavaDocs, which can be dropped. For example:
>> > (...)> Better rename the variable to middleName and drop the comment.
>> What about renaming the variables but leaving the comments?
>>
>
> For I've corrected this for the names package in the SANDBOX-498 branch.
>
>
>>
>> > 4. Several classes have no tests. The overall test coverage looks good,
>> so
>> >this may not be a problem.
>> Before the initial release I can start a development cycle just to add
>> new tests for parts that seem important - like code with somewhat high [+5]
>> cyclomatic complexity and uncovered branches. Just to make sure that the
>> initial release will have an acceptable quality and less hidden bugs :-)
>>
>
> Nice!
>
>
>> > 5. I don't think the shade configuration is correct. To me it looks like
>> >commons-text depends on [lang] and also shades some classes. Do we really
>> >need the explicit dependency? Furthermore, why do we need to shade so
>> many
>> >classes? Why do we need anything beside StringUtils?
>> My mistake. I copied the [functor] configuration without much thinking.
>> IIRC, this dependency was added because of the names package. But it
>> shouldn't be too complicate to replace it. It is mainly used for checking
>> empty/blank values.
>>
>> So let's drop the [lang] dependency and use something like `$VAR != null
>> && ! "".equals($VAR)` instead?
>>
>
> I think we also need StringUtils in the similarity package. We should
> handle this after we have merged the changes made to the names package.
>
>
>> Thanks again for reviewing the code!
>>
>
> No problem, thanks for pushing this forward.
>
> Here are some more things we need to take care of before 1.0:
> - JIRA project
>


Requested a JIRA project:
https://issues.apache.org/jira/servicedesk/customer/portal/1/INFRA-9477


> - Rename similarity package to distance package? We have five Distance
> implementations but only one Similarity. Furthermore we have FuzzyScore. Is
> it a distance or a similarity? It should be renamed accordingly.
> - ConsineSimilariy uses Vectors modeled as Map<CharSequence, Integer>.
> Does it make sense to introduce a new class called Vector?
>
> Benedikt
>
>
>>
>> All the best,Bruno
>>
>>
>>       From: Benedikt Ritter <brit...@apache.org>
>>  To: Commons Developers List <dev@commons.apache.org>
>>  Sent: Sunday, April 19, 2015 9:01 PM
>>  Subject: [TEXT] Review of the code base
>>
>> Hi all,
>>
>> I've looked through the code base of [text] and I already did some clean
>> up. However there are a few things, that need more work IMHO:
>>
>> 1. IP clearance! There are several comments talking about code adapted
>> from
>> PHP libraries. This needs to be listed in NOTICE.txt. For an example see
>> the NOTICE.txt in [lang].
>>
>> 2. The names package needs work. Currently the HumanNameParser takes a
>> Name
>> object (which is just a wrapper around a String) as parameter. The parse
>> result is then saved in fields (e.g. firstName). This makes the parser
>> unusable after it has parsed a name. I think the API should be reworked
>> such as:
>> - The constructor of the parser takes configuration options which can be
>> reused for several names to parse
>> - the parse method takes a string as parameter, containing a name
>> - the parse method returns an immutable Name objects which has getters for
>> firstName, lastName etc.
>>
>> 3. There are a some useless JavaDocs, which can be dropped. For example:
>> /**
>>  * Middle name.
>>  */
>> private String middle;
>>
>> Better rename the variable to middleName and drop the comment.
>>
>> 4. Several classes have no tests. The overall test coverage looks good, so
>> this may not be a problem.
>>
>> 5. I don't think the shade configuration is correct. To me it looks like
>> commons-text depends on [lang] and also shades some classes. Do we really
>> need the explicit dependency? Furthermore, why do we need to shade so many
>> classes? Why do we need anything beside StringUtils?
>>
>> Nevertheless, this library looks very good. Kudos to Bruno for pushing
>> this
>> forward!
>>
>> Regards,
>> Benedikt
>>
>>
>>
>>
>>
>>
>> --
>> http://people.apache.org/~britter/
>> http://www.systemoutprintln.de/
>> http://twitter.com/BenediktRitter
>> http://github.com/britter
>>
>>
>>
>>
>>
>
>
>
> --
> http://people.apache.org/~britter/
> http://www.systemoutprintln.de/
> http://twitter.com/BenediktRitter
> http://github.com/britter
>



-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Re: [TEXT] Review of the code base

Reply via email to