2015-04-27 8:10 GMT+02:00 Benedikt Ritter <brit...@apache.org>: > Hello Phil, > > 2015-04-27 4:56 GMT+02:00 Phil Steitz <phil.ste...@gmail.com>: > >> On 4/26/15 3:14 AM, Benedikt Ritter wrote: >> > 2015-04-26 11:52 GMT+02:00 Benedikt Ritter <brit...@apache.org>: >> > >> >> >> >> 2015-04-19 17:29 GMT+02:00 Benedikt Ritter <brit...@apache.org>: >> >> >> >>> >> >>> 2015-04-19 17:24 GMT+02:00 Benedikt Ritter <brit...@apache.org>: >> >>> >> >>>> Hello Bruno, >> >>>> >> >>>> 2015-04-19 12:15 GMT+02:00 Bruno P. Kinoshita < >> >>>> brunodepau...@yahoo.com.br>: >> >>>> >> >>>>> Thanks for the thorough code review Benedikt! Comments inline. >> >>>>> >> >>>>>> 1. IP clearance! There are several comments talking about code >> >>>>> adapted from >> >>>>>> PHP libraries. This needs to be listed in NOTICE.txt. For an >> example >> >>>>> see >> >>>>>> the NOTICE.txt in [lang]. >> >>>>> Filed https://issues.apache.org/jira/browse/SANDBOX-497 >> >>>>> And first try to fix it: >> >>>>> >> >>>>> >> >>>>> >> https://git1-us-west.apache.org/repos/asf?p=commons-text.git;a=commitdiff;h=refs/heads/SANDBOX-497 >> >>>>> Does that look correct? I also checked with the author before >> porting >> >>>>> it to the initial Java version, and told him about the [text] >> component - >> >>>>> https://github.com/jasonpriem/HumanNameParser.php/issues/11 >> >>>> >> >>>> I've adjusted the changes a bit and merged it into master. >> >>>> >> >>>> >> >>>>>> 2. The names package needs work. >> >>>>> Completely agree. We can try to make it thread safe too. This will >> take >> >>>>> a little longer. >> >>>>> File new issue for tracking it >> >>>>> https://issues.apache.org/jira/browse/SANDBOX-498 >> >>>> >> >>>> I've pushed a new branch containing a proposal how to fix this. >> Please >> >>>> review. >> >>>> >> >>>> >> >>>>> >> >>>>>> 3. There are a some useless JavaDocs, which can be dropped. For >> >>>>> example: >> >>>>>> (...)> Better rename the variable to middleName and drop the >> comment. >> >>>>> What about renaming the variables but leaving the comments? >> >>>>> >> >>>> For I've corrected this for the names package in the SANDBOX-498 >> branch. >> >>>> >> >>>> >> >>>>>> 4. Several classes have no tests. The overall test coverage looks >> >>>>> good, so >> >>>>>> this may not be a problem. >> >>>>> Before the initial release I can start a development cycle just to >> add >> >>>>> new tests for parts that seem important - like code with somewhat >> high [+5] >> >>>>> cyclomatic complexity and uncovered branches. Just to make sure >> that the >> >>>>> initial release will have an acceptable quality and less hidden >> bugs :-) >> >>>>> >> >>>> Nice! >> >>>> >> >>>> >> >>>>>> 5. I don't think the shade configuration is correct. To me it looks >> >>>>> like >> >>>>>> commons-text depends on [lang] and also shades some classes. Do we >> >>>>> really >> >>>>>> need the explicit dependency? Furthermore, why do we need to shade >> so >> >>>>> many >> >>>>>> classes? Why do we need anything beside StringUtils? >> >>>>> My mistake. I copied the [functor] configuration without much >> thinking. >> >>>>> IIRC, this dependency was added because of the names package. But it >> >>>>> shouldn't be too complicate to replace it. It is mainly used for >> checking >> >>>>> empty/blank values. >> >>>>> >> >>>>> So let's drop the [lang] dependency and use something like `$VAR != >> >>>>> null && ! "".equals($VAR)` instead? >> >>>>> >> >>>> I think we also need StringUtils in the similarity package. We should >> >>>> handle this after we have merged the changes made to the names >> package. >> >>>> >> >>>> >> >>>>> Thanks again for reviewing the code! >> >>>>> >> >>>> No problem, thanks for pushing this forward. >> >>>> >> >>>> Here are some more things we need to take care of before 1.0: >> >>>> - JIRA project >> >>>> >> >>> Requested a JIRA project: >> >>> >> https://issues.apache.org/jira/servicedesk/customer/portal/1/INFRA-9477 >> >>> >> >> Here is the JIRA project [1]. I'll find out, how we can move the >> Commons >> >> Text issues from the Sandbox project to the new project. >> >> >> > I've migrated all issues from the Sandbox project to the new Commons >> Text >> > project. I've also updated the project documentation. >> >> Did I miss the promotion VOTE? >> > > No, but I think the component should be in a state to be released before > starting a promotion vote. We have examples of components which have been > promoted to early and I don't want this to happen with [text]. >
I hope we haven't stepped on anybody's toes or raised the impression we're doing stuff without properly informing the dev ML about our plans. I will write an informational e-mail together illustrating what we have implemented, what our plans are and what the road map for promotion is. Thank you! Benedikt > > Benedikt > > >> >> Phil >> > >> > >> >> Benedikt >> >> >> >> [1] https://issues.apache.org/jira/browse/TEXT >> >> >> >> >> >>> >> >>>> - Rename similarity package to distance package? We have five >> Distance >> >>>> implementations but only one Similarity. Furthermore we have >> FuzzyScore. Is >> >>>> it a distance or a similarity? It should be renamed accordingly. >> >>>> - ConsineSimilariy uses Vectors modeled as Map<CharSequence, >> Integer>. >> >>>> Does it make sense to introduce a new class called Vector? >> >>>> >> >>>> Benedikt >> >>>> >> >>>> >> >>>>> All the best,Bruno >> >>>>> >> >>>>> >> >>>>> From: Benedikt Ritter <brit...@apache.org> >> >>>>> To: Commons Developers List <dev@commons.apache.org> >> >>>>> Sent: Sunday, April 19, 2015 9:01 PM >> >>>>> Subject: [TEXT] Review of the code base >> >>>>> >> >>>>> Hi all, >> >>>>> >> >>>>> I've looked through the code base of [text] and I already did some >> clean >> >>>>> up. However there are a few things, that need more work IMHO: >> >>>>> >> >>>>> 1. IP clearance! There are several comments talking about code >> adapted >> >>>>> from >> >>>>> PHP libraries. This needs to be listed in NOTICE.txt. For an >> example see >> >>>>> the NOTICE.txt in [lang]. >> >>>>> >> >>>>> 2. The names package needs work. Currently the HumanNameParser >> takes a >> >>>>> Name >> >>>>> object (which is just a wrapper around a String) as parameter. The >> parse >> >>>>> result is then saved in fields (e.g. firstName). This makes the >> parser >> >>>>> unusable after it has parsed a name. I think the API should be >> reworked >> >>>>> such as: >> >>>>> - The constructor of the parser takes configuration options which >> can be >> >>>>> reused for several names to parse >> >>>>> - the parse method takes a string as parameter, containing a name >> >>>>> - the parse method returns an immutable Name objects which has >> getters >> >>>>> for >> >>>>> firstName, lastName etc. >> >>>>> >> >>>>> 3. There are a some useless JavaDocs, which can be dropped. For >> example: >> >>>>> /** >> >>>>> * Middle name. >> >>>>> */ >> >>>>> private String middle; >> >>>>> >> >>>>> Better rename the variable to middleName and drop the comment. >> >>>>> >> >>>>> 4. Several classes have no tests. The overall test coverage looks >> good, >> >>>>> so >> >>>>> this may not be a problem. >> >>>>> >> >>>>> 5. I don't think the shade configuration is correct. To me it looks >> like >> >>>>> commons-text depends on [lang] and also shades some classes. Do we >> >>>>> really >> >>>>> need the explicit dependency? Furthermore, why do we need to shade >> so >> >>>>> many >> >>>>> classes? Why do we need anything beside StringUtils? >> >>>>> >> >>>>> Nevertheless, this library looks very good. Kudos to Bruno for >> pushing >> >>>>> this >> >>>>> forward! >> >>>>> >> >>>>> Regards, >> >>>>> Benedikt >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> -- >> >>>>> http://people.apache.org/~britter/ >> >>>>> http://www.systemoutprintln.de/ >> >>>>> http://twitter.com/BenediktRitter >> >>>>> http://github.com/britter >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>> >> >>>> >> >>>> -- >> >>>> http://people.apache.org/~britter/ >> >>>> http://www.systemoutprintln.de/ >> >>>> http://twitter.com/BenediktRitter >> >>>> http://github.com/britter >> >>>> >> >>> >> >>> >> >>> -- >> >>> http://people.apache.org/~britter/ >> >>> http://www.systemoutprintln.de/ >> >>> http://twitter.com/BenediktRitter >> >>> http://github.com/britter >> >>> >> >> >> >> >> >> -- >> >> http://people.apache.org/~britter/ >> >> http://www.systemoutprintln.de/ >> >> http://twitter.com/BenediktRitter >> >> http://github.com/britter >> >> >> > >> > >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> >> > > > -- > http://people.apache.org/~britter/ > http://www.systemoutprintln.de/ > http://twitter.com/BenediktRitter > http://github.com/britter > -- http://people.apache.org/~britter/ http://www.systemoutprintln.de/ http://twitter.com/BenediktRitter http://github.com/britter