Hello Don,

Just as an FYI, I just added an interface called "SimilarityScore" that is a 
weakening of the full mathematical definition of a "metric" (mainly for the 
JaroWinkler distance). But as the Jaccard distance satisfies the triangle 
inequality and all of the other metric axioms, it should implement 
EditDistance, which is intended to represent string comparisons that fully 
satisfy the definition of a metric. 
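
To make that concrete, here is a rough sketch of how the two classes could 
line up with the two interfaces. It is only an illustration under some 
assumptions: the EditDistance and SimilarityScore interfaces below are local 
stand-ins with an assumed apply(left, right) shape, the class name 
JaccardSimilarity is hypothetical, and each string is treated as a set of 
characters, which the ticket may or may not intend. It also uses delegation 
rather than a shared JaccardBase, so the set logic still lives in one place. 

```java
import java.util.HashSet;
import java.util.Set;

// Local stand-ins for the two interfaces discussed in this thread.
// The single-method shape is an assumption made for this sketch.
interface SimilarityScore<R> {
    R apply(CharSequence left, CharSequence right);
}

interface EditDistance<R> {
    R apply(CharSequence left, CharSequence right);
}

// Jaccard index: |A intersect B| / |A union B|, where A and B are the
// sets of characters in each input (an assumption, see above).
final class JaccardSimilarity implements SimilarityScore<Double> {
    @Override
    public Double apply(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Input cannot be null");
        }
        Set<Character> leftSet = toCharSet(left);
        Set<Character> rightSet = toCharSet(right);
        if (leftSet.isEmpty() && rightSet.isEmpty()) {
            // Convention chosen for this sketch: two empty inputs are identical.
            return 1.0;
        }
        Set<Character> intersection = new HashSet<>(leftSet);
        intersection.retainAll(rightSet);
        Set<Character> union = new HashSet<>(leftSet);
        union.addAll(rightSet);
        return (double) intersection.size() / union.size();
    }

    private static Set<Character> toCharSet(CharSequence cs) {
        Set<Character> set = new HashSet<>();
        for (int i = 0; i < cs.length(); i++) {
            set.add(cs.charAt(i));
        }
        return set;
    }
}

// Jaccard distance: 1 - Jaccard index. It delegates to JaccardSimilarity,
// so the set calculation is written only once.
final class JaccardDistance implements EditDistance<Double> {
    private final JaccardSimilarity similarity = new JaccardSimilarity();

    @Override
    public Double apply(CharSequence left, CharSequence right) {
        return 1.0 - similarity.apply(left, right);
    }
}
```

With these definitions, "night" and "nacht" share the characters n, h, and t 
out of 7 distinct characters overall, so the index is 3/7 and the distance 
is 4/7. 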

Cheers, 
-Rob 

> On Nov 16, 2016, at 11:06 AM, don jeba <donj...@yahoo.com.INVALID> wrote:
> 
> Hello,
> 
> I am planning to work on ticket TEXT-2. I need your guidance on 
> naming and placing the class files for implementing it.
> The ticket asks for the Jaccard index (which measures similarity) and the 
> Jaccard distance (which measures dissimilarity).
> 
> Below is what I am planning to do (option 1).
> 
> Add a new class JaccardBase under the package org.apache.commons.text. This 
> will hold the logic to calculate both the index and the distance. As you 
> know, the Jaccard distance is 1 - Jaccard index, so neither needs separate 
> logic, and I plan to keep the calculation in one common place.
> Add a new class JaccardIndex under the package 
> org.apache.commons.text.similarity. This class will derive from JaccardBase 
> and expose a public method that returns the Jaccard index.
> 
> Similarly, add a new class JaccardDistance under the package 
> org.apache.commons.text.diff. This class will also derive from JaccardBase 
> and expose a public method that returns the Jaccard distance.
> 
> The advantage is that there is no code duplication. The disadvantage is that 
> a caller who wants both the index and the distance needs to call 2 separate 
> methods (one on JaccardIndex and one on JaccardDistance), and the 
> calculation runs twice for the same input.
> 
> Another option is to have a single class that returns both the index and 
> the distance. With this option, I have 2 questions:
> 1. Where should the new class go (under which package)?
> 2. What should the new class be called?
> The disadvantage of option 1 is fixed here.
> 
> I personally prefer option 1, as it looks cleaner considering the way the 
> classes are arranged in the packages.
> Could you kindly review and share your thoughts?
> Do let me know if anything is unclear.
> Thank you,
> Regards,
> Don Jeba.

