Hello,

I am planning to work on ticket TEXT-2 and need your guidance on naming and
placing the class files for the implementation.
The ticket asks for the Jaccard index (which measures similarity) and the
Jaccard distance (which measures dissimilarity).
Below is what I am planning to do (option 1).
Add a new class JaccardBase under package org.apache.commons.text. It will
hold the logic to calculate both the index and the distance. As you know, the
Jaccard distance is 1 - Jaccard index, so the two do not need separate logic,
and I plan to keep the calculation in one common place.
Add a new class JaccardIndex under package org.apache.commons.text.similarity.
This class will be derived from JaccardBase and will expose a public function
to get the Jaccard index.
Similarly, add a new class JaccardDistance under package
org.apache.commons.text.diff. This class will also be derived from JaccardBase
and will expose a public function to get the Jaccard distance.
The advantage is that there is no code duplication. The disadvantage is that a
caller who wants both the index and the distance needs to call 2 separate
functions (one from the JaccardIndex class and one from the JaccardDistance
class), and the calculation is done twice for the same input.
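
To make option 1 concrete, here is a minimal sketch of how the three classes
could fit together. It is only an illustration under some assumptions: the
index is computed over the sets of characters of the two inputs, two empty
inputs are treated as identical, the method names calculateIndex and apply are
placeholders, and the package declarations are left out so the snippet fits in
one file.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of option 1. Class names follow the proposal above;
// method names and the character-set interpretation are assumptions.
abstract class JaccardBase {

    // Shared logic: |intersection| / |union| of the two character sets.
    protected double calculateIndex(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Inputs must not be null");
        }
        Set<Character> leftSet = toCharSet(left);
        Set<Character> rightSet = toCharSet(right);
        if (leftSet.isEmpty() && rightSet.isEmpty()) {
            return 1.0; // Assumption: two empty inputs count as identical.
        }
        Set<Character> intersection = new HashSet<>(leftSet);
        intersection.retainAll(rightSet);
        Set<Character> union = new HashSet<>(leftSet);
        union.addAll(rightSet);
        return (double) intersection.size() / union.size();
    }

    private static Set<Character> toCharSet(CharSequence input) {
        Set<Character> result = new HashSet<>();
        for (int i = 0; i < input.length(); i++) {
            result.add(input.charAt(i));
        }
        return result;
    }
}

// Would live under org.apache.commons.text.similarity in the proposal.
class JaccardIndex extends JaccardBase {
    public double apply(CharSequence left, CharSequence right) {
        return calculateIndex(left, right);
    }
}

// Would live under org.apache.commons.text.diff in the proposal.
class JaccardDistance extends JaccardBase {
    public double apply(CharSequence left, CharSequence right) {
        return 1.0 - calculateIndex(left, right);
    }
}
```

With this layout, a caller who needs both values has to call
new JaccardIndex().apply(a, b) and new JaccardDistance().apply(a, b), which
runs the set computation twice.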

Another option (option 2) is to have a single class that returns both the
index and the distance. With this option, I have 2 questions:
1. Where should the new class go (under which package)?
2. What should the new class be named?
The disadvantage of option 1 is fixed here.
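
For comparison, a rough sketch of the single-class option could look like the
following. The names JaccardMeasure, Result, and compute are purely
hypothetical (the naming and the package are exactly the open questions), and
the character-set interpretation of the inputs is again an assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the single-class option; all names are placeholders.
final class JaccardMeasure {

    // Small value holder so one calculation yields both numbers.
    static final class Result {
        private final double index;

        Result(double index) {
            this.index = index;
        }

        double getIndex() {
            return index;
        }

        // The distance is derived from the index, not recomputed.
        double getDistance() {
            return 1.0 - index;
        }
    }

    private JaccardMeasure() {
    }

    static Result compute(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Inputs must not be null");
        }
        Set<Character> leftSet = toCharSet(left);
        Set<Character> rightSet = toCharSet(right);
        if (leftSet.isEmpty() && rightSet.isEmpty()) {
            return new Result(1.0); // Assumption: empty inputs are identical.
        }
        Set<Character> intersection = new HashSet<>(leftSet);
        intersection.retainAll(rightSet);
        Set<Character> union = new HashSet<>(leftSet);
        union.addAll(rightSet);
        return new Result((double) intersection.size() / union.size());
    }

    private static Set<Character> toCharSet(CharSequence input) {
        Set<Character> result = new HashSet<>();
        for (int i = 0; i < input.length(); i++) {
            result.add(input.charAt(i));
        }
        return result;
    }
}
```

Here the sets are built once per call, and the caller reads both getIndex()
and getDistance() from the same result object.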

I personally prefer option 1, as it looks cleaner considering the way the
classes are arranged in the packages.
Could you kindly review and share your thoughts?
Do let me know if anything is unclear.
Thank you,
Regards,
Don Jeba.
