Hi Bruno,

> On 7 Mar 2019, at 21:18, Bruno P. Kinoshita <ki...@apache.org> wrote:
>
> Hi Alex,
> Can't recall why it was done that way. When the initial code for the edit
> distances was created, some Java libraries like Simmetrics,
> java-string-similarity, Lucene, and also R/Python code were used to verify
> the output of the edit distances.
> Maybe we used Math.round just to get a test passing, which I agree it had to
> be documented.
> But even better if we just drop the Math.round and instead update the tests
> with that assertEquals(expected, actual, threshold) method, with a good
> enough threshold.
> What do you think?
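To make the effect of the rounding concrete, here is a minimal standalone Java sketch. It is not the Commons Text code: `jaccard` and `roundTo2dp` are hypothetical helpers that assume the similarity is the size of the intersection over the size of the union of the two strings' character sets, and that the rounding is Math.round(x * 100) / 100. Treating two empty strings as similarity 1.0 is also an assumption made only for this sketch.

```java
import java.util.HashSet;
import java.util.Set;

public class JaccardRoundingSketch {

    // Hypothetical helper: Jaccard similarity |A ∩ B| / |A ∪ B| over the
    // sets of characters in each input, with no rounding applied.
    static double jaccard(CharSequence left, CharSequence right) {
        Set<Character> a = toSet(left);
        Set<Character> b = toSet(right);
        Set<Character> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 1.0; // assumption for this sketch: two empty inputs are identical
        }
        Set<Character> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    // Hypothetical helper reproducing rounding to 2 decimal places.
    static double roundTo2dp(double value) {
        return Math.round(value * 100.0) / 100.0;
    }

    // Collects the distinct characters of a string.
    private static Set<Character> toSet(CharSequence s) {
        Set<Character> set = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            set.add(s.charAt(i));
        }
        return set;
    }

    public static void main(String[] args) {
        // Ratios with a union of 200 unique characters.
        double a = 0.0 / 200;
        double b = 1.0 / 200; // 0.005
        double c = 2.0 / 200; // 0.01
        System.out.println("A: " + a + " -> " + roundTo2dp(a));
        System.out.println("B: " + b + " -> " + roundTo2dp(b));
        System.out.println("C: " + c + " -> " + roundTo2dp(c));

        // Unrounded vs rounded similarity for a small example: 3/7.
        double s = jaccard("abcdefg", "abc");
        System.out.println("jaccard = " + s + ", rounded = " + roundTo2dp(s));
    }
}
```

With the unrounded value, a test can compare against a reference result using JUnit's assertEquals(double expected, double actual, double delta) and a small delta, as Bruno suggests, instead of relying on the rounding to make the values match exactly.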
I’d favour dropping the round and adding it to the Changes.xml via a Jira ticket so it is noted if someone upgrades. They can always restore the previous behaviour by rounding the output of the class themselves.

If I understand the metric correctly (intersection over union), two results with the same union size n differ by a step of 1/n. Rounding to 2 decimal places can only merge adjacent results when that step is below 0.01, i.e. when the union of the two character sets contains more than 100 unique characters. For example, with a union of 200:

A) 0/200 = 0
B) 1/200 = 0.005
C) 2/200 = 0.01

In this case A and C can be distinguished, but B and C cannot, because B rounds up to 0.01. So in practical terms it would not make a difference unless using a large character set. For printable ASCII strings (95 characters) the union can never exceed 100, so there is no difference.

I’ve already written the test against the jaccard function from the Python distance library (distance.jaccard) in the PR for TEXT-155, so changing the test is simple. It’s just a decision on whether to do it.

Alex

> Cheers
> Bruno
>
> On Friday, 8 March 2019, 4:49:52 am NZDT, Alex Herbert
> <alex.d.herb...@gmail.com> wrote:
>
> A quick question about the JaccardSimilarity class:
>
> Q. Why does it round the similarity to 2 decimal places?
>
> This is not documented.
>
> It is also done in the complementary JaccardDistance class.
>
> Looking at the history in git it seems to have always been that way.
> First commit was 2016-11-27.
>
> Thanks,
>
> Alex
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org