Hi Bruno,
2014-12-14 21:37 GMT+01:00 Bruno P. Kinoshita <brunodepau...@yahoo.com.br>:

> Hello Benedikt!
>
> > Metric feels like it's something more general, but I'm not sure.
>
> You're right. Metric was supposed to be a general interface,
> representing the String Metric from the Wikipedia article.
>
> > and the interface from StringMetric to StringDistance.
>
> I'm reading the Myers paper, and already have a local branch with the
> Myers algorithm from [collections] ported to [text].
> Perhaps we could move the StringMetric interface to the o.a.c.text package,
> and create a StringDistance or EditDistance interface in o.a.c.text.distance.
> This way we can have String Metrics as in Wikipedia, as being a way of
> giving a value for comparing two strings. We would have the edit distances
> in the distance package, and the diff algorithms in another diff package.
> All of them being String Metrics.
> What do you think?

Sounds good, although I'm not sure I understand where you are going with the
marker interface. What is its purpose?

> > > I think we should consider renaming everything to distance, since the
> > > implemented algorithms all end in *Distance. So we would change the package
> > > name from o.a.c.text.similarity to o.a.c.text.distance and the interface
> > > from StringMetric to StringDistance.
> >
> > Looking at the code again, it seems like the algorithms all really return a
> > similarity score and not a distance. For example, the FuzzyDistance JavaDoc
> > states: "A higher score indicates a higher similarity". If this is the case,
> > maybe it makes more sense to rename everything to Similarity?
>
> I'm in favor of dropping score and similarity, and adopting distance in
> the package, classes and javadocs, as it is used in other tools (e.g. Solr,
> Talend, Informatica IIR, etc).

Okay, but we need to make sure all algorithms really return a distance then.
As I said, FuzzyDistance currently really returns a similarity score.
An algorithm returning a distance should return a higher number for higher
distances.

Benedikt

> All the best,
> Bruno
>
> From: Benedikt Ritter <brit...@apache.org>
> To: Commons Developers List <dev@commons.apache.org>
> Sent: Sunday, December 14, 2014 6:20 PM
> Subject: Re: [TEXT] Distance vs. Metric vs. Similarity
>
> 2014-12-14 21:08 GMT+01:00 Benedikt Ritter <brit...@apache.org>:
> >
> > Hi,
> >
> > Currently the wording in commons text is a bit confusing. We have the
> > three terms:
> >
> > - distance
> > - similarity
> > - metric
> >
> > Distance and similarity seem to be just opposites of the same thing. A
> > great distance indicates a small similarity between two character
> > sequences. Metric feels like it's something more general, but I'm not sure.
> >
> > I think we should consider renaming everything to distance, since the
> > implemented algorithms all end in *Distance. So we would change the package
> > name from o.a.c.text.similarity to o.a.c.text.distance and the interface
> > from StringMetric to StringDistance.
> >
>
> Looking at the code again, it seems like the algorithms all really return a
> similarity score and not a distance. For example, the FuzzyDistance JavaDoc
> states: "A higher score indicates a higher similarity". If this is the case,
> maybe it makes more sense to rename everything to Similarity?
>
> >
> > WDYT?
> >
> > Benedikt
> >
> > --
> > http://people.apache.org/~britter/
> > http://www.systemoutprintln.de/
> > http://twitter.com/BenediktRitter
> > http://github.com/britter
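To make the convention discussed in this thread concrete (a distance is 0 for identical input and grows as the strings differ, while a similarity score moves the other way), here is a minimal Java sketch. The names StringMetric and EditDistance follow Bruno's proposal, but their shapes here are assumptions, not the actual Commons Text API. LevenshteinDistance is a self-contained illustration, and normalizedSimilarity is a hypothetical helper; null handling, which a real library would need, is left out.

```java
/**
 * Hypothetical general interface: compare two strings and produce a value.
 * The name follows the thread's proposal; the shape is an assumption.
 */
interface StringMetric<R> {
    R apply(CharSequence left, CharSequence right);
}

/**
 * Hypothetical marker sub-interface: adds no methods, but documents the
 * contract that the result is 0 for identical input and grows as the
 * inputs become less alike.
 */
interface EditDistance<R> extends StringMetric<R> {
}

/** Levenshtein distance: each insertion, deletion or substitution costs 1. */
final class LevenshteinDistance implements EditDistance<Integer> {
    @Override
    public Integer apply(CharSequence left, CharSequence right) {
        int n = left.length();
        int m = right.length();
        // During row i, prev[j] is the distance between the first i-1 chars
        // of left and the first j chars of right.
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // turning an empty prefix into j chars takes j insertions
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // turning i chars into an empty prefix takes i deletions
            for (int j = 1; j <= m; j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, // insertion
                                            prev[j] + 1),    // deletion
                                   prev[j - 1] + cost);      // substitution or match
            }
            // Reuse the arrays: the row just computed becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }
}

public class DistanceVsSimilarity {
    /**
     * Hypothetical helper showing the opposite direction: a similarity in
     * [0, 1] where 1 means identical, derived from the distance by
     * normalising against the length of the longer input.
     */
    static double normalizedSimilarity(CharSequence left, CharSequence right) {
        int maxLen = Math.max(left.length(), right.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are identical
        }
        int d = new LevenshteinDistance().apply(left, right);
        return 1.0 - (double) d / maxLen;
    }

    public static void main(String[] args) {
        EditDistance<Integer> distance = new LevenshteinDistance();
        System.out.println(distance.apply("kitten", "kitten"));  // 0
        System.out.println(distance.apply("kitten", "sitting")); // 3
        // Similarity moves the other way: closer strings score higher.
        System.out.println(normalizedSimilarity("kitten", "kitten")
                > normalizedSimilarity("kitten", "sitting"));    // true
    }
}
```

Keeping EditDistance as a marker sub-interface with no extra methods is one possible answer to the question about its purpose: a caller can demand the "higher means further apart" contract in a type signature (for example, a parameter typed `EditDistance<Integer>`) without changing the `apply` signature shared by all string metrics.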