Hi Rob,

LCS can still be useful for bioinformatics/genetics, so I'd say it's worth including. In Java, if I ever needed it, I would probably look for it in BioJava (I just did, and couldn't easily find it there).
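For illustration, here is a minimal sketch of what an LCS-based distance could look like. It assumes "LCS" is read as longest common subsequence (the quoted message says longest common substring, which would give a different, coarser measure). The class and method names are hypothetical, not an existing Commons Text or BioJava API, and the formula len(a) + len(b) - 2 * LCS(a, b) is just one common way to turn the LCS length into a distance.

```java
// Hypothetical sketch, not part of any existing library.
final class LcsDistanceSketch {

    private LcsDistanceSketch() {
    }

    /**
     * Length of the longest common subsequence of a and b,
     * computed with the classic dynamic program using two rows.
     */
    static int lcsLength(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // Matching characters extend the best result of both prefixes.
                    curr[j] = prev[j - 1] + 1;
                } else {
                    // Otherwise drop one character from either input.
                    curr[j] = Math.max(prev[j], curr[j - 1]);
                }
            }
            // Reuse the two arrays; index 0 is never written and stays 0.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Assumed distance definition: the number of characters of a and b
     * that are not part of the longest common subsequence.
     */
    static int distance(CharSequence a, CharSequence b) {
        return a.length() + b.length() - 2 * lcsLength(a, b);
    }
}
```

With this definition, identical strings have distance 0, and the distance to an empty string is the length of the other string. It takes O(n*m) time and O(m) extra space, which may matter for long sequences such as the genetics use case.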
As for the other string distances, I always look at this GitHub project: https://github.com/tdebatty/java-string-similarity

And also Talend (I think Data Quality has some string distances).

However, I think having the API design and some string distances implemented could be enough for a 1.0. Then we can add more, and release more versions.

Cheers
Bruno

----- Original Message -----
> From: Rob Tompkins <chtom...@gmail.com>
> To: Commons Developers List <dev@commons.apache.org>
> Sent: Monday, 19 December 2016 3:47 PM
> Subject: [text][TEXT-32] Regarding more edit distances.
>
> Hello,
>
> With the thought that we want more "edit distances"/"similarity scores" in
> the codebase for the potential 1.0 release of TEXT, I’ve opened an associated
> Jira (TEXT-32). I was wondering if any folks had any input about further
> ideas.
>
> The first idea that I stumbled upon was an edit distance based upon the
> longest common substring. It feels a tad coarse, but that doesn’t
> necessarily mean that it’s not worth including.
>
> Other thoughts and ideas?
>
> Cheers,
> -Rob
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org