Hi Daniela,

2014-02-24 14:30 GMT-03:00 Daniela Meneses <daniela11...@gmail.com>:

> Hi to all,
>
> As you may know, I'm working on some improvements for the String class.
> So far I have implemented some missing tests. Now I would like
> to add new methods that could be useful, based on the Ruby API (
> http://www.ruby-doc.org/core-2.1.0/String.html). These are a few of the
> methods that I'm planning to implement:
>
>
>    - chomp(separator=$/) -> new_str
>    - chop() -> new_str
>    - ljust(integer, padstr=' ') -> new_str
>    - next -> new_str
>    - partition(sep) -> [head, sep, tail]
>
>
> Could you help me find out whether these methods are already available
> for the String class?
>
> If you have any ideas for new methods for the String class, they would be
> really welcome.
>
>
We could have an information retrieval API for approximate string matching,
e.g. Levenshtein distance (already implemented, in various versions) and
Hamming distance; these two are the simplest and most widely used edit
distances.
Then there are longest common subsequence and longest common substring (both
are implemented in a package called "Fuzz", see
#longestCommonSubsequenceWith:). There is also the shift-or algorithm adapted
for approximate matching (also implemented), and fuzzy phrasing is another
world of its own. Many applications use the Damerau edit distance.
Bioinformatics uses Needleman-Wunsch and Smith-Waterman, although there they
are called "aligners" :) You probably don't want to code the optimized
versions in Smalltalk, though; some say it could take years.
Every edit distance out there has specific requirements, and none is better
than the others in all cases. For example, Jaro-Winkler is useful for short,
one-word strings.
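
To make the two simplest distances concrete, here is a minimal sketch. It is
in Python only for brevity; the function names are made up for illustration
and are not taken from any existing Pharo package, so treat it as a reference
for the definitions rather than as the way the existing implementations work.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(hamming("karolin", "kathrin"))     # 3
print(levenshtein("kitten", "sitting"))  # 3
```

Hamming only compares position by position, so it requires equal lengths,
while Levenshtein also allows insertions and deletions. That difference is
one example of the "specific requirements" mentioned above.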

You have a lot of options for research. The Smalltalkers here are very
experienced and clever, and always give great advice, so don't be afraid to
ask.

Cheers,

Hernán



> --
> Cheers,
> Daniela Meneses
>
