On 26.02.2014 at 09:50, Pharo4Stef <pharo4s...@free.fr> wrote:

>> 
>> We could have an information retrieval API for approximate string matching,
>> e.g. Levenshtein distance (already implemented, in various versions) and
>> Hamming distance; these two are the simplest and most widely used edit distances.
>> Then there are longest common subsequence and longest common substring (both
>> implemented in a package called "Fuzz", see #longestCommonSubsequenceWith:).
>> There is also the shift-or algorithm adapted for approximate matching (also
>> implemented); fuzzy phrasing is another world again. Many applications use
>> the Damerau edit distance. Bioinformatics uses Needleman-Wunsch and
>> Smith-Waterman, although there they are called "aligners" :) You don't want
>> to code the optimized versions in Smalltalk, though; some say it could take years.
>> All edit distances out there have specific requirements, and none is better
>> than the others in all cases. For example, Jaro-Winkler is useful for short,
>> one-word strings.
>> 
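For readers unfamiliar with the two simplest distances mentioned above, here is a minimal sketch. It is written in Python purely for illustration and is not the code from the "Fuzz" package or any Pharo implementation; the function names `hamming` and `levenshtein` are hypothetical. Hamming distance counts differing positions in equal-length strings, while Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b.

    Plain dynamic programming that keeps only two rows of the table,
    so memory use is proportional to the shorter string.
    """
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]
```

For instance, `levenshtein("kitten", "sitting")` evaluates to 3 and `hamming("karolin", "kathrin")` to 3. Damerau-style distances additionally treat a transposition of adjacent characters as a single edit, which this sketch does not do.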
> 
> I’m not sure that all these edit distances should be part of the String core 
> API.
> Now what would be good is to have a chapter describing them. This chapter 
> would work well with the bioSmalltalk one :)
> 
I’m pretty sure they shouldn’t. Most of these are likely needed only in special 
applications, so they are a perfect candidate for a string extension package. A 
truly modular entity that could load each of them individually would be ideal, 
but we don’t have the proper tools yet, unless of course each of those 
algorithms is composed of multiple classes and would fit naturally in a package 
of its own.
But the most important prerequisite would be to make a separate package out of 
it. Did I understand correctly that those are part of bioSmalltalk? Then the 
problem is that useful things are buried in a specialized application. I 
often run into this: either I don’t know about some code because it is buried 
inside another project, or I know about it and cannot use it because it is 
tied closely to that project.

my 2 cents,

Norbert

> 
>> You have a lot of options for research. Smalltalkers here are very 
>> experienced and clever and always give good advice, so don't be afraid to ask.
>> 
>> Cheers,
>> 
>> Hernán
>> 
>>  
>> -- 
>> Cheers,
>> Daniela Meneses
