Re: string similarity measures

2008-09-04 Thread mathieu
I submitted a patch to handle Aspell phonetic rules. You can find it in JIRA.

On Thu, 4 Sep 2008 17:07:09 +0300, "Cam Bazz" <[EMAIL PROTECTED]> wrote:
> let me rephrase the problem. I already have a set of bad words. I want to
> avoid people inputting typos of the bad words.
> for example 'shit'

Re: string similarity measures

2008-09-04 Thread Cam Bazz
Let me rephrase the problem. I already have a set of bad words, and I want to stop people from entering typos of them. For example, 'shit' is banned, but someone may enter 'sh1t'. How can I flag such phonetically similar variants as matches for the listed bad words?

Best.

On Thu, Sep 4, 2008 at 5:02 PM, Ka
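One simple way to catch a variant like 'sh1t' is to undo common character substitutions before looking the word up in the banned list. This is plain normalization, not phonetic matching. The substitution table `SUBSTITUTIONS` and the helpers `candidate_spellings` and `matches_banned` are hypothetical names for this sketch, not part of any library; ambiguous characters such as '1' (i or l) expand into several candidate spellings.

```python
from itertools import product

# Hypothetical substitution table (an assumption for illustration): each
# character maps to the letter(s) it is commonly used to imitate.
SUBSTITUTIONS = {
    "1": ("i", "l"),
    "!": ("i",),
    "3": ("e",),
    "4": ("a",),
    "@": ("a",),
    "0": ("o",),
    "5": ("s",),
    "$": ("s",),
    "7": ("t",),
}


def candidate_spellings(word):
    """Yield every spelling obtained by undoing the substitutions above."""
    options = [SUBSTITUTIONS.get(ch, (ch,)) for ch in word.lower()]
    for combo in product(*options):
        yield "".join(combo)


def matches_banned(word, banned):
    """Return the first banned word equal to some candidate spelling, else None."""
    for candidate in candidate_spellings(word):
        if candidate in banned:
            return candidate
    return None


banned_words = {"shit"}
print(matches_banned("sh1t", banned_words))  # shit
print(matches_banned("shot", banned_words))  # None
```

Because only exact matches after normalization count, 'shot' is not flagged. The number of candidates doubles with each ambiguous character, which is harmless for words this short.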

Re: string similarity measures

2008-09-04 Thread Karl Wettin
On 4 Sep 2008, at 15.54, Cam Bazz wrote:
> Yes, I already have a system for users to report words. Reported words appear on an operator screen, and if the operator approves, or if three other people have marked the word as a curse word, it is filtered. In the other thread you wrote:
>> I would create 1-5 ngram sized shingles and me
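The reporting rule described here (a word is filtered once an operator approves it, or once three people other than the reporter have marked it) can be sketched as a small state object. The class `ReportedWord`, its fields, and `REPORT_THRESHOLD` are hypothetical names; only the rule itself comes from the message.

```python
from dataclasses import dataclass, field

# Threshold from the message: three people besides the original reporter.
REPORT_THRESHOLD = 3


@dataclass
class ReportedWord:
    word: str
    reporter: str
    operator_approved: bool = False
    # Users other than the original reporter who also marked the word.
    other_markers: set = field(default_factory=set)

    def mark(self, user):
        """Record a mark; marks by the original reporter are not counted again."""
        if user != self.reporter:
            self.other_markers.add(user)

    def is_filtered(self):
        """Filtered if an operator approved it or enough other users marked it."""
        return self.operator_approved or len(self.other_markers) >= REPORT_THRESHOLD
```

Storing markers in a set means repeated marks from the same user count once.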

Re: string similarity measures

2008-09-04 Thread Cam Bazz
Yes, I already have a system for users to report words. Reported words appear on an operator screen, and if the operator approves, or if three other people have marked the word as a curse word, it is filtered. In the other thread you wrote:
>I would create 1-5 ngram sized shingles and measure the distance using Tanimoto coefficien
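The quoted suggestion is cut off, so the details are unknown. A minimal sketch, assuming plain character n-grams of length 1 to 5 (no padding) and the set form of the Tanimoto coefficient, |A ∩ B| / (|A| + |B| − |A ∩ B|), which equals the Jaccard index for sets. The function names are illustrative.

```python
def shingles(word, min_n=1, max_n=5):
    """Return the set of character n-grams of word for n in [min_n, max_n]."""
    word = word.lower()
    return {
        word[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(word) - n + 1)
    }


def tanimoto(a, b):
    """Tanimoto coefficient of two sets: |A & B| / (|A| + |B| - |A & B|)."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def tanimoto_distance(w1, w2):
    """1 - Tanimoto similarity of the two words' shingle sets."""
    return 1.0 - tanimoto(shingles(w1), shingles(w2))


print(tanimoto(shingles("shit"), shingles("sh1t")))  # 0.25
print(tanimoto(shingles("shit"), shingles("shot")))  # 0.25
```

On four-letter words both pairs share 4 of 16 distinct shingles (s, h, t, sh), so the raw score alone gives 'sh1t' and 'shot' the same similarity to 'shit'.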

Re: string similarity measures

2008-09-04 Thread Karl Wettin
On 4 Sep 2008, at 14.38, Cam Bazz wrote:
> Hello,
> This came up before, but if we were to make a swear-word filter, string edit distances are no good. For example, a word like `shot` gets confused with `shit`. There is also a problem with words like hitchcock. Apparently I need something like sound
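The `shot`/`shit` problem is easy to see with the classic Levenshtein distance (minimum number of single-character insertions, deletions and substitutions): an innocent word and a disguised swear word can sit at exactly the same distance from the banned word. A standard dynamic-programming sketch:

```python
def levenshtein(a, b):
    """Classic Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


print(levenshtein("shot", "shit"))  # 1
print(levenshtein("sh1t", "shit"))  # 1
```

Any threshold that flags 'sh1t' therefore also flags 'shot'.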

string similarity measures

2008-09-04 Thread Cam Bazz
Hello,

This came up before, but if we were to make a swear-word filter, string edit distances are no good. For example, a word like `shot` gets confused with `shit`. There is also a problem with words like hitchcock. Apparently I need something like Soundex or Double Metaphone. The thing is, these are
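For reference, a sketch of American Soundex: keep the first letter, map the remaining consonants to digit classes, collapse adjacent equal codes (h and w do not separate them, vowels do), and pad to four characters. Double Metaphone is considerably more involved and is not shown. Dropping non-letters such as '1' is an assumption of this sketch, not part of the Soundex rules.

```python
# Soundex digit classes; vowels, y, h and w have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """American Soundex code of word; non-letters are dropped (an assumption)."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    last_code = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            # h and w are skipped and do not separate letters with equal codes.
            continue
        code = CODES.get(ch, "")
        if not code:
            # Vowels (and y) separate letters, so a repeated code after one is kept.
            last_code = ""
            continue
        if code != last_code:
            result += code
            if len(result) == 4:
                break
        last_code = code
    return result.ljust(4, "0")


print(soundex("Robert"))  # R163
print(soundex("shit"), soundex("shot"))  # S300 S300
```

Plain Soundex gives 'shit' and 'shot' the same code (S300), so phonetic coding on its own does not separate that particular pair.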