> Wow SA is doing a lot of work already. Can I also have a collapsed body
> string with all whitespaces removed

You would have to write a plugin for this.  Keep in mind that this technique
can lead to unexpected false positives (FPs) pretty easily.  Once you
eliminate all the spaces (and possibly other punctuation), you only have a
string of letters and you have to guess where the words are.  It is really
easy to guess wrong and come up with no-no words that didn't exist in the
source, even if you don't check for obfuscated cases.
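
To make the false-positive problem concrete, here is a minimal sketch in
Python.  It only illustrates the idea; SpamAssassin plugins are written in
Perl, and collapse_body, words_in and BAD_WORDS are hypothetical names made
up for this example.  It collapses an innocent sentence to letters only and
checks two "no-no" words against both the collapsed stream and the original
words.

```python
import re

def collapse_body(text: str) -> str:
    """Lower-case the text and drop every character that is not a letter a-z."""
    return re.sub(r"[^a-z]", "", text.lower())

def words_in(text: str) -> set[str]:
    """The words actually present in the original, space-separated text."""
    return set(re.findall(r"[a-z]+", text.lower()))

# Hypothetical list of "no-no" words, for illustration only.
BAD_WORDS = ["viagra", "sex"]

body = "Our order will ship via Grand Central on this excellent date."
collapsed = collapse_body(body)

for word in BAD_WORDS:
    in_collapsed = word in collapsed        # substring of the letter stream?
    in_original = word in words_in(body)    # a real word of the message?
    print(f"{word}: collapsed={in_collapsed} original={in_original}")
```

Both words show up in the collapsed stream ("ship via Grand" and "this
excellent") even though neither appears in the original message.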

I'm not saying 'don't do this', although others probably will.  I will say
'don't check for single words' if you do this.  I wouldn't want to try it
with any pattern shorter than about 15 letters in sequence, and probably
20-25 if I include obfuscation techniques.
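
A minimal Python sketch of that length guard, assuming patterns are kept in
an already-collapsed, letters-only form.  MIN_LETTERS, collapsed_hits and the
PHRASES list are hypothetical; the threshold of 15 is the rough figure from
above, not a tested value.

```python
import re

# Rough lower bound from above: about 15 letters for plain text,
# probably 20-25 once obfuscation variants are also being matched.
MIN_LETTERS = 15

def collapse_body(text: str) -> str:
    """Lower-case the text and keep only the letters a-z."""
    return re.sub(r"[^a-z]", "", text.lower())

def collapsed_hits(body: str, phrases: list[str],
                   min_letters: int = MIN_LETTERS) -> list[str]:
    """Return the phrases found in the collapsed body, silently ignoring any
    phrase shorter than min_letters (too likely to appear by accident)."""
    collapsed = collapse_body(body)
    return [p for p in phrases if len(p) >= min_letters and p in collapsed]

# Hypothetical phrase list, already in collapsed (letters-only) form.
PHRASES = ["viagra", "buycheapviagranow"]

print(collapsed_hits("Buy cheap V-i-a-g-r-a now!!!", PHRASES))
print(collapsed_hits("Our order will ship via Grand Central.", PHRASES))
```

The single word "viagra" is never tried, so the innocent second message no
longer hits, while the 17-letter phrase still catches the spaced-out spam.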

This would probably be a really nice area of study for someone in need of a
thesis.  If you eliminate all spaces/punctuation and leave only the letters,
what are the statistics for pulling other, nonexistent words out of the
resulting letter stream?  How long are they on average?  How frequent?  Etc.
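
One possible starting point for such a study, sketched in Python: for a
given lexicon, count the words that occur as substrings of the collapsed
stream but are not whole words of the original text, grouped by length.
spurious_words and the tiny lexicon are hypothetical; a real study would use
a full word list and a large corpus of ham.

```python
import re
from collections import Counter

def collapse_body(text: str) -> str:
    """Lower-case the text and keep only the letters a-z."""
    return re.sub(r"[^a-z]", "", text.lower())

def spurious_words(text: str, lexicon: set[str]) -> Counter:
    """Count distinct lexicon words that are substrings of the collapsed
    stream but not whole words of the original text, keyed by word length."""
    original = set(re.findall(r"[a-z]+", text.lower()))
    collapsed = collapse_body(text)
    by_length = Counter()
    for word in lexicon:
        if word not in original and word in collapsed:
            by_length[len(word)] += 1
    return by_length

# Hypothetical lexicon, far smaller than anything useful for real statistics.
lexicon = {"viagra", "sex", "ship", "rand", "cell"}
text = "Our order will ship via Grand Central on this excellent date."
print(spurious_words(text, lexicon))
```

Note that this also counts words hidden inside a single original word
("cell" in "excellent", "rand" in "grand"), which matter just as much once
word boundaries are gone.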

        Loren
