> ISTM that because the output of strings is not a discrete list of
> potential words, but is instead a long list of concatenated
> characters, this problem is really rather daunting. The output should
> probably first be broken up into something resembling words, perhaps
> by breaking on non-alphabetic characters. That should do two things:
> 1) get you something that resembles words to actually test and 2)
> give you a somewhat smaller set of "stuff" to check.
>
> This won't necessarily handle "compound" words though where two
> word-like things are jammed together, or an actual word is embedded
> within a string of nonsense.
>
> I think this problem is potentially rather harder than I thought when
> I saw OP's original question.
>
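
For reference, the splitting step described above can be sketched in a
few lines of Python. This is only a rough illustration: the function
name alpha_runs is made up here, and it assumes "alphabetic" means
ASCII letters only.

```python
import re

def alpha_runs(text):
    """Return the maximal runs of ASCII letters in text, in order.

    Assumption: only A-Z and a-z count as alphabetic; digits,
    punctuation and everything else act as separators.
    """
    return re.findall(r"[A-Za-z]+", text)
```

A real word embedded in a longer run of letters still comes back as one
piece, which is the "compound" limitation mentioned above.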

It does not need to be comprehensive. Would it be possible to only
show lines that have "words" (contiguous strings) of alpha characters
that are all lowercase except for the first character? That would
handle about 90% of the work by eliminating lines like these:
pDuf
#k0H}g)
GoV5
rLeY1
TMlq,*
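
Something along these lines is what I have in mind, as a rough Python
sketch rather than a finished tool. The names WORDLIKE and filter_lines
are made up for illustration, and two things are my own assumptions:
a "word" must be at least 3 letters long (otherwise the single letters
in "#k0H}g)" would count), and one such word is enough to keep a line.

```python
import re

# A run of ASCII letters, bounded on both sides by non-letters (or the
# line edge), in which every character after the first is lowercase.
# The minimum length of 3 is an assumed threshold, not a requirement.
WORDLIKE = re.compile(r"(?<![A-Za-z])[A-Za-z][a-z]{2,}(?![A-Za-z])")

def filter_lines(lines):
    """Keep only the lines that contain at least one word-like run."""
    return [line for line in lines if WORDLIKE.search(line)]
```

Saved as a hypothetical wordy.py with a final line such as
sys.stdout.writelines(filter_lines(sys.stdin)) (plus "import sys"), it
could sit at the end of a pipe: strings somefile | python3 wordy.py.
All five sample lines above would be dropped.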

-- 
Dotan Cohen

http://what-is-what.com
http://gibberish.co.il


