Slightly OT: Kenneth Copeland and regexps?

Mel Matsuoka Tue, 19 Jun 2001 15:08:53 -0700
>Delivered-To: [EMAIL PROTECTED]
>From: [EMAIL PROTECTED]
>To: [EMAIL PROTECTED]
>Subject: Rejected Message
>Date: Tue, 19 Jun 2001 20:46:45 +0000
>
>Content-Type: text/plain;
>
>The attached mail message has been rejected for the following reason:
>
>Body contains word/phrase 'D*mn '
>
>Additional Information:
>
> <headers snipped and offending word censored from quoted original message>
>
>it can to preserve your Access97 table relations, and for the most part
>does a decent job. It converts the data and fieldtypes pretty d*mn
>accurately. But you should still check its results by hand before you go

One of the greatest things about life is that you can get humor from things
that aren't even intended to be funny. Looks like my post regarding
MySQL/Access export was "rejected" by a mailserver at kcm.org because it
contained the word 'D@mn' ("a" munged to pass through the filter).

Now I think Jesus is cool and all, but I'm sure he would find this type of
word-based filtering to be just plain silly. Using their filtering
heuristics, the Holy Bible /itself/ would be rejected by the mail gateway!

This would seem especially tragic when discussing Perl matters, since
anyone on that domain would never be able to learn about such fundamental
things as:

-- "a$$ignment" operators
-- "subst!tution" 
-- "hashes" (?)
-- Using 3 consecutive "x" characters in code samples to mask out
passwords, IP addresses, etc.

I would think/hope that any attempt to filter words should, at the very
most, *only* filter on *whole* keywords, and not just raw sequences of
characters. And I would also think that most mature and intelligent adults
don't need someone else (or a regular expression/filter) to make decisions
about content /for/ them. Words by themselves aren't harmful; it's the
context in which they're used that matters.
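
To make the whole-word vs. raw-character-sequence difference concrete, here
is a minimal sketch. The two-word list is made up for illustration (borrowed
from the examples above), and the sub names are hypothetical; I have no idea
what the actual filter at kcm.org does internally.

```perl
use strict;
use warnings;

# Hypothetical word list, for illustration only.
my @banned = ('hash', 'xxx');

# Build one alternation from the list; quotemeta guards against
# regex metacharacters appearing in the words.
my $alt = join '|', map { quotemeta($_) } @banned;

my $substring_re  = qr/(?:$alt)/i;      # matches anywhere, even inside a word
my $whole_word_re = qr/\b(?:$alt)\b/i;  # matches only between word boundaries

sub hits_substring  { my ($text) = @_; return $text =~ $substring_re  ? 1 : 0 }
sub hits_whole_word { my ($text) = @_; return $text =~ $whole_word_re ? 1 : 0 }

for my $text ('load the words into hashes', 'password: xxxxx', 'smoking hash') {
    printf "%-28s substring=%d whole-word=%d\n",
        $text, hits_substring($text), hits_whole_word($text);
}
```

The \b anchors are the only difference between the two patterns, but they
keep "hashes" and a run of five x's from tripping the filter while still
catching the listed word when it stands alone.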

That said, perhaps a regular-expression substitution, which replaces
everything but the first letter of each (alleged) "offensive" word with an
asterisk, could be a workable--though still clearly annoying--compromise.
Rejecting the /entire/ message based on finding a single (alleged)
"naughty" word is just ludicrous.

This is what I hacked up on my lunch break just now:

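my %badwords_hash;

# load list of bad words into a hash and assign each word
# a "true" value
# (LIST_OF_NAUGHTY_WORDS and MESSAGE are filehandles assumed to be open already)

while (my $line = <LIST_OF_NAUGHTY_WORDS>) {
        foreach my $badword (split ' ', $line) {

                # the "1" denotes "true" value for when we compare
                # bodytext to the badwords list

                $badwords_hash{$badword} = 1;
        }
}

while (my $line = <MESSAGE>) {
        my @words = split ' ', $line;

        foreach my $word (@words) {
                if ($badwords_hash{$word}) {
                        # keep the first letter, star out the rest
                        my $starlength = length($word) - 1;
                        $word = substr($word, 0, 1) . ('*' x $starlength);
                }
        }

        # put back the spaces and newline that split() removed
        print join(' ', @words), "\n";
}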


I know this is inefficient and slow, though. And I didn't even use
regular expressions either! :P  What would be a more robust and efficient
way to accomplish something like this?
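
For comparison, the substitution idea described earlier (keep the first
letter, star out the rest, whole words only) might be sketched with a single
s///. This is only a rough sketch: the word list and the mask_badwords name
are made up for illustration, and I haven't measured whether it is actually
faster than the hash lookup.

```perl
use strict;
use warnings;

# Hypothetical word list, for illustration only.
my @badwords = ('darn', 'heck');

# One alternation, longest words first so a longer entry is tried
# before a shorter one that shares its prefix.
my $alt = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @badwords;

# \b restricts matches to whole words; /i ignores case.
my $badword_re = qr/\b($alt)\b/i;

sub mask_badwords {
    my ($text) = @_;

    # /e evaluates the replacement as code: first letter of the match,
    # followed by one asterisk for each remaining character.
    $text =~ s/$badword_re/substr($1, 0, 1) . '*' x (length($1) - 1)/ge;

    return $text;
}

print mask_badwords("Well, darn it, what the heck.\n");
```

Unlike the split-based loop, this leaves whitespace and punctuation exactly
as they were, and it also catches a listed word that sits right next to a
comma or period.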

Aloha,
mel

____________________________________________________________
mel matsuoka                      Hawaiian Image Productions
Chief Executive Alphageek                (vox)1.808.531.5474
[EMAIL PROTECTED]                    (fax)1.808.526.4040