Hi Jon, I think we've seen this discussion on the list before (so Christopher, check the archives!)
> > I'm wondering if someone has a great source for a master-list
> > of controversial and vulger words that I can use on my site.
> > I would like to pattern match input text against this master-list
> > in order to prevent vulger and controversial words from appearing
> > on my site.
>
> Once you've got the routine working, post it here, because there are many
> people who would like to know how to do this properly.
>
> The problems that others have experienced in the past are:
> - what happens with "mis"spellings, e.g. "fsck"?
> - what happens with dodgy formatting, e.g "f s c k"?
> - what happens with words like "Scunthorpe"?

Problem 1: add likely/popular mis-spellings to the list of vulger/vulgar language.

Problem 2: (contrived) very few single-letter words exist, so remove intervening white space prior to analysis.

Problem 2a: (the more popular f*ck - someone suffering the misapprehension that (s)he is somehow NOT guilty of using bad language/being offensive when (s)he plainly is not only doing so but attempting to be deceptive as well...) see response to Problem 1 (the probable habit would be to replace/remove vowels).

Problem 3: Scunthorpe contains an unfortunate series of letters (amongst the town's many disadvantages); however, the critical four are not a word in and of their own right, so employ whitespace (\s) in the RegEx or token analysis.

> May I suggest, rather than picking your way through this minefield, you
> provide a "report abusive comment" link instead?

Most sensible! The employment of a technological solution to a social problem is somewhat shooting the messenger. However, some countries are now legislating responsibility that ISPs/employers must discharge (shooting the person who shoes the horses that the Pony Express messenger is riding!?)

In this case perhaps one could analyse the incoming text and place an embargo on its publication on the web site until it has been reviewed by a human editor?
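A rough sketch of Problems 1 to 3 plus the hold-for-review idea, offered as an illustration rather than a finished routine. It is in Python only for brevity; the same logic should port to PHP's preg_replace_callback and in_array. The word lists, the function names and the "hold_for_review" label are all invented for this example, and mild stand-ins are used in place of the real vocabulary. Note that it joins only runs of single letters rather than stripping all white space, since stripping everything would destroy the word boundaries that Problem 3 relies on.

```python
import re

# Hypothetical lists: mild placeholders standing in for the real master-list.
BLOCKED = {"darn", "heck"}
# Problem 1 / 2a: known misspellings and vowel-swapped forms map to the base word.
VARIANTS = {"drn": "darn", "d*rn": "darn", "hck": "heck", "h*ck": "heck"}

# Punctuation trimmed from token edges; "*" is left alone so "d*rn" survives.
PUNCT = ".,!?;:\"'()"

# Problem 2: three or more single letters separated by white space.
SPACED = re.compile(r"(?<![A-Za-z])(?:[A-Za-z]\s+){2,}[A-Za-z](?![A-Za-z])")


def collapse_spaced_letters(text):
    """Join runs like 'd a r n' into 'darn', leaving normal words untouched."""
    return SPACED.sub(lambda m: re.sub(r"\s+", "", m.group(0)), text)


def flagged_words(text):
    """Return blocked words found as whole tokens (Problem 3: no substrings)."""
    found = []
    for raw in collapse_spaced_letters(text).lower().split():
        word = raw.strip(PUNCT)
        word = VARIANTS.get(word, word)
        if word in BLOCKED:
            found.append(word)
    return found


def triage(text):
    """Embargo flagged text for a human editor instead of rejecting it."""
    hits = flagged_words(text)
    return ("hold_for_review", hits) if hits else ("publish", hits)
```

Because the comparison is made token by token, "darning" passes even though it contains a blocked word, which is the Scunthorpe case. Anything flagged is only queued for a human editor, so a false positive costs a delay rather than a lost post.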
If we were talking about filtering incoming email, then perhaps the original message could be forwarded/wrapped with a message from the EmailAdmin/System pointing out that a message has arrived from xyz (etc) and has been flagged for a stated reason (but that there is room for interpretation within the mechanical observation) and that the message should not be opened by anyone fearing offence. (This is similar to 'security' gateways that don't allow msgs with attachments unless the 'employee' first authorises a 'pass-through'.)

Euro 0.02's worth?
=dn

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php