Raphael Hertzog <[EMAIL PROTECTED]> writes:
> On Thu, 28 Jun 2007, Russ Allbery wrote:
>> Well, he extracts everything between <>, but I believe we still lose
>> if, for instance, there's a # in the e-mail address (which is an
>> entirely valid RFC 2822 character).  I'm a little worried about +,
>> which is a very common character and sometimes has special
>> interpretations in URLs.

> So fix the # case (I can do a fixed list of character translation).
> Email address can contain almost anything but in practice they don't
> contain much fancy stuff compared to real names. The "+" has a special
> meaning only in CGI (GET) parameters AFAIK.

According to RFC 2396, the characters reserved, banned, or disrecommended
for URIs are:

    ; / ? : @ & = + $ , < > # % " { } | \ ^ [ ] `

and space.  The safest thing to do would be to map all of those characters
to _.  (Some of them we could get away with not mapping, but I prefer to
appeal to a clear authority for things like this rather than generating a
custom list.)

We still lose if someone has a non-ASCII or control character in their
e-mail address, but that's unlikely to be a problem given that RFC 2822
doesn't permit those either.

-- 
Russ Allbery ([EMAIL PROTECTED])             <http://www.eyrie.org/~eagle/>
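The mapping proposed above (every RFC 2396 reserved, delims, or unwise character, plus space, becomes _) can be sketched in Python. This is a minimal illustration, not code from the thread: the function name `sanitize_for_url` is hypothetical, and it assumes the address has already been extracted from between the angle brackets. As the message notes, non-ASCII and control characters are not handled and pass through unchanged.

```python
# Hypothetical sketch: map URI-unsafe characters in an e-mail address to "_".
# Character sets follow RFC 2396: reserved, delims, and unwise, plus space.
RESERVED = ';/?:@&=+$,'
DELIMS = '<>#%"'
UNWISE = '{}|\\^[]`'
UNSAFE = set(RESERVED + DELIMS + UNWISE + ' ')

# Build the translation table once; each unsafe character maps to "_".
_TABLE = str.maketrans({c: '_' for c in UNSAFE})


def sanitize_for_url(address: str) -> str:
    """Replace every RFC 2396 reserved/delims/unwise character and space with '_'.

    Non-ASCII and control characters are left as-is (an acknowledged gap).
    """
    return address.translate(_TABLE)


if __name__ == "__main__":
    print(sanitize_for_url("jane+debian@example.org"))  # jane_debian_example.org
```

Building the table from the three named RFC 2396 sets keeps the list tied to the standard rather than to a hand-picked subset, which is the point argued above.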