T.B.:
> Hello everyone,
> I have a question: I want to implement an attachment filter
> (mime_header_checks) that filters special Unicode "format characters".
> Examples:
> 0x202E (right-to-left override)
> 0x202B (right-to-left embedding)
> 0x202D (left-to-right override)
> 0x202A (left-to-right embedding)
>
> Complete list here (page 4):
> http://www.unicode.org/charts/PDF/U2000.pdf
>
> You can look up the reason here:
> http://www.h-online.com/security/news/item/Backwards-Unicode-names-hides-malware-and-viruses-1242114.html
>
> Any suggestions on how to do that?
According to RFC 2183/2184, a Content-Disposition filename that contains non-ASCII characters must be encoded as an ASCII string. This means that, besides decoding the correctly encoded forms for UTF-8, UTF-16, and so on, you may also need to handle filenames that violate RFC 2183/2184 (for example, by carrying raw non-ASCII bytes). I am not sure that regular expressions are the right tool for this job.

	Wietse
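To illustrate the decode-first approach, here is a minimal sketch in Python rather than a Postfix lookup table. It assumes the whole message is available as raw bytes (for example, inside an external content filter), and the names `find_bidi_filenames`, `BIDI_CONTROLS` and `SAMPLE` are hypothetical. It relies on the standard-library `email` package to undo RFC 2184/2231 parameter encoding (`filename*=charset''...`) and then checks for U+202A to U+202E. It deliberately does not handle the non-compliant forms (raw 8-bit bytes, RFC 2047 encoded-words in parameters), which is where extra work would still be needed.

```python
import email

# Assumption: only the explicit embedding/override controls U+202A..U+202E are
# checked; extend this set if other format characters from the chart matter.
BIDI_CONTROLS = {chr(cp) for cp in range(0x202A, 0x202F)}


def find_bidi_filenames(raw_message: bytes) -> list:
    """Return decoded attachment filenames that contain bidi control characters."""
    msg = email.message_from_bytes(raw_message)  # compat32 policy by default
    hits = []
    for part in msg.walk():
        # get_filename() reads Content-Disposition (falling back to the
        # Content-Type "name" parameter) and decodes RFC 2231 encoding.
        name = part.get_filename()
        if name and any(ch in BIDI_CONTROLS for ch in name):
            hits.append(name)
    return hits


# Hypothetical sample: "%E2%80%AE" is U+202E (right-to-left override) in UTF-8.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XX"\r\n\r\n'
    b"--XX\r\n"
    b"Content-Type: text/plain\r\n\r\nhello\r\n"
    b"--XX\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Disposition: attachment; filename*=UTF-8''%E2%80%AEcod.exe\r\n\r\n"
    b"AAAA\r\n"
    b"--XX--\r\n"
)

if __name__ == "__main__":
    for name in find_bidi_filenames(SAMPLE):
        print(repr(name))
```

Decoding with a MIME parser first means the check runs on the actual characters rather than on every possible byte-level encoding of them, which is the part a single regular expression over raw headers handles poorly.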