Moti wrote:

Thanks for the reply, I've now read that discussion.

First:

If anyone with a Far Eastern code page (Japanese, Chinese, Korean) is on this
list, please send an example of how attachment filenames in your local
code page are sent in MIME.
It's for your own good :)

It should look something like this (specifically the name and filename
fields):

------=_NextPart_000_0032_01C53787.386D0A70
Content-Type: video/x-ms-wmv;
       name="=?windows-1255?B?4ePp9+Qud212?="
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
       filename="=?windows-1255?B?4ePp9+Qud212?="




I have already "fixed" this problem in the next release by calling MIME::Base64 - but multi-byte encoding issues like the ones you raise are not in my area of expertise, so I'm not sure whether that changes anything.
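To make the "raw" decoding step concrete, here is a minimal sketch in Python (used purely for illustration; it is not the Perl code in qmail-scanner). It runs the encoded-word from the example above through the standard library's `email.header.decode_header` and shows that the result is a byte string in the sender's code page plus a charset label. The helper name `decode_filename` is hypothetical, and the sketch assumes the value is a single encoded-word.

```python
from email.header import decode_header

def decode_filename(encoded_word):
    """Return (raw_bytes, charset) for a single RFC 2047 encoded-word.

    Hypothetical helper for illustration only.
    """
    raw, charset = decode_header(encoded_word)[0]
    if isinstance(raw, str):
        # Unencoded values come back as str with charset None
        raw = raw.encode("ascii", errors="replace")
    return raw, charset

raw, charset = decode_filename("=?windows-1255?B?4ePp9+Qud212?=")
print(raw)      # raw bytes in the sender's code page, ending in b".wmv"
print(charset)  # charset label taken from the encoded-word
```

Note that the base64 step alone yields only the bytes; the charset label is what says how those bytes map to characters.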


The extension is always in plain 7-bit ASCII; there are no specific "local
extensions".




Great - so that should mean if we simply "raw" base64 decode the filenames, the extensions should always be trustable?


In the case of multi-byte encoding there is a need to filter out only the ASCII
characters and run perlscan on them, i.e. in some way discard or replace all
the non-ASCII characters. This seems to be much more complicated. I am not sure
there is a Perl package for working with Windows code pages.
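The filtering idea above could look roughly like the following. This is a Python sketch for illustration only, not perlscan's code; the function names and the `_` replacement character are assumptions. It works at the byte level, which is fine for single-byte code pages such as windows-1255. In some multi-byte encodings (Shift_JIS, for example) the second byte of a character can fall in the ASCII range, so there a byte-level filter would leave stray ASCII characters behind unless the name is first decoded with its declared charset.

```python
def ascii_only(raw, replacement=b"_"):
    """Replace every byte outside 7-bit ASCII with `replacement`."""
    return b"".join(bytes([b]) if b < 0x80 else replacement for b in raw)

def extension(raw):
    """Return the lower-cased text after the last '.', or '' if there is none."""
    name = ascii_only(raw).decode("ascii")
    return name.rsplit(".", 1)[1].lower() if "." in name else ""

raw = b"\xe1\xe3\xe9\xf7\xe4.wmv"  # bytes from the base64 in the example above
print(ascii_only(raw))  # b'_____.wmv'
print(extension(raw))   # wmv
```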



Yuck - I don't like the smell of that!

Can you create some empty files for me with the most exotic multi-byte filenames you can come up with, as examples, along with how you would write those same filenames within quarantine-attachments.txt using vi/vim? I want to know how the choice of Code Page would affect the filename (when you say "Code Page", I assume you are referring to charset?). Extensions are one thing, but we should also ensure that if you want to block specific filenames, the quarantine-attachments.txt file is capable of handling such filenames too.

How do Unix systems (and editors specifically) handle multi byte chars? I mean, if I am on a Linux box set for UTF8 (via LANG and LOCALE settings) and say in quarantine-attachments.txt to block:

יקה.wmv<TAB>0<TAB>yucky Windows movie

and someone sends in that filename - but encoded according to the "windows-1255" (Hebrew?) charset. Will MIME::Base64 decode that base64 encoded string back to the same "8bit" string referred to within quarantine-attachments.txt (i.e. so they match)?

I mean, I also assume there's a different (older) non-UTF ISO charset for Hebrew that you could have been using instead of UTF8 on your Unix box - would the string be different from the UTF8 version?
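A small experiment can show how the bytes differ between charsets. The sketch below is Python for illustration only (not qmail-scanner code) and uses the filename bytes from the windows-1255 example earlier in this thread; the `matches` helper is hypothetical. For plain Hebrew letters the windows-1255 and ISO-8859-8 bytes come out identical, while the UTF-8 bytes are different, so a byte-for-byte comparison against a UTF-8 rule would not match unless both sides are first converted to a common form.

```python
name_bytes = b"\xe1\xe3\xe9\xf7\xe4.wmv"  # as received, labelled windows-1255
name = name_bytes.decode("windows-1255")  # charset-independent Unicode string

as_cp1255 = name.encode("windows-1255")
as_iso = name.encode("iso-8859-8")
as_utf8 = name.encode("utf-8")

print(as_cp1255 == as_iso)   # True: same byte values for the Hebrew letters
print(as_cp1255 == as_utf8)  # False: UTF-8 uses two bytes per Hebrew letter

def matches(rule_utf8, received, charset):
    """Compare a UTF-8 rule with a received name by decoding both to Unicode."""
    return rule_utf8.decode("utf-8") == received.decode(charset)

print(matches(as_utf8, name_bytes, "windows-1255"))  # True
```

Decoding both sides to Unicode before comparing is one possible design; converting the received name to the charset of the rules file would be another.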

This locale thing gets complicated REALLY quickly. :-( [esp for an ignorant English speaker like myself]

--
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1



_______________________________________________
Qmail-scanner-general mailing list
Qmail-scanner-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/qmail-scanner-general
