Serhiy Storchaka added the comment:

No, checking the first bytes of the file is not an appropriate option. zipfile 
should support the Python zip application format [1], in which arbitrary data 
(such as a shebang line) may precede the archive.
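
To illustrate why a leading-bytes check would break zip applications, here is a 
minimal sketch. It builds a small archive in memory and prepends a shebang line, 
as the zipapp format allows. The file name and shebang are made up for the 
example.

```python
import io
import zipfile

# Build an ordinary ZIP archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("__main__.py", "print('hello')\n")

# Prepend a shebang line, as the zipapp format permits (illustrative value).
app = b"#!/usr/bin/env python3\n" + buf.getvalue()

# The data no longer starts with the local file header signature "PK"...
starts_with_pk = app.startswith(b"PK")

# ...but zipfile still recognizes and reads it, because it locates the
# end of central directory record from the end of the file.
recognized = zipfile.is_zipfile(io.BytesIO(app))
with zipfile.ZipFile(io.BytesIO(app)) as zf:
    names = zf.namelist()

print(starts_with_pk, recognized, names)
```

A check that requires "PK" at offset 0 would reject this archive even though 
the ZipFile constructor reads it without trouble.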

I see two options:

1. Make is_zipfile() stricter than the ZipFile constructor. The latter 
supports ZIP files with data past the comment or with truncated comments, but 
the former should reject them.

2. Make both is_zipfile() and the ZipFile constructor more robust. They should 
check not just the EOCD signature, but also the Zip64 end of central directory 
record (if it exists) and the first central file header signature (if the ZIP 
file is not empty).
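
A rough sketch of the kind of check the second option describes, for 
illustration only. The function looks_like_zip() is hypothetical and not part 
of zipfile. It assumes the central directory sits immediately before the EOCD 
record, so it does not handle Zip64 archives, and it takes the last occurrence 
of the EOCD signature, so a comment containing that signature would confuse it.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # end of central directory signature
CDH_SIG = b"PK\x01\x02"    # central file header signature
# signature, 4 x uint16 (disk numbers, entry counts),
# 2 x uint32 (central directory size and offset), uint16 comment length
EOCD_FMT = "<4s4H2LH"
EOCD_SIZE = struct.calcsize(EOCD_FMT)  # 22 bytes


def looks_like_zip(data: bytes) -> bool:
    """Hypothetical stricter check; a sketch, not the zipfile implementation."""
    pos = data.rfind(EOCD_SIG)
    if pos < 0 or pos + EOCD_SIZE > len(data):
        return False  # no signature, or the record itself is truncated
    (_sig, _disk, _cd_disk, _n_disk, n_total,
     cd_size, _cd_offset, _comment_len) = struct.unpack_from(EOCD_FMT, data, pos)
    if n_total == 0:
        return cd_size == 0  # an empty archive has no central directory
    # Assumption: no Zip64 records, so the central directory ends where the
    # EOCD record begins. Using the size rather than the stored offset keeps
    # this working when data is prepended (zip applications).
    cd_start = pos - cd_size
    if cd_start < 0:
        return False
    return data[cd_start:cd_start + 4] == CDH_SIG


# A real archive, with and without prepended data.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "hello")
real = buf.getvalue()
prefixed = b"#!/usr/bin/env python3\n" + real

# Non-ZIP data that merely contains a plausible-looking EOCD record.
fake = b"%PDF-1.4\n" + b"x" * 100 + struct.pack(
    EOCD_FMT, EOCD_SIG, 0, 0, 1, 1, 10, 0, 0)

print(looks_like_zip(real), looks_like_zip(prefixed), looks_like_zip(fake))
```

The fake input claims one entry and a 10-byte central directory, but no central 
file header signature is found where the directory should start, so the sketch 
rejects it while still accepting the archive with prepended data.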

It may be that PDF files contain PK\005\006 not by accident, but because they 
contain embedded ZIP files (I don't know if this is the case). In those 
circumstances, is_zipfile() returning True is correct.

[1] https://docs.python.org/3/library/zipapp.html

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28494>
_______________________________________