Serhiy Storchaka added the comment:

I agree with Martin and Lars: this issue is not as easy as it looks at first 
glance.

For ZIP files we should distinguish two different operations.

1. Remove the entry from the central directory (and maybe mark the local file 
header as invalid, if that is possible). This is easy, fast and safe, but it 
doesn't change the size of the ZIP file.

2. Physically remove the content of the file from the ZIP file. This is about 
as easy as removing a line from a text file: everything after the removed part 
has to be rewritten. In the worst case the cost is linear in the size of the 
ZIP file.

2a. The safer way is to create a temporary file in the same directory, copy the 
content of the original ZIP file excluding the deleted file, and then replace 
the original ZIP file with the modified copy. Beware of file and parent 
directory permissions, owners, and disk space.
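
A minimal sketch of approach (2a), using only the public zipfile API. The 
helper name remove_member is hypothetical and is not part of zipfile or of the 
patch under review. It recompresses every kept member rather than copying raw 
bytes, streams each member in chunks, and copies only the permission bits, so 
ownership and disk space are still left to the caller.

```python
import os
import shutil
import tempfile
import zipfile


def remove_member(zip_path, name):
    """Rewrite zip_path without the member `name` (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(zip_path))
    # Temporary file in the same directory, so the final rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=directory)
    os.close(fd)
    try:
        with zipfile.ZipFile(zip_path) as src, \
                zipfile.ZipFile(tmp_path, "w") as dst:
            dst.comment = src.comment
            for info in src.infolist():
                if info.filename == name:
                    continue  # skip the member being removed
                # Stream the member in chunks instead of loading it whole.
                with src.open(info) as fin, dst.open(info, "w") as fout:
                    shutil.copyfileobj(fin, fout)
        # mkstemp creates the file with restrictive permissions; copy the
        # permission bits of the original (owner is not preserved here).
        shutil.copymode(zip_path, tmp_path)
        os.replace(tmp_path, zip_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

If anything fails before os.replace(), the original archive is untouched and 
the temporary file is deleted.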

2b. The faster but less safe way is to "shift" the content of the ZIP file 
that follows the deleted file, by reading it and writing it back to the same 
ZIP file at a different position. This is not safe because if something goes 
wrong while writing, we can lose all the data. And of course there are crafty 
ZIP files in which the order of the files doesn't match the order in the 
central directory, or in which file data even overlaps.
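
As an illustration of the "crafty ZIP file" problem, here is a rough sketch of 
a check that could run before attempting an in-place shift. The function name 
has_simple_layout is hypothetical. It assumes that each member occupies at 
least the 30-byte fixed part of the local file header plus its compressed 
data, so it can only prove that a layout is bad, not that it is good.

```python
import zipfile

# Size of the fixed part of a local file header in the ZIP format.
LOCAL_HEADER_FIXED_SIZE = 30


def has_simple_layout(zip_path):
    """Return False if members are out of order or clearly overlap."""
    with zipfile.ZipFile(zip_path) as zf:
        infos = zf.infolist()  # central directory order
    for prev, cur in zip(infos, infos[1:]):
        # Lower bound for where the previous member's data ends.
        min_end = (prev.header_offset + LOCAL_HEADER_FIXED_SIZE
                   + prev.compress_size)
        if cur.header_offset < min_end:
            return False
    return True
```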

For performance, maybe we should implement (2) not as a method that removes a 
single file, but as a method that takes a filter function and leaves in the 
ZIP file only the files for which it returns true.

Or maybe implement (1) and then add a method that cleans up the ZIP archive 
by physically removing all files already removed from the central directory. 
We should discuss the alternatives.

As for the concrete patch, zipfile.remove.2.patch can read the content of the 
whole ZIP file into memory. This is not appropriate, because a ZIP file can be 
very large.

----------
stage: patch review -> 

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6818>
_______________________________________