[issue10030] Patch for zip decryption speedup
New submission from Shashank : As promised in this thread http://mail.python.org/pipermail/python-dev/2009-August/091450.html (a year ago!), attached is a patch that replaces simple zip decryption logic written in pure python with that in C. As reported in the link above, this can result in speedups up to couple of orders of magnitude. There doesn't seem to be any need to add any new tests as this patch doesn't change any public API -- components: Library (Lib) files: zipdecrypt.patch keywords: patch messages: 118030 nosy: shashank priority: normal severity: normal status: open title: Patch for zip decryption speedup type: performance versions: Python 2.7 Added file: http://bugs.python.org/file19135/zipdecrypt.patch ___ Python tracker <http://bugs.python.org/issue10030> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10030] Patch for zip decryption speedup
Shashank added the comment: I have updated the patch with a check for the availability of C impl and to use pure-py impl as a fallback. How do you suggest would the tests change? As I had mentioned before, in my understanding since there is no change in the API the already existing tests should work. One can simulate the absence of C impl in a system where it is present but AFAIU this is not what it is usually done (e.g, in the case of optional zlib dependency in the same module) -- Added file: http://bugs.python.org/file19149/zipdecrypt.patch ___ Python tracker <http://bugs.python.org/issue10030> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10030] Patch for zip decryption speedup
Shashank added the comment: Attached is a patch with changes in Lib/test/test_zipfile.py to test both C and pure-py impls (on systems where the C impl is present). Admittedly, this approach to emulating the absence of C impl is a bit hacky. This is primarily because the changed class is not a part of the public API and hence not being tested directly. David, could you verify that the approach is ok? -- Added file: http://bugs.python.org/file19196/zipdecrypt.patch ___ Python tracker <http://bugs.python.org/issue10030> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10030] Patch for zip decryption speedup
Changes by Shashank : Added file: http://bugs.python.org/file19197/zipdecrypt.patch ___ Python tracker <http://bugs.python.org/issue10030> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10030] Patch for zip decryption speedup
Shashank added the comment: >the C module should be private and therefore called _zipdecrypt done >if you want to avoid API mismatch, you could give a tp_call to your C >>decrypter object, rather than a "decrypt" method done >- you can put all initialization code in zipdecrypt_new and avoid the >need >for zipdecrypt_init keeping this similar to the existing _ZipDecrypter class in ZipFile (all initialization in init rather than new), which was probably to allow re-init and re-use of one instance >it's better to use the "y*" code in PyArg_ParseTuple, rather than "s#" y* does not seem to be available in 2.7, using s* instead >you should define your module as PY_SSIZE_T_CLEAN and use Py_ssize_t >as >length variables (rather than int) done >you *mustn't* change the contents of the buffer which is given you by >"s#" or >"y*", since that buffer is read-only (it can be a bytes >object); instead, >create a new bytes object using >PyBytes_FromStringAndSize(NULL, length) and >write into that; or, if >you want a read-write buffer, use the "w*" code corrected, not altering the input buffer, reading input buffer as s* -- Added file: http://bugs.python.org/file19347/zipdecrypt.patch ___ Python tracker <http://bugs.python.org/issue10030> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10030] Patch for zip decryption speedup
Shashank added the comment: I had uploaded an incorrect patch. New corrected patch against trunk (on Mac OS uploaded). -- Added file: http://bugs.python.org/file19494/zipdecrypt.patch ___ Python tracker <http://bugs.python.org/issue10030> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10030] Patch for zip decryption speedup
Changes by Shashank : Removed file: http://bugs.python.org/file19494/zipdecrypt.patch ___ Python tracker <http://bugs.python.org/issue10030> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10030] Patch for zip decryption speedup
Changes by Shashank : Added file: http://bugs.python.org/file19495/zipdecrypt.patch ___ Python tracker <http://bugs.python.org/issue10030> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10323] Final state of underlying sequence in islice
New submission from Shashank : -- Converting the discussion here http://mail.python.org/pipermail/python-list/2010-November/1259601.html to a bug report (+nosy for everyone that responded, quoting the initial message for context) Are there any promises made with regard to final state of the underlying sequence that islice slices? for example consider this >>> from itertools import * >>> c = count() >>> list(islice(c, 1, 3, 50)) [1] >>> c.next() 51 Now, the doc [1] says "If stop is None, then iteration continues until the iterator is exhausted, if at all; otherwise, it stops at the specified position". It clearly is not stopping at stop (3). Further, the doc gives an example of how this is *equivalent* to a generator defined in the same section. It turns out, these two are not exactly the same if the side-effect of the code is considered on the underlying sequence. Redefining islice using the generator function defined in the doc gives different (and from one pov, expected) result >>> def islice(iterable, *args): ... # islice('ABCDEFG', 2) --> A B ... >>> c = count() >>> list(islice(c, 1, 3, 50)) [1] >>> c.next() 2 While "fixing" this should be rather easy in terms of the change in code required it might break any code depending on this seemingly incorrect behavior -- components: Interpreter Core messages: 120482 nosy: ned.deily, rhettinger, shashank, terry.reedy priority: normal severity: normal status: open title: Final state of underlying sequence in islice type: behavior versions: Python 2.7, Python 3.1, Python 3.2 ___ Python tracker <http://bugs.python.org/issue10323> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10323] Final state of underlying sequence in islice
Shashank added the comment: @Raymond: I don't have a particular use case where I had a problem with this behavior. I came across this "problem" when looking at this issue http://bugs.python.org/issue6305. An important problem that can happen with this behavior is that it does extra work that is not needed. Consider the case (it appears in Lib/test/test_itertools.py): islice(count(), 1, 10, maxsize) where maxsize is MAX_Py_ssize_t Current implementation goes all the way up to maxsize when it should have just stopped at 10. You are probably right in saying that the caller can make sure that the parameters are such that such cases don't arise but I would still like to see python doing the best possible thing as far as possible. -- ___ Python tracker <http://bugs.python.org/issue10323> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue21109] tarfile: Traversal attack vulnerability
shashank added the comment: A. Regrading Jakub's tests, I suppose the changes needed are to for every name in tar i) find reasonable occurrence of symlink's name and replace it with smylink's linkname ii) convert it to normal path and then check for relative / absolute paths B. Jakub, are your tests exhaustive or there maybe some corner cases missing? I think we don't have tests for hardlinks, yet. Also I think, we will need tests for CHRTYPE, FIFOTYPE and BLKTYPE of tar's typeflag[0] [0] http://www.gnu.org/software/tar/manual/html_node/Standard.html -- nosy: +shanxS ___ Python tracker <https://bugs.python.org/issue21109> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue21109] tarfile: Traversal attack vulnerability
shashank added the comment: 1. I have done some changes to Lar's patch to address class of bugs which Jakub found. Attached patch safetarfile-2.diff Patch is for code only and is work in progress. 2. However, there maybe several edge cases which have not been covered. Going by types of bugs we want to catch: * Don't allow creating files whose absolute path is not under the destination. * Don't allow creating links (hard or soft) which link to a path outside of the destination. * Don't create device nodes. I suspect there may be more so which haven't been mentioned yet, one of which I have listed below. 3. Now, on to patch, Safetar now tries to keep a list of symlinks it has seen so far and tries to figure effective path of current name which may use symlinks. This approach does address bugs found in Jakub's comment, details below, but when symlink's link has a symlink in it, there are cases which this impl. let slip. For example: # tar -tvf dirsym_rouge.tar drwxr-xr-x root/root 0 2018-09-12 19:03 dirsym/ lrwxrwxrwx root/root 0 2018-09-12 18:39 dirsym/sym -> . lrwxrwxrwx root/root 0 2018-09-12 19:02 dirsym/symsym3 -> ../dirsym/sym/../.. This escapes the check since, given name "../dirsym/sym/../.." translates to "..", ideally this should have given relative link warning. Above symlink is valid. But for: # tar -tvf dirsym.tar drwxr-xr-x root/root 0 2018-09-12 18:44 dirsym/ lrwxrwxrwx root/root 0 2018-09-12 18:44 dirsym/sym -> . lrwxrwxrwx root/root 0 2018-09-12 18:44 dirsym/symsym -> dirsym/sym/../.. This fails with warning of relative link name, as expected. given name "dirsym/sym/../.." translates to "../.." However, the symlink points to invalid path which may or maynot be useful. 4. Regarding bugs reported by Jakub, following enumerates the effective name that gets computed by Safetar: absolute1.tar "/tmp/moo" translates to "/tmp/moo" absolute2.tar "//tmp/moo" translates to "//tmp/moo" dirsymlink.tar "tmp" translates to "tmp" "/tmp" translates to "tmp" dirsymlink2a.tar "cur" translates to "cur" "." translates to "." "par" translates to "par" "cur/.." translates to ".." "par/moo" translates to "../moo" dirsymlink2b.tar "cur" translates to "cur" "." translates to "." "cur/par" translates to "par" ".." translates to ".." "par/moo" translates to "../moo" relative0.tar "../moo" translates to "../moo" relative2.tar "tmp/../../moo" translates to "../moo" symlink.tar "moo" translates to "moo" "/tmp/moo" translates to "/tmp/moo" -- Added file: https://bugs.python.org/file47800/safetarfile-2.diff ___ Python tracker <https://bugs.python.org/issue21109> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue21109] tarfile: Traversal attack vulnerability
shashank added the comment: Figured a fix for the bug I found, trick was to keep track of current working dir of symlink it was trying to evaluate. Attached patch: safetarfile-3.diff Patch is for code only. I'd like to see this go thorough, and would appreciate feedback. -- Added file: https://bugs.python.org/file47803/safetarfile-3.diff ___ Python tracker <https://bugs.python.org/issue21109> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue21109] tarfile: Traversal attack vulnerability
shashank added the comment: I can't use Jakub's repo (or Makefile from that repo) directly because it relies on tar, which doesn't look like dependency for building Python. I can make similar tarballs but I am not sure how licensing will work. I can add tarballs for the cases I discovered. As of now, I'll move ahead with my version of tarballs and will change them for licensing requirement as needed before merging it to source code. Would you prefer a PR over a patch? -- ___ Python tracker <https://bugs.python.org/issue21109> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue21109] tarfile: Traversal attack vulnerability
shashank added the comment: Added tests. Patch file: safetarfile-4.diff Following works with 456 tests passed, after doing `make clean && make` # ./python -m unittest -v test.test_tarfile Attached patch is on top of master's commit: commit 2aaf98c16ae3070378de523a173e29644037d8bd (HEAD -> master, origin/master, origin/HEAD) Author: INADA Naoki Date: Wed Sep 26 12:59:00 2018 +0900 -- Added file: https://bugs.python.org/file47826/safetarfile-4.diff ___ Python tracker <https://bugs.python.org/issue21109> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue21109] tarfile: Traversal attack vulnerability
shashank added the comment: It won't exactly be drop-in replacement. I mean if users decide to replace Tarfile with SafeTarFile, existing code may break since there might be cases where dodgy tarballs are acceptable and/or used then SafeTarFile.open will throw an exception. Having said that, I am refactoring the tests right now since the test file is ~3000 lines and adding SafeTarFile tests for every TarFile test is cluttering it. -- ___ Python tracker <https://bugs.python.org/issue21109> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue5007] urllib2 HTTPS connection failure (BadStatusLine Exception)
Shashank added the comment: Works fine for me in 2.6 but fails as said by OP on 2.5. (I came across this in the course of my work and am submitting a change in a bug for the first time, pardon me if something is inappropriate :) I used this modified codeblock: -- import cookielib import urllib2 cookiejar = cookielib.LWPCookieJar() opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar)) url = 'https://www.orange.sk/' req = urllib2.Request(url, None) s=opener.open(req) print s.read(); --- 2.6 gives a complete HTML page but 2.5 raises httplib.BadStatusLine exception. -- nosy: +shashank ___ Python tracker <http://bugs.python.org/issue5007> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue26296] colorys rgb_to_hls algorithm error
Shashank Agarwal added the comment: I would like to work on this.. -- nosy: +The_Knight ___ Python tracker <http://bugs.python.org/issue26296> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com