Nadeem Vawda added the comment:
Sorry, I just haven't had any free time lately, and may still not be able
to give this the attention it deserves for another couple of weeks.
Serhiy, would you be interested in reviewing Nikolaus' patch?
--
Nadeem Vawda added the comment:
I've posted a review at http://bugs.python.org/review/15955/. (For some reason,
it looks like Rietveld didn't send out email notifications. But maybe it never
sends a notification to the sender? Hmm.)
--
Nadeem Vawda added the comment:
Thanks for the patch, Nikolaus. I'm afraid I haven't had a chance to look
over it yet; this past week has been a bit crazy for me. I'll definitely
get back to you with a review in the next week, though.
--
___
Nadeem Vawda added the comment:
> How does one create a multi-stream bzip2 file in the first place?
If you didn't do so deliberately, I would guess that you used a parallel
compression tool like pbzip2 or lbzip2 to create your bz2 file. These tools work
by splitting the input int
Nadeem Vawda added the comment:
As Serhiy said, multi-stream support was only added to the bz2 module in 3.3,
and there is no plan to backport this functionality to 2.7.
However, the bz2file package on PyPI [1] does support multi-stream inputs,
and you can use its BZ2File class as a drop-in
Nadeem Vawda added the comment:
After some consideration, I've come to agree with Serhiy that it would be better
to keep a private internal buffer, rather than having the user manage unconsumed
input data. I'm also in favor of having a flag to indicate whether the
decompressor needs
Nadeem Vawda added the comment:
The patch for zlib looks good to me. Thanks for working on this, Serhiy.
> We're not allowing changes in semantics for Argument Clinic conversion for
> 3.4. If it doesn't currently accept None, we can't add it right now, and
> we
Nadeem Vawda added the comment:
The latest patch for zlib seems to be missing Modules/zlibmodule.clinic.c
> I suppose that zdict=b'' have same effect as not specifying zdict. Am I right?
Probably, but to be on the safe side I'd prefer that we preserve the behav
Nadeem Vawda added the comment:
I can reproduce this (also on Ubuntu 13.10 64-bit). Maybe there's a bug
in the version of curses distributed with the latest Ubuntu release? It
looks like our only Ubuntu buildbot is using 8.04 (almost 6 years old!).
Also note that you won't be able to
Changes by Nadeem Vawda :
--
nosy: +nadeem.vawda
___
Python tracker
<http://bugs.python.org/issue20358>
___
___
Python-bugs-list mailing list
Unsubscribe:
Nadeem Vawda added the comment:
The bz2 patch looks good to me, aside from a nit with the docstring for
BZ2Compressor.__init__.
The lzma patch produces a bunch of test failures for me. It looks like
the __init__ methods for LZMACompressor and LZMADecompressor aren't
accepting keyword
Nadeem Vawda added the comment:
No, I'm afraid I haven't had a chance to do any work on this issue since my last
message.
I would be happy to review a patch for this, but before you start writing one,
we should settle on how the API will look. I'll review the existing discussion
Nadeem Vawda added the comment:
The patches for bz2 and lzma look good to me, aside from one nit for lzma.
--
___
Python tracker
<http://bugs.python.org/issue20
Changes by Nadeem Vawda :
--
nosy: +nadeem.vawda
___
Python tracker
<http://bugs.python.org/issue20182>
___
Changes by Nadeem Vawda :
--
nosy: +nadeem.vawda
___
Python tracker
<http://bugs.python.org/issue20184>
___
Changes by Nadeem Vawda :
--
nosy: +nadeem.vawda
___
Python tracker
<http://bugs.python.org/issue20185>
___
Nadeem Vawda added the comment:
To clarify, which version(s) does this affect? I have not been able to
reproduce against 3.4, and 2.7 does not include the lzma module in the
first place.
--
___
Python tracker
<http://bugs.python.org/issue19
Nadeem Vawda added the comment:
It appears that this *does* affect 2.7 (though not 3.2, 3.3 or 3.4,
fortunately):
~/src/cpython/2.7☿ gdb --ex run --args ./python -c 'import bz2; obj = bz2.BZ2File("/dev/null"); obj.__init__("")'
«... snip banner ...»
Nadeem Vawda added the comment:
I'll have a patch for this in the next couple of days, along with a similar
one for the lzma module, which has the same issue (though it is not a
regression in that case).
In the meanwhile, you can work around this by feeding the compressed
Changes by Nadeem Vawda :
--
nosy: +nadeem.vawda
___
Python tracker
<http://bugs.python.org/issue19227>
___
New submission from Nadeem Vawda:
[Split off from issue 19395]
The following code hangs after hitting a TypeError trying to pickle one
of the TextIOWrapper objects:
import multiprocessing
def read(f): return f.read()
files = [open(path) for path in 3 * ['/dev/null']
Nadeem Vawda added the comment:
The part of this issue specific to LZMACompressor should now be fixed;
I've filed issue 19425 for the issue with Pool.map hanging.
--
resolution: -> fixed
stage: needs patch -> committed/rejected
status: ope
Nadeem Vawda added the comment:
It looks like there's also a separate problem in the multiprocessing
module. The following code hangs after hitting a TypeError trying to
pickle one of the TextIOWrapper objects:
import multiprocessing
def read(f): return f.read()
files = [open
Nadeem Vawda added the comment:
Yes, that's because the builtin map function doesn't handle each input
in a separate process, so it uses the same LZMACompressor object
everywhere. Whereas multiprocessing.Pool.map creates a new copy of the
compressor object for each input, which is
Nadeem Vawda added the comment:
As far as I can tell, liblzma provides no way to serialize a compressor's
state, so the best we can do is raise a TypeError when attempting to
pickle the LZMACompressor (and likewise for LZMADecompressor).
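
A minimal sketch of the behavior described above, assuming an interpreter that already includes this fix: pickling either object is expected to fail with a TypeError instead of silently producing a broken copy. The helper name try_pickle is hypothetical and used only for illustration.

```python
import lzma
import pickle


def try_pickle(obj):
    """Return the exception type raised when pickling obj, or None on success.

    Hypothetical helper for illustration only.
    """
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return type(exc)
    return None


# Neither object can have its liblzma state serialized, so on interpreters
# with the fix both attempts should be rejected with a TypeError.
compressor_result = try_pickle(lzma.LZMACompressor())
decompressor_result = try_pickle(lzma.LZMADecompressor())
```

This is also why passing such an object to multiprocessing.Pool.map cannot work: the arguments have to be pickled to reach the worker processes.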
Also, it's worth pointing out that the pro
Nadeem Vawda added the comment:
[terry.reedy]
| Arfrever's point about the order of characters makes me wonder why mode
| strings (as opposed to characters in the strings) are being checked.
| The following tests that exactly one of w, a, x appear in mode.
| if len({'w', '
Nadeem Vawda added the comment:
Fix committed. Thanks for the patches!
As Jesús and Terry have said, this won't be backported to 3.3/2.7, since
it is a new feature.
[oylenshpeegul]
| It's weird how different these three patches are! We're
| essentially doing the same thing: "
Changes by Nadeem Vawda :
--
assignee: -> nadeem.vawda
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
___
Python tracker
<http://bugs.python
Changes by Nadeem Vawda :
--
assignee: -> nadeem.vawda
nosy: +nadeem.vawda
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
___
Python tracker
<http://bugs.python
Nadeem Vawda added the comment:
No, that is the intended behavior for binary streams - they operate at
the level of individual bytes. If you want to treat your input file as
Unicode-encoded text, you should open it in text mode. This will return a
TextIOWrapper which handles the decoding and line
Nadeem Vawda added the comment:
> I agree that making lzma.open() wrap its return value in a BufferedReader
> (or BufferedWriter, as appropriate) is the way to go.
On second thoughts, there's no need to change the behavior for mode='wb'.
We can just return a BufferedRead
Nadeem Vawda added the comment:
I agree that making lzma.open() wrap its return value in a BufferedReader
(or BufferedWriter, as appropriate) is the way to go. I'm currently
travelling and don't have my SSH key with me - Serhiy, can you make the
change?
I'll put together a docu
Nadeem Vawda added the comment:
Have you tried running the benchmark against the default (3.4) branch?
There was some significant optimization work done in issue 16034, but
the changes were not backported to 3.3.
--
___
Python tracker
<h
Nadeem Vawda added the comment:
Benjamin, please cherry-pick this for 2.7.4 as well (changesets b7bfedc8ee18
and 529c4defbfd7).
--
stage: needs patch -> commit review
versions: +Python 2.7
___
Python tracker
<http://bugs.python.org/issu
Nadeem Vawda added the comment:
OK, 2.7 is done.
Georg, what do we want to do for 3.2? I've attached a patch.
--
assignee: nadeem.vawda -> georg.brandl
keywords: +patch
Added file: http://bugs.python.org/file30049/bz2-viruswarning.diff
__
Nadeem Vawda added the comment:
Oh dear. I'll update the test suite over the weekend. In the meanwhile,
Christian, can you confirm which versions are affected? The file should only
have been included in 2.7 and 3.2.
--
assignee: -> nade
Nadeem Vawda added the comment:
Hmm, so actually most of the bugs fixed in 2.7 and 3.2 weren't present
in 3.3 and 3.4, and those versions already had tests equivalent to the
tests I added for 2.7/3.2.
As for the changes that I did make to 3.3/3.4:
- two of the three cover cases that only
Nadeem Vawda added the comment:
An oversight on my part, I think. I'll add tests for 3.x this weekend.
--
status: closed -> open
___
Python tracker
<http://bugs.python.org
Nadeem Vawda added the comment:
> You could add a comment explaining the issue.
Done.
This doesn't seem to affect 2.7. Marking as fixed in 3.2/3.3/3.4.
--
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
versi
Nadeem Vawda added the comment:
This change fixes the problem (and doesn't break anything else that I can see):
--- a/Lib/test/test_ssl.py
+++ b/Lib/test/test_ssl.py
@@ -979,7 +979,7 @@
self.sslconn = self.server.context.wrap_socket(
self
Nadeem Vawda added the comment:
I think the new behavior should be controlled by a constructor flag, maybe
named "defer_errors". I don't like the idea of adding the flag to read(),
since that makes us diverge from the standard file interface. Making a
distinction between size&
Nadeem Vawda added the comment:
You're right; it breaks backspacing over multibyte characters. I should
have tested it more carefully before committing. I'll revert the changes.
--
resolution: fixed ->
stage: committed/rejected -> needs patch
status
Changes by Nadeem Vawda :
--
assignee: -> nadeem.vawda
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
versions: +Python 2.7
___
Python tracker
<http://bugs.python
Nadeem Vawda added the comment:
The updated patch looks good to me.
--
___
Python tracker
<http://bugs.python.org/issue1159051>
___
Nadeem Vawda added the comment:
> What if unconsumed_tail is not empty but less than needed to decompress at
> least one byte? We need read more data until unconsumed_tail grow enought to
> be decompressed.
This is possible in zlib, but not in bz2. According to the manual [1], it is
Nadeem Vawda added the comment:
I've reviewed the patch and posted some comments on Rietveld.
> I doubt about backward compatibility. It's obvious that struct.error and
> TypeError are unintentional, and EOFError is purposed for this case. However
> users can catch undocu
Changes by Nadeem Vawda :
--
resolution: -> duplicate
stage: -> committed/rejected
status: open -> closed
superseder: -> seriously? urllib still doesn't support persistent connections?
___
Python tracker
<http://bugs.p
Nadeem Vawda added the comment:
Fixed. Thanks for the bug report and the patches!
--
assignee: -> nadeem.vawda
keywords: +3.3regression -patch
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
___
Py
Nadeem Vawda added the comment:
>> # Using zlib's interface
>> while not d.eof:
>> compressed = d.unconsumed_tail or f.read(8192)
>> if not compressed:
>> raise ValueError('End-of-stream marker not found')
>
Nadeem Vawda added the comment:
I've tried reimplementing LZMAFile in terms of the decompress_into()
method, and it has ended up not being any faster than the existing
implementation. (It is _slightly_ faster for readinto() with a large
buffer size, but in all other cases it was either of
Nadeem Vawda added the comment:
Committed. Thanks for the patch!
--
resolution: -> fixed
stage: commit review -> committed/rejected
status: open -> closed
type: -> enhancement
___
Python tracker
<http://bugs.python
Nadeem Vawda added the comment:
Ah, that's much nicer than either of my ideas. Patch committed. Thanks!
--
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
___
Python tracker
<http://bugs.pyth
Nadeem Vawda added the comment:
New patch committed. Once again, thanks for all your work on this issue!
--
stage: patch review -> committed/rejected
status: open -> closed
___
Python tracker
<http://bugs.python.org/i
Nadeem Vawda added the comment:
Looks good to me. Go ahead.
You needn't add or change any tests for this, but you should run the
existing tests before committing, just to be safe.
--
nosy: +nadeem.vawda
___
Python tracker
<http://bugs.py
Nadeem Vawda added the comment:
> These were not idle questions. I wrote the patch, and I had to know
> what behavior is correct.
Ah, sorry. I assumed you were going to submit a separate patch to fix the
unconsumed_tail issues.
> Here's the patch. It fixes potent
Nadeem Vawda added the comment:
I suspect that it will be slower than the decompress_into() approach, but
as you say, we need to do benchmarks to see for sure.
--
___
Python tracker
<http://bugs.python.org/issue15
Nadeem Vawda added the comment:
I agree that being able to limit output size is useful and desirable, but
I'm not keen on copying the max_length/unconsumed_tail approach used by
zlib's decompressor class. It feels awkward to use, and it complicates
the implementation of the existing
Nadeem Vawda added the comment:
> flush() does not update unconsumed_tail and unused_data.
>
> >>> import zlib
> >>> x = zlib.compress(b'abcdefghijklmnopqrstuvwxyz') + b'0123456789'
> >>> dco = zlib.decompressobj()
> >>> d
New submission from Nadeem Vawda:
When calling zlib.Decompress.decompress() with a max_length argument,
if the input data is not fully consumed, the next_in pointer in the
z_stream struct is left pointing into the data object, but the
decompressor does not hold a reference to this object. This
Nadeem Vawda added the comment:
Fixed. Thanks for the patch!
> This hacking is not needed, if first argument of PyBytes_FromStringAndSize()
> is NULL, the contents of the bytes object are uninitialized.
Oh, cool. I didn't know about that.
> What should unconsumed_tail be equa
Nadeem Vawda added the comment:
Interesting idea, but I'm not sure it would be worth the effort. It would
make the code and API more complicated, so it wouldn't really help users,
and would be an added maintenance burden.
--
___
Python trac
Nadeem Vawda added the comment:
This bug (zlib not providing a way to detect end-of-stream) has already
been fixed - see issue 12646.
I've opened issue 16350 for the unused_data problem.
--
resolution: -> out of date
stage: test needed -> committed/rejected
status: ope
New submission from Nadeem Vawda:
From issue 5210:
amaury.forgeotdarc wrote:
> Hm, I tried a modified version of your first test, and I found another
> problem with the current zlib library;
> starting with the input:
> x = x1 + x2 + HAMLET_SCENE# both compressed and un
Changes by Nadeem Vawda :
--
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
___
Python tracker
<http://bugs.python.or
Nadeem Vawda added the comment:
All fixed, along with some other similar but harder-to-trigger bugs.
Thanks for the bug report, Laurent!
--
resolution: -> fixed
stage: needs patch -> committed/rejected
status: open -> closed
___
Pytho
Nadeem Vawda added the comment:
I'm working on it now. Will push in the next 15 minutes or so.
--
___
Python tracker
<http://bugs.python.org/issue14398>
___
___
Nadeem Vawda added the comment:
The data corruption issue is now fixed in the 2.7 branch.
In 3.x, using a mode containing 'U' results in an exception rather than silent
data corruption. Additionally, gzip.open() has supported text modes
("rt"/"wt"/"at"
Changes by Nadeem Vawda :
--
resolution: -> fixed
stage: needs patch -> committed/rejected
status: open -> closed
___
Python tracker
<http://bugs.python.or
Nadeem Vawda added the comment:
Hmm, OK. URLopener and FancyURLopener do each issue a DeprecationWarning when
used, though. If they are not actually deprecated, perhaps we should remove the
warnings for the moment?
--
___
Python tracker
<h
Nadeem Vawda added the comment:
Are we still planning on removing URLopener and FancyURLopener in 3.4? The
documentation for 3.3 does not list these classes as deprecated.
--
___
Python tracker
<http://bugs.python.org/issue10
Changes by Nadeem Vawda :
--
resolution: -> rejected
stage: -> committed/rejected
status: open -> closed
___
Python tracker
<http://bugs.python.or
Nadeem Vawda added the comment:
No sign of these failures any more; looks like that fixed it.
--
resolution: -> fixed
stage: needs patch -> committed/rejected
status: open -> closed
___
Python tracker
<http://bugs.python.or
Changes by Nadeem Vawda :
--
resolution: -> works for me
stage: needs patch -> committed/rejected
status: open -> closed
___
Python tracker
<http://bugs.python.or
Nadeem Vawda added the comment:
I've released v0.95 of bz2file, which incorporates all the optimizations
discussed here. The performance should be similar to 2.x's bz2 in most cases.
It is still a lot slower when calling read(10) or read(1), but I hope no-one is
doing that anyw
Nadeem Vawda added the comment:
Ah, nice - I didn't think of that optimization. Neater and faster.
I've committed this patch [e6d872b61c57], along with a minor bugfix
[7252f9f95fe6], and another optimization for readline()/readlines()
[6d7bf512e0c3]. [merge with default: a19f47d
Nadeem Vawda added the comment:
> Recursive inline _check_can_read() will be enough. Now this check calls 4
> Python functions (_check_can_read(), readable(), _check_non_closed(),
> closed). Recursive inlining only readable() in _check_can_read() is achieved
> significant but le
Nadeem Vawda added the comment:
> Yes, of course.
Awesome. I plan to do a new release for this in the next couple of days.
> We can even speed up 1.5 times the reading of small chunks, if we inline
> _check_can_read() and _read_block().
Interesting idea, but I don't think i
Nadeem Vawda added the comment:
Thanks for the bug report, Victor, and thank you Serhiy for the patch!
Serhiy, would you be OK with me also including this patch in the bz2file
package?
--
resolution: -> fixed
stage: -> committed/rejected
status: open -> closed
versions: +P
Nadeem Vawda added the comment:
As far as I can tell, there is no way to find this out reliably without
decompressing the entire file. With gzip, the file trailer contains the
uncompressed size modulo 2^32, but this seems less than useful. It appears that
the other two formats do not store
Changes by Nadeem Vawda :
--
stage: -> needs patch
___
Python tracker
<http://bugs.python.org/issue12669>
___
Nadeem Vawda added the comment:
> Nadeem: is the failure you show in msg141798 with a version of test_curses
> that uses pty.openpty?
Yes, I tried the following change:
--- a/Lib/test/test_curses.py
+++ b/Lib/test/test_curses.py
@@ -328,11 +328,12 @@
curses.r
Changes by Nadeem Vawda :
--
superseder: -> test_curses skipped on buildbots
___
Python tracker
<http://bugs.python.org/issue15664>
___
Nadeem Vawda added the comment:
Thanks for the patch. Unfortunately I don't have much free time at the
moment, so it might be a few weeks before I get a chance to review it.
--
___
Python tracker
<http://bugs.python.org/is
Changes by Nadeem Vawda :
--
nosy: +nadeem.vawda
___
Python tracker
<http://bugs.python.org/issue15654>
___
Nadeem Vawda added the comment:
No, if _read() is called once the file is already at EOF, it raises an
EOFError (http://hg.python.org/cpython/file/8c07ff7f882f/Lib/gzip.py#l433),
which will then break out of the loop.
--
___
Python tracker
<h
Nadeem Vawda added the comment:
Before these fixes, it looks like all three classes' peek() methods were
susceptible to the same problem as read1().
The fixes for BZ2File.read1() and LZMAFile.read1() should have fixed peek() as
well; both methods are implemented in terms of _fill_b
Nadeem Vawda added the comment:
Done.
Thanks for the bug report, David.
--
resolution: -> fixed
stage: -> committed/rejected
status: open -> closed
___
Python tracker
<http://bugs.python.or
Nadeem Vawda added the comment:
OK, BZ2File should now be fixed. It looks like LZMAFile and GzipFile may
be susceptible to the same problem; I'll push fixes for them shortly.
--
___
Python tracker
<http://bugs.python.org/is
Nadeem Vawda added the comment:
The cause of this problem is that BZ2File.read1() sometimes returns b"", even
though the file is not at EOF. This happens when the underlying BZ2Decompressor
cannot produce any decompressed data from just the block passed to it in
_fill_buffer(); in
Nadeem Vawda added the comment:
I can't seem to reproduce this with an up-to-date checkout from Mercurial:
>>> import bz2
>>> g = bz2.open('access-log-0108.bz2','rt')
>>> next(g)
'140.180.132.213 - - [24/Feb/2008:00:08:59
Changes by Nadeem Vawda :
--
resolution: -> invalid
stage: -> committed/rejected
status: open -> closed
___
Python tracker
<http://bugs.python.or
Nadeem Vawda added the comment:
+1 for the general idea of deprecating and eventually removing the "U"
modes.
But I agree with David, that it doesn't make sense to have separate steps
for 3.5 and 3.6/4.0. If you make the code raise an exception when "U" is
used, how is
Nadeem Vawda added the comment:
Merging nosy list from duplicate issue 15155.
--
nosy: +giampaolo.rodola, neologix, pitrou
___
Python tracker
<http://bugs.python.org/issue13
Nadeem Vawda added the comment:
I already fixed this without knowing about this issue; see 55202ca694d7.
storchaka:
> Why not use io.TextWrapper? I think it is the right answer for this issue.
The proposed patch (and the code I committed) *do* use TextIOWrapper.
Unless you mean that call
Nadeem Vawda added the comment:
Patch looks fine to me.
Antoine, can you commit this? I'm currently away from the computer that
has my SSH key on it.
--
___
Python tracker
<http://bugs.python.org/is
Nadeem Vawda added the comment:
> Just saw this on the checkins list; where are the other options documented?
They aren't, AFAIK. I've been planning on adding them when I've got time
(based on the zlib manual at http://zlib.net/manual.html), but with the
upcoming feature fr
Nadeem Vawda added the comment:
Committed. Once again, thanks for the patch!
--
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
___
Python tracker
<http://bugs.python.or
Nadeem Vawda added the comment:
I plan to commit it (along with the buffer API changes) tomorrow.
--
___
Python tracker
<http://bugs.python.org/issue14
Nadeem Vawda added the comment:
> To restate my position: the need is for an immutable string of bytes, [...]
I disagree that we should require the dictionary to be immutable - if the
caller wishes to use a mutable buffer here, it is their responsibility to
ensure that it is not modified un
Nadeem Vawda added the comment:
There is already such a function, gzip.decompress() - it was added in 3.2.
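
For reference, a short sketch of the one-shot API mentioned above, assuming Python 3.2 or later; the payload and variable names are made up for illustration.

```python
import gzip

# Illustrative payload; any bytes object works.
original = b"example payload " * 4

# gzip.compress() and gzip.decompress() operate on whole bytes objects
# in memory, so no file object or GzipFile instance is needed.
compressed = gzip.compress(original)
restored = gzip.decompress(compressed)
```

For data too large to hold in memory, gzip.open() remains the more suitable interface.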
--
nosy: +nadeem.vawda
resolution: -> invalid
stage: -> committed/rejected
status: open -> pending
___
Python track