[issue22003] BytesIO copy-on-write

2015-03-04 Thread David Wilson
David Wilson added the comment: Hi Piotr, There wasn't an obvious fix that didn't involve changing the buffer interface itself. There is presently ambiguity in the interface regarding the difference between a "read only" buffer and an "immutable" buffer, which is crucial for its use in this c

[issue22003] BytesIO copy-on-write

2015-03-04 Thread Piotr Dobrogost
Piotr Dobrogost added the comment:
> This new patch abandons the buffer interface and specializes for Bytes per
> the comments on this issue.
Why does it abandon the buffer interface? Because of the following?
> Thanks for digging here. As much as I'd love to follow this interpretation,
> it simp

[issue22003] BytesIO copy-on-write

2015-02-14 Thread Roundup Robot
Roundup Robot added the comment: New changeset 7ae156f07a90 by Berker Peksag in branch 'default': Add a whatsnew entry for issue #22003. https://hg.python.org/cpython/rev/7ae156f07a90

[issue22003] BytesIO copy-on-write

2015-02-09 Thread David Wilson
David Wilson added the comment: Attached trivial patch for whatsnew.rst. -- Added file: http://bugs.python.org/file38058/whatsnew.diff

[issue22003] BytesIO copy-on-write

2015-02-09 Thread Mikhail Korobov
Mikhail Korobov added the comment: Shouldn't this fix be mentioned in https://docs.python.org/3.5/whatsnew/3.5.html#optimizations ?

[issue22003] BytesIO copy-on-write

2014-07-29 Thread Antoine Pitrou
Antoine Pitrou added the comment: The latest patch is good indeed. Thank you very much! -- resolution: -> fixed stage: patch review -> resolved status: open -> closed

[issue22003] BytesIO copy-on-write

2014-07-29 Thread Roundup Robot
Roundup Robot added the comment: New changeset 79a5fbe2c78f by Antoine Pitrou in branch 'default': Issue #22003: When initialized from a bytes object, io.BytesIO() now http://hg.python.org/cpython/rev/79a5fbe2c78f -- nosy: +python-dev

[issue22003] BytesIO copy-on-write

2014-07-29 Thread David Wilson
David Wilson added the comment: I suspect it's all covered now, but is there anything else I can help with to get this patch pushed along its merry way?

[issue22003] BytesIO copy-on-write

2014-07-28 Thread Stefan Krah
Stefan Krah added the comment: So I wonder why the benchmark suite says that the telco slowdown is significant. :)

[issue22003] BytesIO copy-on-write

2014-07-28 Thread Stefan Krah
Stefan Krah added the comment:
> Just curious, what causes e.g. telco to differ up to 7% between runs? That's
> really huge.
telco.py always varies a lot between runs (up to 10%), even in the big version "telco.py full": http://bytereef.org/mpdecimal/quickstart.html#telco-benchmark Using the

[issue22003] BytesIO copy-on-write

2014-07-28 Thread David Wilson
David Wilson added the comment: Newest patch incorporates Antoine's review comments. The final benchmark results are below. Just curious, what causes e.g. telco to differ up to 7% between runs? That's really huge.
Report on Linux k2 3.14-1-amd64 #1 SMP Debian 3.14.9-1 (2014-06-30) x86_64 Total

[issue22003] BytesIO copy-on-write

2014-07-27 Thread Antoine Pitrou
Antoine Pitrou added the comment:
> Even after totally isolating a CPU for e.g. django_v2 and with frequency
> scaling disabled, numbers still jump around for the same binary by as much as
> 3%.
That's expected. If the difference doesn't go above 5-10%, then you IMO can pretty much consider y

[issue22003] BytesIO copy-on-write

2014-07-27 Thread David Wilson
David Wilson added the comment: Hey Antoine, Thanks for the link. I'm having trouble getting reproducible results at present, and running out of ideas as to what might be causing it. Even after totally isolating a CPU for e.g. django_v2 and with frequency scaling disabled, numbers still jump

[issue22003] BytesIO copy-on-write

2014-07-25 Thread Antoine Pitrou
Changes by Antoine Pitrou: -- stage: needs patch -> patch review

[issue22003] BytesIO copy-on-write

2014-07-25 Thread Antoine Pitrou
Antoine Pitrou added the comment:
> It doesn't seem likely this patch would introduce severe performance troubles
> elsewhere, but I'd like to try it out with some example heavy BytesIO
> consumers (any suggestions? Some popular template engine?)
I don't have any specific suggestions, but y

[issue22003] BytesIO copy-on-write

2014-07-24 Thread David Wilson
David Wilson added the comment: This new patch abandons the buffer interface and specializes for Bytes per the comments on this issue. Anyone care to glance at least at the general structure? Tests could probably use a little more work. Microbenchmark seems fine, at least for construction. It
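
The construction microbenchmark mentioned above is not shown in this archive. A minimal sketch of how such a measurement could be taken with the standard `timeit` module follows; the helper name `time_construction`, the payload size, and the repeat count are assumptions chosen for illustration, not values from the patch.

```python
import io
import timeit


def time_construction(payload_size: int = 1 << 20, repeats: int = 200) -> float:
    """Return the average number of seconds per io.BytesIO(payload) call."""
    # Hypothetical payload: payload_size zero bytes held in an immutable bytes object.
    payload = bytes(payload_size)
    total = timeit.timeit(lambda: io.BytesIO(payload), number=repeats)
    return total / repeats


if __name__ == "__main__":
    print(f"{time_construction():.3e} s per construction")
```

On an interpreter that shares the bytes object at construction, the per-call time should be roughly independent of `payload_size`; on one that copies, it should grow with it. As the later comments in this thread note, small differences between runs are expected noise.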

[issue22003] BytesIO copy-on-write

2014-07-22 Thread Antoine Pitrou
Antoine Pitrou added the comment: I don't like the idea of trying to hash the object. It may be a time-consuming operation, while the result will be thrown away. I think restricting the optimization to bytes objects is fine. We can whitelist other types, such as memoryview.

[issue22003] BytesIO copy-on-write

2014-07-22 Thread David Wilson
David Wilson added the comment: Stefan, I like your new idea. If there isn't some backwards compatibility argument about mmap.mmap being hashable, then it could be considered a bug, and fixed in the same hypothetical future release that includes this BytesIO change. The only cost now is that t

[issue22003] BytesIO copy-on-write

2014-07-22 Thread Stefan Krah
Stefan Krah added the comment: I think the mmap behavior is probably worse than the NumPy example. I assume that in the example the exporter sets view.readonly=0. mmap objects set view.readonly=1 and can still be mutated.

[issue22003] BytesIO copy-on-write

2014-07-22 Thread Stefan Krah
Stefan Krah added the comment: Actually we have an extra safety net in memory_hash() apart from the readonly check: We also check if the underlying object is hashable. This might be applicable here, too. Unfortunately mmap objects *are* hashable, leading to some funny results: >>> import mmap
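
The two checks described above can be observed from Python without touching mmap. This is only a sketch of the visible behaviour, not of memory_hash() itself; `try_hash` is a hypothetical helper, and it catches both TypeError and ValueError because the exact exception depends on which check fails.

```python
def try_hash(view: memoryview):
    """Return hash(view), or None if the memoryview refuses to be hashed."""
    try:
        return hash(view)
    except (TypeError, ValueError):
        # Raised for a writable view or for an unhashable underlying object.
        return None


# A view over bytes is read-only and its exporter is hashable.
print(try_hash(memoryview(b"abc")) == hash(b"abc"))
# A view over bytearray is writable, so hashing is refused.
print(try_hash(memoryview(bytearray(b"abc"))))
```

The point made in the comment is that an exporter which is both read-only and hashable passes these checks even when its memory can still change.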

[issue22003] BytesIO copy-on-write

2014-07-22 Thread Antoine Pitrou
Antoine Pitrou added the comment: There's also the following code in numpy's getbuffer method:
/*
 * If a read-only buffer is requested on a read-write array, we return a
 * read-write buffer, which is dubious behavior. But that's why this call
 * is guarded by PyArray_ISWRITEABL

[issue22003] BytesIO copy-on-write

2014-07-22 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: See also issue15381.

[issue22003] BytesIO copy-on-write

2014-07-21 Thread David Wilson
Changes by David Wilson: Added file: http://bugs.python.org/file36016/cow4.patch

[issue22003] BytesIO copy-on-write

2014-07-21 Thread David Wilson
David Wilson added the comment: Hi Stefan, How does this approach in reinit() look? We first ask for a writable buffer, and if the object obliges, immediately copy it. Otherwise if it refused, ask for a read-only buffer, and this time expect that it will never change. This still does not catc
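
The decision described above can be approximated in pure Python. Python code cannot pass PyBUF_* request flags, so this sketch inspects `memoryview.readonly` instead of issuing two separate buffer requests; `choose_storage` is a hypothetical name, and the real reinit() is C code in the attached patch, which is not reproduced here.

```python
def choose_storage(source):
    """Return (data, shared): copy writable sources, keep read-only ones as-is."""
    with memoryview(source) as view:
        if not view.readonly:
            # The exporter offers write access, so take a private copy now.
            return bytes(view), False
    # Read-only exporter: keep a reference and assume it will not change.
    return source, True


print(choose_storage(bytearray(b"abc")))  # copied
print(choose_storage(b"abc"))             # shared
```

This sketch inherits the weakness debated in the thread: a read-only exporter is not necessarily immutable, so the "shared" branch is only safe for types known to be immutable, such as bytes.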

[issue22003] BytesIO copy-on-write

2014-07-21 Thread Stefan Krah
Stefan Krah added the comment: I'm sure many exporters aren't setting the right flags; on the other hand we already hash memoryviews based on readonly buffers, assuming they are immutable.

[issue22003] BytesIO copy-on-write

2014-07-21 Thread David Wilson
David Wilson added the comment: I'm not sure how much work it would be, or even if it could be made sufficient to solve our problem, but what about extending the buffer interface to include an "int stable" flag, defaulting to 0? It seems, though, that it would just be making the already arcane

[issue22003] BytesIO copy-on-write

2014-07-21 Thread David Wilson
David Wilson added the comment: Stefan, Thanks for digging here. As much as I'd love to follow this interpretation, it simply doesn't match existing buffer implementations, including within the standard library. For example, mmap.mmap(..., flags=mmap.MAP_SHARED, prot=mmap.PROT_READ) will pro
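
The mmap case can be reproduced with a temporary file. This sketch uses `access=mmap.ACCESS_READ` rather than the flags/prot form quoted above, and it assumes a platform where writes to the file descriptor are visible through an existing mapping of the same file (the usual behaviour on Linux); `readonly_mapping_changes` is a hypothetical helper name.

```python
import mmap
import os
import tempfile


def readonly_mapping_changes():
    """Return the contents of a read-only mapping before and after a file write."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"abc")
        with mmap.mmap(fd, 3, access=mmap.ACCESS_READ) as mapping:
            before = mapping[:]
            # Modify the file through the descriptor, not through the mapping.
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, b"xyz")
            after = mapping[:]
        return before, after
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(readonly_mapping_changes())
```

Under the stated assumption the two values differ, even though the mapping never allowed a write through its own buffer.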

[issue22003] BytesIO copy-on-write

2014-07-21 Thread Stefan Krah
Stefan Krah added the comment: The original wording in the PEP is this:
    readonly
        an integer variable to hold whether or not the memory is readonly. 1 means the memory is readonly, zero means the memory is writable.
To me this means that a hypothetical compiler that could figur

[issue22003] BytesIO copy-on-write

2014-07-21 Thread Stefan Krah
Stefan Krah added the comment: I think checking for a readonly view is fine. The protocol is this:
1) Use the PyBUF_WRITABLE flag in the request. Then the provider must either have a writable buffer or else deny the request entirely.
2) Omit the PyBUF_WRITABLE flag in the request. Th

[issue22003] BytesIO copy-on-write

2014-07-21 Thread Antoine Pitrou
Antoine Pitrou added the comment: As for whether the "checking for a readonly view" approach is broken, I don't know: that part of the buffer API is still mysterious to me. Stefan, would you have some insight? -- nosy: +skrah

[issue22003] BytesIO copy-on-write

2014-07-21 Thread Antoine Pitrou
Antoine Pitrou added the comment:
> Pretty sure this approach is broken. What about the alternative approach of
> specializing for Bytes?
That would certainly sound good enough, to optimize the common case. Also, it would be nice if you could add some tests to the patch (e.g. to stress the by

[issue22003] BytesIO copy-on-write

2014-07-20 Thread David Wilson
David Wilson added the comment: I'm not sure the "read only buffer" test is strong enough: having a readonly view is not a guarantee that the data in the view cannot be changed through some other means, i.e. it is read-only, not immutable. Pretty sure this approach is broken. What about the al
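
The distinction between read-only and immutable can be shown with a read-only memoryview over a bytearray. This is a small sketch that assumes Python 3.8 or later for `memoryview.toreadonly()`; the function name is hypothetical.

```python
def readonly_view_can_change():
    """Return (readonly flag, contents before, contents after) for one view."""
    backing = bytearray(b"abc")
    view = memoryview(backing).toreadonly()
    before = bytes(view)
    # The view forbids writes, but the owner of the memory can still mutate it.
    backing[0] = ord("x")
    after = bytes(view)
    flag = view.readonly
    view.release()
    return flag, before, after


if __name__ == "__main__":
    print(readonly_view_can_change())
```

The view reports itself as read-only the whole time, yet the bytes seen through it change, which is exactly why a read-only test alone cannot justify sharing memory.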

[issue22003] BytesIO copy-on-write

2014-07-20 Thread David Wilson
David Wilson added the comment: New patch also calls unshare() during getbuffer() -- Added file: http://bugs.python.org/file36005/cow3.patch

[issue22003] BytesIO copy-on-write

2014-07-20 Thread David Wilson
David Wilson added the comment: This version is tidied up enough that I think it could be reviewed. Changes are:
* Defer `buf' allocation until __init__, rather than __new__ as was previously done. Now upon completion, BytesIO.__new__ returns a valid, closed BytesIO, whereas previously a val

[issue22003] BytesIO copy-on-write

2014-07-18 Thread Mikhail Korobov
Changes by Mikhail Korobov: -- nosy: +kmike

[issue22003] BytesIO copy-on-write

2014-07-18 Thread Stefan Behnel
Stefan Behnel added the comment: Even if there is no way to explicitly request a RO buffer, the Py_buffer struct that you get back actually tells you if it's read-only or not. Shouldn't that be enough to enable this optimisation? Whether or not implementors of the buffer protocol set this flag
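
From Python, the flag carried in the Py_buffer struct is visible as `memoryview.readonly`. A minimal sketch of reading it follows; `exports_readonly` is a hypothetical helper, not part of any patch in this thread.

```python
def exports_readonly(obj) -> bool:
    """Report whether obj hands out a read-only buffer."""
    with memoryview(obj) as view:
        return view.readonly


print(exports_readonly(b"abc"))             # bytes exports a read-only buffer
print(exports_readonly(bytearray(b"abc")))  # bytearray exports a writable one
```

Whether that flag is a sufficient basis for the optimisation is the question the rest of the thread argues about.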

[issue22003] BytesIO copy-on-write

2014-07-18 Thread David Wilson
David Wilson added the comment: Good catch :( There doesn't seem to be a way to ask for an immutable buffer, so perhaps it could just be a little more selective. I think the majority of use cases would still be covered if the sharing behaviour was restricted only to BytesType. In that case "P
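
Whatever sharing happens internally, the behaviour visible to callers has to stay the same: writing to a BytesIO built from a bytes object must never alter that object. A small sketch of that invariant, with a hypothetical helper name:

```python
import io


def write_after_construction(source: bytes):
    """Return (source after the write, BytesIO contents after the write)."""
    bio = io.BytesIO(source)
    # Overwrite the first byte; a sharing implementation must copy before this.
    bio.write(b"X")
    return source, bio.getvalue()


if __name__ == "__main__":
    print(write_after_construction(b"abc"))
```

This holds on interpreters with or without the optimisation, since the copy is merely deferred until the first write.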

[issue22003] BytesIO copy-on-write

2014-07-17 Thread Antoine Pitrou
Antoine Pitrou added the comment: Be careful what happens when the original object is mutable:
>>> import io
>>> b = bytearray(b"abc")
>>> bio = io.BytesIO(b)
>>> b[:] = b"defghi"
>>> bio.getvalue()
b'abc'
I don't know what your patch does in this case. -- nosy: +serhiy.storchaka stage: -> needs pa
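
The session above can be turned into a reusable check. This sketch only verifies the snapshot behaviour that BytesIO is expected to keep for mutable sources; `snapshot_is_isolated` is a hypothetical helper name.

```python
import io


def snapshot_is_isolated(source: bytearray) -> bool:
    """Return True if later changes to source do not leak into the BytesIO."""
    bio = io.BytesIO(source)
    before = bio.getvalue()
    # Replace the contents and change the length, as in the session above.
    source[:] = b"x" * (len(source) + 3)
    return bio.getvalue() == before


if __name__ == "__main__":
    print(snapshot_is_isolated(bytearray(b"abc")))
```

A patch that shared memory with a bytearray would fail this check, which is the case the comment warns about.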

[issue22003] BytesIO copy-on-write

2014-07-17 Thread Antoine Pitrou
Changes by Antoine Pitrou: -- nosy: +benjamin.peterson, hynek, pitrou, stutzbach

[issue22003] BytesIO copy-on-write

2014-07-17 Thread David Wilson
David Wilson added the comment: Submitted contributor agreement. Please consider the demo patch licensed under the Apache 2 licence.

[issue22003] BytesIO copy-on-write

2014-07-17 Thread David Wilson
New submission from David Wilson: This is a followup to the thread at https://mail.python.org/pipermail/python-dev/2014-July/135543.html , discussing the existing behaviour of BytesIO copying its source object, and how this regresses compared to cStringIO.StringIO. The goal of posting the patc