Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
Antoine Pitrou wrote on 25.05.2018 at 23:11:
> On Fri, 25 May 2018 14:50:57 -0600
> Neil Schemenauer wrote:
>> On 2018-05-25, Antoine Pitrou wrote:
>>> Do you have something specific in mind?
>>
>> I think compressed by default is a good idea. My quick proposal:
>>
>> - Use fast compression like lz4 or zlib with Z_BEST_SPEED
>>
>> - Add a 'compress' keyword argument with a default of None. For
>>   protocol 5, None means to compress. Providing 'compress' != None
>>   for older protocols will raise an error.
>
> The question is what purpose does it serve for pickle to do it rather
> than for the user to compress the pickle themselves. You're basically
> saving one line of code. Am I missing some other advantage?

Regarding the pickling side, if the pickle is large, then it can save memory to compress while pickling, rather than compressing after pickling. But that can also be done with file-like objects, so the advantage is small here.

I think a major advantage is on the unpickling side rather than the pickling side. Sure, users can compress a pickle after the fact, but if there's a (set of) standard algorithm(s) that unpickle can handle automatically, then it's enough to pass "something pickled" into unpickle, rather than having to know (or figure out) if and how that pickle was originally compressed, and build up the decompression pipeline for it to get everything uncompressed efficiently without accidentally wasting memory or processing time.

Obviously, auto-decompression opens the door to compression bombs, but then, unpickling data from untrusted sources is discouraged anyway, so...

Stefan

___
Python-Dev mailing list [email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
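For illustration, the "pass something pickled into unpickle" idea from Stefan's message can be approximated today in user code. The sketch below is not part of pickle or of PEP 574; loads_auto is a hypothetical helper that sniffs the magic bytes of the gzip, bz2 and xz container formats and decompresses before unpickling. It assumes the payload was pickled with protocol 2 or later, so an uncompressed pickle starts with b"\x80" and cannot be mistaken for one of those formats, and it does nothing to guard against compression bombs.

```python
import bz2
import gzip
import lzma
import pickle

# Magic prefixes of the stdlib container formats (these are properties of
# gzip/bz2/xz, not of pickle itself).
_MAGIC = (
    (b"\x1f\x8b", gzip.decompress),
    (b"BZh", bz2.decompress),
    (b"\xfd7zXZ\x00", lzma.decompress),
)


def loads_auto(data):
    """Hypothetical helper: unpickle `data`, decompressing first if needed.

    Assumes protocol >= 2, whose pickles begin with b"\\x80".
    Only ever use this on trusted data.
    """
    for magic, decompress in _MAGIC:
        if data.startswith(magic):
            data = decompress(data)
            break
    return pickle.loads(data)
```

The caller no longer needs to know whether, or how, the payload was compressed, which is the convenience Stefan describes; the cost is exactly the compression-bomb exposure he mentions.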
Re: [Python-Dev] Add __reversed__ methods for dict
> Concerns have been raised in the comments that this feature may add too much
> bloat in the core interpreter and be harmful for other Python implementations.

To clarify, my point is that it prohibits a hashmap + singly linked list implementation in other Python implementations. Because a doubly linked list is very memory inefficient, every implementation would be forced to implement dict like PyPy (and CPython) for efficiency.

But I don't know much about current MicroPython and other Python implementations' plans to catch up with Python 3.6.

> Given the different issues this change creates, I see three possibilities:
> 1. Accept the proposal as it is for dict and dict views, this would add about
>    300 lines and three new types in dictobject.c
> 2. Accept the proposal only for dict, this would add about 80 lines and one
>    new type in dictobject.c while still being useful for some use cases
> 3. Drop the proposal as a whole, while having some use, reversed(dict(a=1, b=2))
>    may not be very common and could be done using OrderedDict instead.
> What's your stance on the issue?

I want to wait one version (3.8) for other implementations. "Keep insertion order" is a requirement from 3.7, which is not released yet. I feel it's too early to add stronger requirements to a core type.

Regards,

---
INADA Naoki
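As a point of reference for option 3 in the quoted proposal, this is roughly what the OrderedDict workaround looks like. It is only an illustrative sketch; the variable names are made up, and the last line shows the generic fallback of materialising the keys of a plain dict before reversing them.

```python
from collections import OrderedDict

# OrderedDict already supports reversed(); keyword order is preserved on 3.6+.
ordered = OrderedDict(a=1, b=2)
reversed_keys = list(reversed(ordered))

# For a plain dict, copying the keys into a list first always works,
# at the cost of a temporary list.
plain = dict(a=1, b=2)
reversed_plain_keys = list(reversed(list(plain)))
```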
Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
Hi all,

I agree that compression is often a good idea when moving serialized objects around on a network, but for what it's worth I as a library author would always set compress=False and then handle it myself as a separate step. There are a few reasons for this:

1. Bandwidth is often pretty good, especially intra-node, on high performance networks, or on decent modern disks (NVMe)
2. I often use different compression technologies in different situations. LZ4 is a great all-around default, but often snappy, blosc, or Zstandard are better suited. This depends strongly on the characteristics of the data.
3. Very often data isn't compressible, or is already in some compressed form, such as in images, and so compressing only hurts you.

In general, my thought is that compression is a complex topic with enough intricacies that setting a single sane default that works 70+% of the time probably isn't possible (at least not with the applications that I get exposed to).

Instead of baking a particular method into pickle.dumps I would recommend trying to solve this problem through documentation, pointing users to the various compression libraries within the broader Python ecosystem, and perhaps pointing to one of the many blog posts that discuss their strengths and weaknesses.

Best,
-matt
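The "handle it myself as a separate step" approach from Matt's message might look something like the sketch below. It is an assumption-laden illustration, not anyone's actual library code: dumps_compressed, loads_compressed and the CODECS table are hypothetical names, and only standard-library codecs are used as stand-ins for LZ4, snappy, blosc or Zstandard, which live in third-party packages. The helper also falls back to the raw pickle when compression does not shrink the payload, which covers his third point about incompressible data.

```python
import bz2
import lzma
import pickle
import zlib

# Hypothetical codec table; stdlib codecs stand in for LZ4, snappy, etc.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def dumps_compressed(obj, codec="zlib"):
    """Pickle first, then compress as a separate step, only if it helps.

    Returns (codec_name_or_None, payload) so the receiver knows what to undo.
    """
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    if codec is not None:
        packed = CODECS[codec][0](raw)
        if len(packed) < len(raw):
            return codec, packed
    return None, raw


def loads_compressed(codec, payload):
    """Inverse of dumps_compressed()."""
    if codec is not None:
        payload = CODECS[codec][1](payload)
    return pickle.loads(payload)
```

Keeping the codec choice outside pickle lets the caller pick per data set, at the price of having to transmit the codec name alongside the payload.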
Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available
+1 for not adding in-pickle compression, as it is already very easy to handle compression externally (for instance by passing a compressing file object as an argument to the pickler). Furthermore, as PEP 574 makes it possible to stream the buffer bytes directly to the file object without any temporary memory copy, I don't see any benefit in including the compression in the pickle protocol.

However, adding lz4.LZ4File to the standard library in addition to gzip.GzipFile and lzma.LZMAFile is probably a good idea, as LZ4 is really fast compared to zlib/gzip. But this is not related to PEP 574.

--
Olivier
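Passing a compressing file object to the pickler, as Olivier suggests, needs only the standard library. The following is a minimal sketch using gzip.GzipFile over an in-memory buffer; dump_gzip and load_gzip are hypothetical helper names, and compresslevel=1 is just one possible choice favouring speed. The same pattern works with lzma.LZMAFile or bz2.BZ2File.

```python
import gzip
import io
import pickle


def dump_gzip(obj, fileobj):
    """Pickle `obj` straight into a gzip stream wrapped around `fileobj`."""
    with gzip.GzipFile(fileobj=fileobj, mode="wb", compresslevel=1) as gz:
        pickle.dump(obj, gz, protocol=pickle.HIGHEST_PROTOCOL)


def load_gzip(fileobj):
    """Unpickle from a gzip stream wrapped around `fileobj`."""
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        return pickle.load(gz)


# Usage: round-trip through an in-memory buffer.
buf = io.BytesIO()
dump_gzip({"x": list(range(10))}, buf)
buf.seek(0)
restored = load_gzip(buf)
```

Because the pickler writes into the compressor incrementally, the uncompressed pickle never has to exist in memory as a single bytes object.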
Re: [Python-Dev] Add __reversed__ methods for dict
Hm, I find Inada's argument compelling that this might not be easy for all implementations. So let's wait.

On Sat, May 26, 2018 at 7:20 AM, INADA Naoki wrote:

> > Concerns have been raised in the comments that this feature may add too much
> > bloat in the core interpreter and be harmful for other Python implementations.
>
> To clarify, my point is that it prohibits a hashmap + singly linked list
> implementation in other Python implementations.
> Because a doubly linked list is very memory inefficient, every implementation
> would be forced to implement dict like PyPy (and CPython) for efficiency.
>
> But I don't know much about current MicroPython and other Python
> implementations' plans to catch up with Python 3.6.
>
> > Given the different issues this change creates, I see three possibilities:
> >
> > 1. Accept the proposal as it is for dict and dict views, this would add about
> >    300 lines and three new types in dictobject.c
> >
> > 2. Accept the proposal only for dict, this would add about 80 lines and one
> >    new type in dictobject.c while still being useful for some use cases
> >
> > 3. Drop the proposal as a whole, while having some use,
> >    reversed(dict(a=1, b=2)) may not be very common and could be done using
> >    OrderedDict instead.
> >
> > What's your stance on the issue?
>
> I want to wait one version (3.8) for other implementations.
> "Keep insertion order" is a requirement from 3.7, which is not released yet.
> I feel it's too early to add stronger requirements to a core type.
>
> Regards,
>
> ---
> INADA Naoki

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] lz4 compression
On Sat, 26 May 2018 18:42:42 +0200
Olivier Grisel wrote:
>
> However adding lz4.LZ4File to the standard library in addition to
> gzip.GzipFile and lzma.LZMAFile is probably a good idea as LZ4 is really
> fast compared to zlib/gzip. But this is not related to PEP 574.

If we go that way, we may probably want zstd as well :-). But, yes, most likely unrelated to PEP 574.

Regards

Antoine.
Re: [Python-Dev] Add __reversed__ methods for dict
> On May 26, 2018, at 7:20 AM, INADA Naoki wrote:
>
> Because doubly linked list is very memory inefficient, every implementation
> would be forced to implement dict like PyPy (and CPython) for efficiency.
> But I don't know much about current MicroPython and other Python
> implementation's plan to catch Python 3.6 up.

FWIW, Python 3.7 is the first Python where the language guarantees that regular dicts are order preserving. And the feature being discussed in this thread is for Python 3.8.

What potential implementation obstacles do you foresee? Can you imagine any possible way that an implementation would have an order preserving dict but would be unable to trivially implement __reversed__? How could an implementation have a __setitem__ that appends at the end, and a popitem() that pops from that same end, but still not be able to easily iterate in reverse? It really doesn't matter whether an implementer uses a dense array of keys or a doubly-linked list; either way, looping backward is as easy as going forward.

Raymond

P.S. It isn't going to be hard to update MicroPython to have a compact and ordered dict (based on my review of their existing dict implementation). This is something they are really going to want because of the improved memory efficiency. Also, they're already going to need it just to comply with guaranteed keyword argument ordering and guaranteed ordering of class dictionaries.
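Raymond's argument about the dense-array layout can be made concrete with a toy model. The class below is a sketch for illustration only, not CPython's or MicroPython's implementation: CompactOrderedDict is a hypothetical name, a builtin dict stands in for the real hash index, and deleted slots are marked with a tombstone rather than compacted. With __setitem__ appending to the entry array and popitem() popping from the same end, __reversed__ is just a backward walk over that array.

```python
class CompactOrderedDict:
    """Toy insertion-ordered mapping: dense entry array plus key -> slot index."""

    _DELETED = object()  # tombstone marking a removed entry

    def __init__(self):
        self._entries = []  # [key, value] pairs in insertion order
        self._index = {}    # stand-in for the hash table: key -> slot number

    def __setitem__(self, key, value):
        slot = self._index.get(key)
        if slot is None:
            # New key: append at the end of the dense array.
            self._index[key] = len(self._entries)
            self._entries.append([key, value])
        else:
            # Existing key: update in place, keeping its position.
            self._entries[slot][1] = value

    def __getitem__(self, key):
        return self._entries[self._index[key]][1]

    def __delitem__(self, key):
        slot = self._index.pop(key)
        self._entries[slot] = self._DELETED

    def popitem(self):
        # Pop from the same end that __setitem__ appends to.
        while self._entries:
            entry = self._entries.pop()
            if entry is not self._DELETED:
                del self._index[entry[0]]
                return tuple(entry)
        raise KeyError("popitem(): dictionary is empty")

    def __iter__(self):
        return (e[0] for e in self._entries if e is not self._DELETED)

    def __reversed__(self):
        # Backward iteration is the forward loop over a reversed array.
        return (e[0] for e in reversed(self._entries) if e is not self._DELETED)
```

In this layout the reverse iterator needs no extra bookkeeping beyond what forward iteration already uses.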
