Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available

2018-05-26 Thread Stefan Behnel
Antoine Pitrou wrote on 25.05.2018 at 23:11:
> On Fri, 25 May 2018 14:50:57 -0600
> Neil Schemenauer wrote:
>> On 2018-05-25, Antoine Pitrou wrote:
>>> Do you have something specific in mind?  
>>
>> I think compressed by default is a good idea.  My quick proposal:
>>
>> - Use fast compression like lz4 or zlib with Z_BEST_SPEED
>>
>> - Add a 'compress' keyword argument with a default of None.  For
>>   protocol 5, None means to compress.  Providing 'compress' != None
>>   for older protocols will raise an error.
> 
> The question is what purpose does it serve for pickle to do it rather
> than for the user to compress the pickle themselves.  You're basically
> saving one line of code.  Am I missing some other advantage?

Regarding the pickling side, if the pickle is large, then it can save
memory to compress while pickling, rather than compressing after pickling.
But that can also be done with file-like objects, so the advantage is small
here.

I think a major advantage is on the unpickling side rather than the
pickling side. Sure, users can compress a pickle after the fact, but if
there's a (set of) standard algorithms that unpickle can handle
automatically, then it's enough to pass "something pickled" into unpickle,
rather than having to know (or figure out) if and how that pickle was
originally compressed, and build up the decompression pipeline for it to
get everything uncompressed efficiently without accidentally wasting memory
or processing time.
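
As a rough illustration of what "pass something pickled into unpickle" could
mean, here is a minimal sketch that sniffs the leading magic bytes of the
standard library's compression formats before unpickling. The helper name
loads_maybe_compressed is hypothetical and not part of pickle or PEP 574; the
sketch also assumes the pickle itself never starts with one of these magic
sequences, which holds for protocol 2 and later (they begin with b"\x80").

```python
import bz2
import gzip
import lzma
import pickle

# Leading "magic" bytes of the stdlib compression container formats.
_MAGIC = (
    (b"\x1f\x8b", gzip.decompress),      # gzip
    (b"BZh", bz2.decompress),            # bzip2
    (b"\xfd7zXZ\x00", lzma.decompress),  # xz
)


def loads_maybe_compressed(data):
    """Hypothetical helper: unpickle bytes that may or may not be compressed."""
    for magic, decompress in _MAGIC:
        if data.startswith(magic):
            data = decompress(data)
            break
    return pickle.loads(data)
```

This version decompresses the whole payload in memory, so it shows only the
detection step, not the memory-efficient pipeline described above.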

Obviously, auto-decompression opens up a gate for compression bombs, but
then, unpickling data from untrusted sources is discouraged anyway, so...
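
For readers unfamiliar with the compression-bomb concern, the following sketch
shows one way a caller might cap the decompressed size using zlib's
max_length argument. The helper name bounded_decompress and the choice of
limit are assumptions made for illustration; nothing like this is proposed in
the thread.

```python
import zlib


def bounded_decompress(data, limit):
    """Hypothetical helper: zlib-decompress data, refusing more than limit bytes."""
    d = zlib.decompressobj()
    # Ask for at most limit + 1 bytes so that an overshoot can be detected
    # without ever materialising the full output of a bomb.
    out = d.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError("decompressed data exceeds limit")
    if not d.eof:
        raise ValueError("compressed stream is truncated")
    return out
```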

Stefan

___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Add __reversed__ methods for dict

2018-05-26 Thread INADA Naoki
> Concerns have been raised in the comments that this feature may add too much
> bloat in the core interpreter and be harmful for other Python implementations.


To clarify, my point is that it prohibits a hashmap + singly linked list
implementation in other Python implementations.
Because a doubly linked list is very memory inefficient, every implementation
would be forced to implement dict like PyPy (and CPython) for efficiency.

But I don't know much about the plans of MicroPython and other Python
implementations to catch up with Python 3.6.

> Given the different issues this change creates, I see three possibilities:

> 1. Accept the proposal as it is for dict and dict views, this would add about
> 300 lines and three new types in dictobject.c

> 2. Accept the proposal only for dict, this would add about 80 lines and one
> new type in dictobject.c while still being useful for some use cases

> 3. Drop the proposal as a whole, while having some use,
> reversed(dict(a=1, b=2))
> may not be very common and could be done using OrderedDict instead.

> What’s your stance on the issue?


I want to wait one version (3.8) for other implementations.
"Keep insertion order" is a requirement from 3.7, which is not released yet.
I feel it's too early to add stronger requirements to a core type.
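
For reference, the OrderedDict workaround mentioned in option 3 above might
look like the following minimal sketch (the variable names are illustrative
only):

```python
from collections import OrderedDict

d = OrderedDict(a=1, b=2)

# OrderedDict implements __reversed__, so keys can be walked newest-first.
reversed_keys = list(reversed(d))

# Key/value pairs in reverse insertion order.
reversed_items = [(k, d[k]) for k in reversed(d)]
```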

Regards,

---
INADA Naoki  


Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available

2018-05-26 Thread Matthew Rocklin
Hi all,

I agree that compression is often a good idea when moving serialized
objects around on a network, but for what it's worth I as a library author
would always set compress=False and then handle it myself as a separate
step.  There are a few reasons for this:

   1. Bandwidth is often pretty good, especially intra-node, on high
   performance networks, or on decent modern discs (NVMe)
   2. I often use different compression technologies in different
   situations.  LZ4 is a great all-around default, but often snappy, blosc, or
   zstandard are better suited.  This depends strongly on the characteristics
   of the data.
   3. Very often data isn't compressible, or is already in some
   compressed form, such as in images, and so compressing only hurts you.
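
To make the third point concrete, here is a small sketch comparing how zlib at
its fastest level handles repetitive data versus random bytes. The random
bytes are only a stand-in for already-compressed data such as images, and the
payload sizes are arbitrary choices for illustration.

```python
import os
import zlib

# Highly repetitive payload: compresses very well.
repetitive = b"python-dev " * 10_000

# Random bytes stand in for data that is already compressed.
random_like = os.urandom(len(repetitive))

fast = zlib.Z_BEST_SPEED  # the fast zlib setting mentioned earlier in the thread

repetitive_out = zlib.compress(repetitive, fast)
random_out = zlib.compress(random_like, fast)

# Compressed size divided by original size; values near 1.0 mean the
# CPU time spent compressing bought nothing.
ratio_repetitive = len(repetitive_out) / len(repetitive)
ratio_random = len(random_out) / len(random_like)
```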

In general, my thought is that compression is a complex topic with enough
intricacies that setting a single sane default that works 70+% of the time
probably isn't possible (at least not with the applications that I get
exposed to).

Instead of baking a particular method into pickle.dumps I would recommend
trying to solve this problem through documentation, pointing users to the
various compression libraries within the broader Python ecosystem, and
perhaps pointing to one of the many blogposts that discuss their strengths
and weaknesses.

Best,
-matt


Re: [Python-Dev] PEP 574 (pickle 5) implementation and backport available

2018-05-26 Thread Olivier Grisel
+1 for not adding in-pickle compression as it is already very easy to
handle compression externally (for instance by passing a compressing file
object as an argument to the pickler). Furthermore, as PEP 574 makes it
possible to stream the buffer bytes directly to the file object without any
temporary memory copy, I don't see any benefit in including compression
in the pickle protocol.
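
A minimal sketch of the external approach, assuming gzip as the compressor and
an in-memory buffer in place of a real file (the payload is made up for
illustration):

```python
import gzip
import io
import pickle

payload = {"values": list(range(1000)), "label": "example"}

# An in-memory buffer stands in for a real file on disk or a socket.
raw = io.BytesIO()

# The pickler writes its output into the compressing file object rather
# than returning one large uncompressed bytes object first.
with gzip.GzipFile(fileobj=raw, mode="wb", compresslevel=1) as f:
    pickle.dump(payload, f, protocol=pickle.HIGHEST_PROTOCOL)

# Reading back mirrors the writing side.
raw.seek(0)
with gzip.GzipFile(fileobj=raw, mode="rb") as f:
    restored = pickle.load(f)
```

Swapping gzip.GzipFile for lzma.LZMAFile or bz2.BZ2File changes only the two
lines that open the file object.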

However adding lz4.LZ4File to the standard library in addition to
gzip.GzipFile and lzma.LZMAFile is probably a good idea as LZ4 is really
fast compared to zlib/gzip. But this is not related to PEP 574.

-- 
Olivier


Re: [Python-Dev] Add __reversed__ methods for dict

2018-05-26 Thread Guido van Rossum
Hm, I find Inada's argument compelling that this might not be easy for all
implementations. So let's wait.

On Sat, May 26, 2018 at 7:20 AM, INADA Naoki  wrote:

> > Concerns have been raised in the comments that this feature may add too much
> > bloat in the core interpreter and be harmful for other Python implementations.
>
>
> To clarify, my point is that it prohibits a hashmap + singly linked list
> implementation in other Python implementations.
> Because a doubly linked list is very memory inefficient, every implementation
> would be forced to implement dict like PyPy (and CPython) for efficiency.
>
> But I don't know much about the plans of MicroPython and other Python
> implementations to catch up with Python 3.6.
>
> > Given the different issues this change creates, I see three possibilities:
>
> > 1. Accept the proposal as it is for dict and dict views, this would add about
> > 300 lines and three new types in dictobject.c
>
> > 2. Accept the proposal only for dict, this would add about 80 lines and one
> > new type in dictobject.c while still being useful for some use cases
>
> > 3. Drop the proposal as a whole, while having some use,
> > reversed(dict(a=1, b=2))
> > may not be very common and could be done using OrderedDict instead.
>
> > What’s your stance on the issue?
>
>
> I want to wait one version (3.8) for other implementations.
> "Keep insertion order" is a requirement from 3.7, which is not released yet.
> I feel it's too early to add stronger requirements to a core type.
>
> Regards,
>
> ---
> INADA Naoki  



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] lz4 compression

2018-05-26 Thread Antoine Pitrou
On Sat, 26 May 2018 18:42:42 +0200
Olivier Grisel  wrote:
> 
> However adding lz4.LZ4File to the standard library in addition to
> gzip.GzipFile and lzma.LZMAFile is probably a good idea as LZ4 is really
> fast compared to zlib/gzip. But this is not related to PEP 574.

If we go that way, we may probably want zstd as well :-). But, yes,
most likely unrelated to PEP 574.

Regards

Antoine.




Re: [Python-Dev] Add __reversed__ methods for dict

2018-05-26 Thread Raymond Hettinger

> On May 26, 2018, at 7:20 AM, INADA Naoki  wrote:
> 
> Because a doubly linked list is very memory inefficient, every implementation
> would be forced to implement dict like PyPy (and CPython) for efficiency.
> But I don't know much about the plans of MicroPython and other Python
> implementations to catch up with Python 3.6.

FWIW, Python 3.7 is the first Python where the language guarantees that 
regular dicts are order preserving.  And the feature being discussed in this 
thread is for Python 3.8.

What potential implementation obstacles do you foresee?  Can you imagine any 
possible way that an implementation would have an order preserving dict but 
would be unable to trivially implement __reversed__?  How could an 
implementation have a __setitem__ that appends at the end, and a popitem() that 
pops from that same end, but still not be able to easily iterate in reverse?  
It really doesn't matter whether an implementer uses a dense array of keys or a 
doubly-linked-list; either way, looping backward is as easy as going forward. 
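
To illustrate the dense-array case, here is a toy sketch of an
order-preserving mapping. The class name CompactDict and its layout are
assumptions made for this example; it borrows a builtin dict as a stand-in
for the hash table and is not how CPython, PyPy, or MicroPython actually lay
out their dicts.

```python
class CompactDict:
    """Toy order-preserving mapping built on a dense entries array."""

    def __init__(self):
        self._index = {}    # key -> position in _entries (stand-in hash table)
        self._entries = []  # insertion-ordered [key, value] pairs, or None holes

    def __setitem__(self, key, value):
        pos = self._index.get(key)
        if pos is None:
            # New keys are always appended at the end of the dense array.
            self._index[key] = len(self._entries)
            self._entries.append([key, value])
        else:
            self._entries[pos][1] = value

    def __getitem__(self, key):
        return self._entries[self._index[key]][1]

    def __delitem__(self, key):
        # Leave a hole so the positions of later entries stay valid.
        self._entries[self._index.pop(key)] = None

    def __iter__(self):
        return (e[0] for e in self._entries if e is not None)

    def __reversed__(self):
        # Same scan as __iter__, just walking the dense array backward.
        return (e[0] for e in reversed(self._entries) if e is not None)
```

In this layout __reversed__ differs from __iter__ only in the direction of the
scan, which is the point being made above.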


Raymond


P.S. It isn't going to be hard to update MicroPython to have a compact and 
ordered dict (based on my review of their existing dict implementation).  This 
is something they are really going to want because of the improved memory 
efficiency.  Also, they're already going to need it just to comply with 
guaranteed keyword argument ordering and guaranteed ordering of class 
dictionaries.