Josh Rosenberg added the comment:

So, just to be clear, checking the implementation (as of 3.4):

1. All the ipaddress classes leave __slots__ unset. So the overhead is actually 
higher than 56 bytes per instance; the __dict__ for an IPv4Address (ignoring 
the actual keys and values in the dict, just looking at the size of the dict 
itself), on Python 3.4, 64 bit Windows, is 56 + 288 = 344 bytes. Based on the 
size of the dict, I'm guessing some instance attributes aren't being set 
eagerly in __init__, so it's not getting the reduced memory usage from shared 
key dictionaries either.
2. It looks like as of 3.4, _version and _max_prefixlen were both instance 
attributes; the current code base seems to have made _max_prefixlen a class 
attribute, but left _version as an instance attribute (anything set on self is 
an instance attribute; doesn't matter if it's set in __init__ of a base class)
3. Even if we switch to using __slots__, remove support for weak references and 
user added attributes, and store nothing but the raw address encoded as an int, 
you're never going to get true "flyweight" objects in CPython. The lowest 
instance cost you can get for a class defined in Python (on a system with 64 
bit pointers) inheriting from object, is 40 bytes, plus 8 bytes per slot (plus 
the actual cost of the int object, which is another 16 bytes for an IPv4 
address on a 64 bit OS). If you want it lighter-weight than that, either you 
need to inherit from another built-in that it even lighter-weight, or you need 
to implement it in C. If you built IPv4Address on top of int for instance, you 
could get the instance cost down to 16 bytes. Downside to inheriting from int 
is that you'll interact with many other objects as if you were an int, which 
can cause a lot of unexpected behaviors (as others have noted).

Basically, there is no concept of "flyweight" object in Python. Aside from 
implementing it in C or inheriting from int, you're going to be using an 
absolute minimum (on 64 bit build) of 48 bytes for the basic object structure, 
plus another 16 for the int itself, 64 bytes total, or about 16x the "real" 
data being stored. Using a WeakValueDictionary would actually require another 8 
bytes per instance (a slot for __weakref__), and the overhead of the dictionary 
would likely be higher than the memory saved (unless you're regularly creating 
duplicate IP addresses and storing them for the long haul, but I suspect this 
is a less common use case than processing many different IP addresses once or 
twice).

I do think it would be a good idea to use __slots__ properly to get the memory 
down below 100 bytes per instance, but adding automatic "IP address interning" 
is probably not worth the effort. In the longer term, a C implementation of 
ipaddress might be worth it, not for improvements in computational performance, 
but to get the memory usage down for applications that need to make millions of 
instances.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23103>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to