On 27/06/2018 12:42, Peter J. Holzer wrote:
On 2018-06-27 11:11:37 +1200, Gregory Ewing wrote:
Bart wrote:
x = set(range(10_000_000))
This used up 460MB of RAM (the original 100M I tried exhausted the memory).
The advantage of Pascal-style sets is that that same set will occupy
only 1.25MB, as it is a bit-map.
That's true, but they're also extremely limited compared to
the things you can do with Python sets. They're really two
quite different things, designed for different use cases.
Compact sets of integers are possible in Python, but not
as a built-in type. This suggests that they're not needed
much
Also, when such sets are needed, a simple array of bits may not be
sufficient. For example, sets may be sparse, or they may have almost all
except a few bits set. In these cases a tree or RLE representation is
much smaller and faster. There are a number of such implementations
(Judy Arrays and Roaring Bitmaps come to mind[1]). Each has its
advantages and limitations, so its probably better to leave the choice
to the author.
From roaringbitmap.org:
"Bitsets, also called bitmaps, are commonly used as fast data
structures. Unfortunately, they can use too much memory. To compensate,
we often use compressed bitmaps."
So from one extreme to another: from the 300-400 bits per element used
by Python's set type, to some presumably tiny fraction of a bit [on
average] used in this scheme, as 1 bit per element is too much.
Is there no place at all for an ordinary, straightforward, uncompressed
bit-set where each element takes exactly one bit? It does seem as though
Python users have an aversion to simplicity and efficiency.
FWIW when I was working with actual image bitmaps, they were nearly
always uncompressed when in memory. 1-bit-per-pixel images, for mono
images or masks, would pack 8 pixels per byte. Compression would only be
used for storage or transmission.
And the same with all the set types I've used.
However there is a slight difference with the concept of a 'set' as I
used it, and as it was used in Pascal, compared with Python's set: there
was the notion of the overall size of the set: the total number of
elements, whether each was '1' or '0'.
So a set type that represented all ASCII codes would have a size of 128
elements, indexed 0 to 127. So such a set initialised to ['A'..'Z'] and
then inverted would give the set [0..64,91..127].
--
bart
--
https://mail.python.org/mailman/listinfo/python-list