On 27/06/2018 12:42, Peter J. Holzer wrote:
On 2018-06-27 11:11:37 +1200, Gregory Ewing wrote:
Bart wrote:
    x = set(range(10_000_000))

This used up 460MB of RAM (the original 100M I tried exhausted the memory).

The advantage of Pascal-style sets is that that same set will occupy
only 1.25MB, as it is a bit-map.

That's true, but they're also extremely limited compared to
the things you can do with Python sets. They're really two
quite different things, designed for different use cases.

Compact sets of integers are possible in Python, but not
as a built-in type. This suggests that they're not needed
much

Also, when such sets are needed, a simple array of bits may not be
sufficient. For example, sets may be sparse, or they may have almost all
except a few bits set. In these cases a tree or RLE representation is
much smaller and faster. There are a number of such implementations
(Judy Arrays and Roaring Bitmaps come to mind[1]). Each has its
advantages and limitations, so its probably better to leave the choice
to the author.

From roaringbitmap.org:

"Bitsets, also called bitmaps, are commonly used as fast data structures. Unfortunately, they can use too much memory. To compensate, we often use compressed bitmaps."

So from one extreme to another: from the 300-400 bits per element used by Python's set type, to some presumably tiny fraction of a bit [on average] used in this scheme, as 1 bit per element is too much.

Is there no place at all for an ordinary, straightforward, uncompressed bit-set where each element takes exactly one bit? It does seem as though Python users have an aversion to simplicity and efficiency.

FWIW when I was working with actual image bitmaps, they were nearly always uncompressed when in memory. 1-bit-per-pixel images, for mono images or masks, would pack 8 pixels per byte. Compression would only be used for storage or transmission.

And the same with all the set types I've used.

However there is a slight difference with the concept of a 'set' as I used it, and as it was used in Pascal, compared with Python's set: there was the notion of the overall size of the set: the total number of elements, whether each was '1' or '0'.

So a set type that represented all ASCII codes would have a size of 128 elements, indexed 0 to 127. So such a set initialised to ['A'..'Z'] and then inverted would give the set [0..64,91..127].

--
bart
--
https://mail.python.org/mailman/listinfo/python-list

Reply via email to