I am working on a toolbox for computer-archaeology where old data media are
"excavated" and presented on a web-page.
(https://github.com/Datamuseum-DK/AutoArchaeologist for anybody who cares).
Since these data media can easily add up to tens of gigabytes, mmap and virtual
memory are my weapons of choice, and that has brought me into an obscure corner
of Python where few people seem to venture: I want to access the
buffer protocol from "userland".
The fundamental problem is that if I have an image of a disk and it has 2
partitions, I end up with the "mmap.mmap" object that mapped the raw disk
image, and two "bytes" or "bytearray" objects, each containing one partition,
for a total memory footprint of twice the size of the disk.
As the tool dives into the filesystems in the partitions and creates objects
for the individual files in the filesystem, that grows to three times the size
of the disk etc.
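For contiguous regions, the copying described above can be avoided with plain memoryview slices, which share the mmap's pages instead of duplicating them. A minimal sketch follows; the 1024-byte "disk image", the 512-byte partition boundary, and the function name are made up for illustration and are not taken from AutoArchaeologist:

```python
import mmap
import tempfile


def partition_views_demo():
    """Map a tiny fake disk image and slice it into two zero-copy views."""
    with tempfile.TemporaryFile() as f:
        # Hypothetical image: two 512-byte "partitions" filled with A and B.
        f.write(b"A" * 512 + b"B" * 512)
        f.flush()
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as image:
            # Slicing a memoryview creates new views, not new byte strings.
            with memoryview(image) as whole, \
                    whole[:512] as part1, whole[512:] as part2:
                return len(part1), len(part2), part1[0], part2[0]
            # All views are released before the mmap is closed; closing an
            # mmap that still has exported buffers raises BufferError.


sizes = partition_views_demo()
```

This only helps while each object is one contiguous range of the image; files scattered across non-adjacent blocks still need something like the scatter-gather class mentioned next.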
To avoid this, I am writing a "bytes-like" scatter-gather class (not yet
committed), and that is fine as far as it goes.
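To make the idea concrete, here is a rough sketch of what such a scatter-gather class could look like. It is not the uncommitted class from the project; the name ScatterGather, the chunks() method, and the int-only indexing are assumptions chosen to keep the example short:

```python
from bisect import bisect_right
from itertools import accumulate


class ScatterGather:
    """Hypothetical read-only sequence of bytes built from several buffers."""

    def __init__(self, chunks):
        # Keep views of the caller's buffers; nothing is copied here.
        self._chunks = [memoryview(c) for c in chunks if len(c)]
        # Running end offsets, e.g. chunk lengths 3 and 2 give [3, 5].
        self._ends = list(accumulate(len(c) for c in self._chunks))

    def __len__(self):
        return self._ends[-1] if self._ends else 0

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("this sketch only supports integer indices")
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        # Find the chunk whose end offset is the first one beyond index.
        n = bisect_right(self._ends, index)
        start = self._ends[n] - len(self._chunks[n])
        return self._chunks[n][index - start]

    def chunks(self):
        """Iterate over the underlying views without copying."""
        return iter(self._chunks)

    def __bytes__(self):
        # This is the expensive step: it materialises one contiguous copy.
        return b"".join(self._chunks)


sg = ScatterGather([b"abc", b"", bytearray(b"de")])
```

Indexing and len() work without touching the data, but anything that wants a real buffer still has to go through bytes(sg), which is where the copy comes back.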
If I want to write one of my scatter-gather objects to disk, I have to:
    fd.write(bytes(myobj))
As a preliminary point, I think that is just wrong: A class with a __bytes__
method should satisfy any needs the buffer-protocol might have, so this should
work:
    fd.write(myobj)
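For reference, this is how current Python behaves: write() asks for the buffer protocol and does not fall back to __bytes__. A small sketch, using io.BytesIO as a stand-in for the file object; the class and function names are made up for the example:

```python
import io


class OnlyBytes:
    """Hypothetical class that offers __bytes__ but not the buffer protocol."""

    def __bytes__(self):
        return b"payload"


def try_write(obj):
    """Return what was written, or the string 'TypeError' if write() refuses."""
    buf = io.BytesIO()
    try:
        buf.write(obj)
    except TypeError:
        return "TypeError"
    return buf.getvalue()


direct = try_write(OnlyBytes())            # write() rejects the object
converted = try_write(bytes(OnlyBytes()))  # explicit conversion is accepted
```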
But taking this a little bit further, I think __bytes__ should be allowed to be
a generator that yields chunks of bytes, provided the object also offers
__len__ giving the total length, so that this would work:
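    class bar():
        def __len__(self):
            # total number of bytes the generator will yield
            return 3
        def __bytes__(self):
            # hand out the payload piece by piece instead of as one big copy
            yield b'0'
            yield b'1'
            yield b'2'

    # proposed behaviour; today this raises TypeError
    open("/tmp/_", "wb").write(bar())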
This example is of course trivial, but have the yield statements hand out
hundreds of megabytes, and the savings in time and malloc space become very
tangible.
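Until something like that exists, the closest equivalent in current Python seems to be a separately named generator method combined with writelines(), which accepts any iterable of bytes-like objects. A sketch under that assumption follows; the chunks() method name and both class names are invented for the example, and io.BytesIO stands in for the real file:

```python
import io


class Chunked:
    """Hypothetical object that hands out its payload in pieces."""

    def __len__(self):
        return 3

    def chunks(self):
        yield b"0"
        yield b"1"
        yield b"2"


class GeneratorBytes:
    """Same idea, but using __bytes__ as a generator (not supported today)."""

    def __bytes__(self):
        yield b"0"


def write_chunked(fd, obj):
    # Each chunk is written as it is produced; no joined copy is built.
    fd.writelines(obj.chunks())
    return len(obj)


def bytes_rejects_generator():
    # bytes() insists that __bytes__ returns an actual bytes object.
    try:
        bytes(GeneratorBytes())
    except TypeError:
        return True
    return False


out = io.BytesIO()
written = write_chunked(out, Chunked())
```

The drawback is that every consumer has to know about chunks(), which is the kind of special-casing the proposal is meant to remove.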
Poul-Henning