Re: weirdness with list()

2021-02-28 Thread Peter Otten

On 28/02/2021 01:17, Cameron Simpson wrote:

I just ran into a surprising (to me) issue with list() on an iterable
object.

My object represents an MDAT box in an MP4 file: it is the ludicrously
large data box containing the raw audiovideo data; for a TV episode it
is often about 2GB and a movie is often 4GB to 6GB. For obvious reasons,
I do not always want to load that into memory, or even read the data
part at all when scanning an MP4 file, for example to recite its
metadata.

So my parser has a "skip" mode where it seeks straight past the data,
but makes a note of its length in bytes. All good.

That length is presented via the object's __len__ method, because I want
to know that length later and this is a subclass of a suite of things
which return their length in bytes this way.

So, to my problem:

I've got a walk method which traverses the hierarchy of boxes in the MP4
file. Until some minutes ago, it looked like this:

    def walk(self):
        subboxes = list(self)
        yield self, subboxes
        for subbox in subboxes:
            if isinstance(subbox, Box):
                yield from subbox.walk()

somewhat like os.walk does for a file tree.

I noticed that it was stalling, and investigation revealed it was
stalling at this line:

 subboxes = list(self)

when doing the MDAT box. That box (a) has no subboxes at all and (b) has
a very large __len__ value.

BUT... It also has a __iter__ value, which like any Box iterates over
the subboxes. For MDAT that is implemented like this:

    def __iter__(self):
        yield from ()

What I was expecting was pretty much instant construction of an empty
list. What I was getting was a very time consuming (10 seconds or more)
construction of an empty list.

I believe that this is because list() tries to preallocate storage. I
_infer_ from the docs that this is done maybe using
operator.length_hint, which in turn consults "the actual length of the
object" (meaning __len__ for me?), then __length_hint__, then defaults
to 0.

I've changed my walk function like so:

    def walk(self):
        subboxes = []
        for subbox in self:
            subboxes.append(subbox)
        ##subboxes = list(self)


list(iter(self))

should work, too. It may be faster than the explicit loop, but also
defeats the list allocation optimization.


and commented out the former list(self) incantation. This is very fast,
because it makes an empty list and then appends nothing to it. And for
your typical movie file this is fine, because there are never _very_
many immediate subboxes anyway.

But is there a cleaner way to do this?

I'd like to go back to my former list(self) incantation, and modify the
MDAT box class to arrange something efficient. Setting __length_hint__
didn't help: returning NotImplemented or 0 had no effect, because
presumably __len__ was consulted first.

Any suggestions? My current approach feels rather hacky.

I'm already leaning towards making __len__ return the number of subboxes
to match the iterator, especially as on reflection not all my subclasses
are consistent about __len__ meaning the length of their binary form;
I'm probably going to have to fix that - some subclasses are actually
namedtuples where __len__ would be the field count. Ugh.

Still, thoughts? I'm interested in any approaches that would have let me
make list() fast while keeping __len__==binary_length.

I'm accepting that __len__ != len(__iter__) is a bad idea now, though.


Indeed. I see how that train wreck happened -- but the weirdness is not
the list behavior.

Maybe you can capture the intended behavior of your class with two
classes, a MyIterable without length that can be converted into MyList
as needed.


--
https://mail.python.org/mailman/listinfo/python-list


Re: Tkinter new window content when button is clicked.

2021-02-28 Thread Bischoop
On 2021-02-25, MRAB  wrote:
>> 
> The trick is to put the "pages" on top of each other and then show the 
> appropriate one, something like this:
> import tkinter as tk
>
> def on_next_page():
>  # Brings page 2 to the top.
>  frame_2.tkraise()
>
> def on_previous_page():
>  # Brings page 1 to the top.
>  frame_1.tkraise()
>
> def on_finish():
>  # Closes the dialog.
>  root.destroy()
>
> root = tk.Tk()
>
> # Page 1.
> frame_1 = tk.Frame(root)
> tk.Label(frame_1, text='Page 1').pack()
> tk.Button(frame_1, text='Next', command=on_next_page).pack()
>
> # Page 2.
> frame_2 = tk.Frame(root)
> tk.Label(frame_2, text='Page 2').pack()
> tk.Button(frame_2, text='Previous', command=on_previous_page).pack()
> tk.Button(frame_2, text='Finish', command=on_finish).pack()
>
> # Put the pages on top of each other.
> frame_1.grid(row=0, column=0, sticky='news')
> frame_2.grid(row=0, column=0, sticky='news')
>
> # Bring page 1 to the top.
> frame_1.tkraise()
>
> tk.mainloop()


Great, thanks for the reply, I'll look into that.


--
Thanks



[Announce] - preview of the new python extension "pymsgque"

2021-02-28 Thread aotto1968


Hello everybody,

  PYTHON has a new application server
  → https://nhi1.selfhost.co/nhi1/

  SOMETHING that writes code for you
  → https://nhi1.selfhost.co/wiki/NHI1_-_the_POWER_of_programming

have fun


Re: weirdness with list()

2021-02-28 Thread Marco Sulla
On Sun, 28 Feb 2021 at 01:19, Cameron Simpson  wrote:
> My object represents an MDAT box in an MP4 file: it is the ludicrously
> large data box containing the raw audiovideo data; for a TV episode it
> is often about 2GB and a movie is often 4GB to 6GB.
> [...]
> That length is presented via the object's __len__ method
> [...]
>
> I noticed that it was stalling, and investigation revealed it was
> stalling at this line:
>
> subboxes = list(self)
>
> when doing the MDAT box. That box (a) has no subboxes at all and (b) has
> a very large __len__ value.
>
> BUT... It also has a __iter__ value, which like any Box iterates over
> the subboxes. For MDAT that is implemented like this:
>
> def __iter__(self):
> yield from ()
>
> What I was expecting was pretty much instant construction of an empty
> list. What I was getting was a very time consuming (10 seconds or more)
> construction of an empty list.

I can't reproduce this. Am I missing something?

marco@buzz:~$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class A:
...     def __len__(self):
...         return 1024**3
...     def __iter__(self):
...         yield from ()
...
>>> a = A()
>>> len(a)
1073741824
>>> list(a)
[]
>>>

It takes milliseconds to run list(a)


Re: weirdness with list()

2021-02-28 Thread Peter Otten

On 28/02/2021 23:33, Marco Sulla wrote:

On Sun, 28 Feb 2021 at 01:19, Cameron Simpson  wrote:

My object represents an MDAT box in an MP4 file: it is the ludicrously
large data box containing the raw audiovideo data; for a TV episode it
is often about 2GB and a movie is often 4GB to 6GB.
[...]
That length is presented via the object's __len__ method
[...]

I noticed that it was stalling, and investigation revealed it was
stalling at this line:

 subboxes = list(self)

when doing the MDAT box. That box (a) has no subboxes at all and (b) has
a very large __len__ value.

BUT... It also has a __iter__ value, which like any Box iterates over
the subboxes. For MDAT that is implemented like this:

    def __iter__(self):
        yield from ()

What I was expecting was pretty much instant construction of an empty
list. What I was getting was a very time consuming (10 seconds or more)
construction of an empty list.


I can't reproduce this. Am I missing something?

marco@buzz:~$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> class A:
...     def __len__(self):
...         return 1024**3
...     def __iter__(self):
...         yield from ()
...
>>> a = A()
>>> len(a)
1073741824
>>> list(a)
[]
>>>

It takes milliseconds to run list(a)


Looks like you need at least Python 3.8 to see this. Quoting
https://docs.python.org/3/whatsnew/3.8.html:

"""
The list constructor does not overallocate the internal item buffer if 
the input iterable has a known length (the input implements __len__). 
This makes the created list 12% smaller on average. (Contributed by 
Raymond Hettinger and Pablo Galindo in bpo-33234.)

"""





Re: weirdness with list()

2021-02-28 Thread Cameron Simpson
On 28Feb2021 10:51, Peter Otten <__pete...@web.de> wrote:
>On 28/02/2021 01:17, Cameron Simpson wrote:
>>I noticed that it was stalling, and investigation revealed it was
>>stalling at this line:
>>
>> subboxes = list(self)
>>
>>when doing the MDAT box. That box (a) has no subboxes at all and (b) has
>>a very large __len__ value.
[...]
>
>list(iter(self))
>
>should work, too. It may be faster than the explicit loop, but also
>defeats the list allocation optimization.

Yes, very neat. I went with [subbox for subbox in self] last night, but 
the above is better.
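
A minimal sketch of the list(iter(self)) trick, for anyone following along. The class name MdatLike and its reported size are made up for illustration (the size is kept small so nothing here is expensive). The point it tries to show is that the generator returned by iter() carries no __len__, so list() is given nothing to size a preallocation from, which matches the "defeats the list allocation optimization" remark quoted above.

```python
class MdatLike:
    """Hypothetical stand-in for the MDAT box: a __len__ in bytes, no subboxes."""

    def __len__(self):
        # Pretend byte length; deliberately modest for this sketch.
        return 1_000_000

    def __iter__(self):
        # An MDAT box has no subboxes, so the iteration is empty.
        yield from ()


box = MdatLike()
wrapped = iter(box)  # a plain generator object

# The box itself advertises a length; the generator wrapping it does not.
box_has_len = hasattr(box, "__len__")
wrapped_has_len = hasattr(wrapped, "__len__")

# list() only ever sees the generator here, never the box's __len__.
subboxes = list(wrapped)

print(box_has_len, wrapped_has_len, subboxes)
```

The list comprehension [subbox for subbox in self] gets the same result by a different route, since it appends items one at a time.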

[...]
>>Still, thoughts? I'm interested in any approaches that would have let 
>>me
>>make list() fast while keeping __len__==binary_length.
>>
>>I'm accepting that __len__ != len(__iter__) is a bad idea now, though.
>
>Indeed. I see how that train wreck happened -- but the weirdness is not
>the list behavior.

I agree. The only weirdness is that list(empty-iterable) took a very 
long time. Weirdness in the eye of the beholder I guess.

>Maybe you can capture the intended behavior of your class with two
>classes, a MyIterable without length that can be converted into MyList
>as needed.

Hmm. Maybe.

What I've done so far is:

The aforementioned [subbox for subbox in self] which I'll replace with 
your nicer one today.

Given my BinaryMixin a transcribed_length method which measures the 
length of the binary transcription. For small things that's actually 
fairly cheap, and totally general. By default it is aliased to __len__, 
which still seems a natural thing - the length of the binary object is 
the number of bytes required to serialise it.

The alias lets me override transcribed_length() for bulky things like 
MDAT where (a) transcription _is_ expensive and (b) the source data may 
not be present anyway ("skip" mode), but the measurement of the data 
from the parse is recorded.

And I can disassociate __len__ from transcribed_length() if need be in 
subclasses. I've not done that, given the iter() shuffle above.
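
A rough sketch of the arrangement described above, under stated assumptions: the transcribe() method, the SmallBox and SkippedMdat classes and the recorded_length attribute are all hypothetical names, not the real code. It assumes __len__ delegates to transcribed_length() at call time; a class-level alias written as `__len__ = transcribed_length` would keep pointing at the base implementation and ignore subclass overrides.

```python
class BinaryMixin:
    """Hypothetical mixin for objects that can be serialised to bytes."""

    def transcribe(self):
        """Yield the bytes chunks of the binary form (supplied by subclasses)."""
        raise NotImplementedError

    def transcribed_length(self):
        # General but potentially slow: serialise and count the bytes.
        return sum(len(chunk) for chunk in self.transcribe())

    def __len__(self):
        # Delegate at call time so that a subclass override of
        # transcribed_length() is honoured by len() as well.
        return self.transcribed_length()


class SmallBox(BinaryMixin):
    """A small object whose length is cheap to measure by transcribing it."""

    def __init__(self, payload):
        self.payload = payload

    def transcribe(self):
        yield len(self.payload).to_bytes(4, "big")  # 4-byte length prefix
        yield self.payload


class SkippedMdat(BinaryMixin):
    """Data was skipped during the parse; only its length was recorded."""

    def __init__(self, recorded_length):
        self.recorded_length = recorded_length

    def transcribed_length(self):
        # No transcription at all: report the length noted by the parser.
        return self.recorded_length


small_len = len(SmallBox(b"abc"))
mdat_len = len(SkippedMdat(1024**3))
print(small_len, mdat_len)
```

A subclass that wants __len__ to mean something else can then override __len__ alone and leave transcribed_length() untouched.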

Cheers,
Cameron Simpson 


Re: weirdness with list()

2021-02-28 Thread Cameron Simpson
On 01Mar2021 00:28, Peter Otten <__pete...@web.de> wrote:
>On 28/02/2021 23:33, Marco Sulla wrote:
>>I can't reproduce this. Am I missing something?
>>
>>marco@buzz:~$ python3
>>Python 3.6.9 (default, Jan 26 2021, 15:33:00)
>>[GCC 8.4.0] on linux
>>Type "help", "copyright", "credits" or "license" for more information.
>>>>> class A:
>>...     def __len__(self):
>>...         return 1024**3
>>...     def __iter__(self):
>>...         yield from ()
>>...
>>>>> a = A()
>>>>> len(a)
>>1073741824
>>>>> list(a)
>>[]
>
>>
>>It takes milliseconds to run list(a)
>
>Looks like you need at least Python 3.8 to see this. Quoting
>https://docs.python.org/3/whatsnew/3.8.html:
>"""
>The list constructor does not overallocate the internal item buffer if
>the input iterable has a known length (the input implements __len__).
>This makes the created list 12% smaller on average. (Contributed by
>Raymond Hettinger and Pablo Galindo in bpo-33234.)
>"""

That may also explain why I hadn't noticed this before, eg last year.

I do kind of wish __length_hint__ overrode __len__ rather than the other 
way around, if it's doing what I think it's doing.
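
For reference, a small sketch of the lookup order as documented for operator.length_hint: the real length first, then __length_hint__, then the default. The three class names are invented for the example, and it only exercises operator.length_hint itself, which is not necessarily the exact path list() takes internally.

```python
import operator


class LenAndHint:
    """Defines both; the documented order says __len__ is tried first."""

    def __len__(self):
        return 1000

    def __length_hint__(self):
        return 0

    def __iter__(self):
        yield from ()


class HintOnly:
    """No __len__, so the hint is what gets reported."""

    def __length_hint__(self):
        return 5

    def __iter__(self):
        yield from ()


class HintDeclined:
    """Returning NotImplemented makes length_hint use its default."""

    def __length_hint__(self):
        return NotImplemented

    def __iter__(self):
        yield from ()


both = operator.length_hint(LenAndHint())
hint_only = operator.length_hint(HintOnly())
declined = operator.length_hint(HintDeclined(), 8)
print(both, hint_only, declined)
```

So a __length_hint__ of 0 or NotImplemented cannot mask an existing __len__, which fits the earlier observation that setting it had no effect.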

Cheers,
Cameron Simpson 


Re: weirdness with list()

2021-02-28 Thread MRAB

On 2021-02-28 23:28, Peter Otten wrote:

On 28/02/2021 23:33, Marco Sulla wrote:

On Sun, 28 Feb 2021 at 01:19, Cameron Simpson  wrote:

My object represents an MDAT box in an MP4 file: it is the ludicrously
large data box containing the raw audiovideo data; for a TV episode it
is often about 2GB and a movie is often 4GB to 6GB.
[...]
That length is presented via the object's __len__ method
[...]

I noticed that it was stalling, and investigation revealed it was
stalling at this line:

 subboxes = list(self)

when doing the MDAT box. That box (a) has no subboxes at all and (b) has
a very large __len__ value.

BUT... It also has a __iter__ value, which like any Box iterates over
the subboxes. For MDAT that is implemented like this:

    def __iter__(self):
        yield from ()

What I was expecting was pretty much instant construction of an empty
list. What I was getting was a very time consuming (10 seconds or more)
construction of an empty list.


I can't reproduce this. Am I missing something?

marco@buzz:~$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> class A:
...     def __len__(self):
...         return 1024**3
...     def __iter__(self):
...         yield from ()
...
>>> a = A()
>>> len(a)
1073741824
>>> list(a)
[]
>>>

It takes milliseconds to run list(a)


Looks like you need at least Python 3.8 to see this. Quoting
https://docs.python.org/3/whatsnew/3.8.html:

"""
The list constructor does not overallocate the internal item buffer if
the input iterable has a known length (the input implements __len__).
This makes the created list 12% smaller on average. (Contributed by
Raymond Hettinger and Pablo Galindo in bpo-33234.)
"""


I'm not seeing a huge problem here:

Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> class A:
...     def __len__(self):
...         return 1024**3
...     def __iter__(self):
...         yield from ()
...
>>> a = A()
>>> len(a)
1073741824
>>> s = time.time()
>>> list(a)
[]
>>> print(time.time() - s)
0.16294455528259277
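
The same measurement as a standalone script, in case anyone wants to compare interpreters. This is only a sketch: the timed() helper is invented here, time.perf_counter() is swapped in for time.time(), and __len__ returns 10**6 rather than 1024**3 so that any preallocation stays at a few megabytes. To probe the original report, the reported length would need to be raised towards the 4GB to 6GB range, with the memory cost that implies.

```python
import sys
import time


class A:
    def __len__(self):
        # Much smaller than the 1024**3 used in the session above,
        # to keep this sketch cheap to run anywhere.
        return 10**6

    def __iter__(self):
        yield from ()


def timed(func):
    """Call func once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


a = A()
direct, t_direct = timed(lambda: list(a))          # list() can see __len__
wrapped, t_wrapped = timed(lambda: list(iter(a)))  # list() sees a generator

print(sys.version_info[:3])
print(f"list(a):       {t_direct:.6f}s -> {direct}")
print(f"list(iter(a)): {t_wrapped:.6f}s -> {wrapped}")
```

No expected timings are given, since they depend on the machine, the Python version and how the platform handles large allocations.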


Packaging/MANIFEST.in: Include All, Exclude .gitignore

2021-02-28 Thread Abdur-Rahmaan Janhangeer
Greetings list,

Since I have a .gitignore, how do I exclude
all files and folders listed by my gitignore?
How do I include everything by default?

Kind Regards,

Abdur-Rahmaan Janhangeer
Mauritius


Re: weirdness with list()

2021-02-28 Thread Greg Ewing

On 28/02/21 1:17 pm, Cameron Simpson wrote:

[its length in bytes] is presented via the object's __len__ method,



BUT... It also has a __iter__ value, which like any Box iterates over
the subboxes.


You're misusing __len__ here. If an object is iterable and
also has a __len__, its __len__ should return the number of
items you would get if you iterated over it. Anything else
is confusing and can lead to trouble, as you found here.


But is there a cleaner way to do this?


Yes. Give up on using __len__ to get the length in bytes,
and provide another way to do that.
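
One way that could look, as a sketch only: the Box class below is a toy, and byte_length and subboxes are assumed attribute names rather than anything from the real parser. __len__ agrees with __iter__, the size in bytes lives in its own attribute, and the original walk() with list(self) is used unchanged.

```python
class Box:
    """Toy box: __len__ counts subboxes, the byte size is kept separately."""

    def __init__(self, byte_length, subboxes=()):
        self.byte_length = byte_length  # size of the binary form, in bytes
        self.subboxes = list(subboxes)

    def __iter__(self):
        return iter(self.subboxes)

    def __len__(self):
        # Matches what iteration yields, so list(self) sizes itself correctly.
        return len(self.subboxes)

    def walk(self):
        subboxes = list(self)
        yield self, subboxes
        for subbox in subboxes:
            if isinstance(subbox, Box):
                yield from subbox.walk()


mdat = Box(byte_length=4 * 1024**3)  # huge payload, no subboxes
moov = Box(byte_length=2048, subboxes=[Box(byte_length=1024)])
top = Box(byte_length=mdat.byte_length + moov.byte_length, subboxes=[moov, mdat])

visited = list(top.walk())
sizes = [box.byte_length for box, _ in visited]
print(len(mdat), list(mdat), sizes)
```

One side effect to be aware of: a box with no subboxes is now falsy in a boolean test unless the class also defines __bool__.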

--
Greg
