On 11 June 2017 at 13:35, David Mertz <[email protected]> wrote:
> You are right. I made a thinko.
>
> List construction from an iterator is O(N) just as is `sum(1 for _ in it)`.
> Both of them need to march through every element. But as a constant
> multiplier, just constructing the list should be faster than needing an
> addition (Python append is O(1) because of smart dynamic memory
> pre-allocation).
>
> So the "just read the iterator" is about 2-3 times faster than
> read-then-accumulate). Of course, it the run-lengths are LARGE, we can
> start worrying about the extra memory allocation needed as a tradeoff. Your
> sum uses constant memory.
This would be another argument in favour of providing an
itertools.counted_len function, as that would be able to avoid all the
overheads currently associated with the "sum(1 for __ in iterable)"
counting strategy.
Without that, the best you can do in pure Python is to use
__length_hint__ to choose your preferred counting strategy at runtime
based on the specific input.
Something like:
from operator import length_hint
# 10k 64-bit pointers ~= 640k max temporary list
_USE_COUNTED_SUM = 10_001
def counted_len(iterable):
# For sized containers, just return their length
try:
return len(iterable)
except TypeError:
pass
# For probably-large inputs & those with no length hint, count them
hint = length_hint(iterable, default=_USE_COUNTED_SUM)
if hint >= _USE_COUNTED_SUM:
return sum(1 for __ in iter(iterable))
# Otherwise, make a temporary list and report its length
# as the specifics of the CPython implementation make this
# faster than using the generator expression
return len(list(iterable))
Cheers,
Nick.
P.S. itertools._grouper objects don't currently provide a length hint,
and it would be tricky to add one, since it would need to be based on
the number of remaining items in the original sequence, which would it
turn depend on *that* defining either __len__ or __length_hint__.
--
Nick Coghlan | [email protected] | Brisbane, Australia
_______________________________________________
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/