Re the benchmarks:
Yes, they're microbenchmarks; the intent was to show that the
performance impact can be negligible.
No, that doesn't invalidate them (it just scopes their usefulness; my sales
pitch at the end was slightly over-egging things, but reasonable, imo).
Yes, I ignored the quadratic behaviour of repeated indexing, as I would never
propose that as a goal.
Adding iter()-based comparisons would be interesting; however, it
doesn't invalidate the list() option, as this is very often used as a
solution to the problem.
It's true, benchmarks that don't match your incentives and opinions always
lie.
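For what it's worth, here is a rough sketch of how the iter()/reversed()
comparisons could be timed alongside the list() ones. The dict size, the
helper names and the repeat count are arbitrary choices for illustration,
not anything from the original benchmarks, and the absolute numbers will
vary by machine:

```python
import timeit

# Hypothetical test data; the size is an arbitrary choice.
d = {i: str(i) for i in range(10_000)}

def first_via_list(mapping):
    # Copies every item into a list just to read one element.
    return list(mapping.items())[0]

def first_via_iter(mapping):
    # Stops after the first item; no copy is made.
    return next(iter(mapping.items()))

def last_via_list(mapping):
    # Copies every key into a list just to read the last one.
    return list(mapping.keys())[-1]

def last_via_reversed(mapping):
    # reversed() on dicts and dict views needs Python 3.8 or later.
    return next(reversed(mapping.keys()))

# Time each variant; only the relative differences are meaningful.
timings = {}
for fn in (first_via_list, first_via_iter, last_via_list, last_via_reversed):
    timings[fn.__name__] = timeit.timeit(lambda: fn(d), number=1_000)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s for 1000 calls")
```

This only covers the "take one element" cases; it says nothing about
indexing repeatedly in a loop, which would need a list built once up front.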
As for use-cases, I'll admit that I see this as a fairly minor
quality-of-life issue. Finding use-cases is a bit tricky, as the fact that
dictionaries have a defined order is a recent feature, and I know I am (and
I'm sure many other people are) still adapting to take advantage of this
new functionality. There's also the fact that in python < 3, the results of
dict.keys(), values() and items() were Sequences (lists), so the impact of this
change may still be being felt (yes, even over a decade later, the majority of
the python I've written to deal with 'messy' data involving lots of
dictionaries has been in python 2).
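To illustrate the python 3 side of that change, a minimal sketch (the
variable names are just for illustration): the views keep insertion order,
but they are not Sequences, so indexing them fails today.

```python
from collections.abc import Sequence, Set

d = {'a': 1, 'b': 2}
keys = d.keys()

# The keys view is set-like, not a Sequence.
is_sequence = isinstance(keys, Sequence)
is_set_like = isinstance(keys, Set)

# Iteration order is the insertion order of the dict.
ordered = list(keys)

# Indexing a view raises TypeError in current python 3.
try:
    keys[0]
    index_error = None
except TypeError as exc:
    index_error = exc

print(is_sequence, is_set_like, ordered, type(index_error).__name__)
```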
However I've put together a set of cases that I personally would like to
work as they appear to (Several of these are paraphrases of production code
I've worked with):
--
>>> import random
>>> random.choice({'a': 1, 'b': 2}.keys())
'a'
--
>>> import numpy as np
>>> mapping_table = np.array(BIG_LOOKUP_DICT.items())
[[1, 99],
[2, 23],
...
]
--
>>> import sqlite3
>>> conn = sqlite3.connect(":memory:")
>>> params = {'a': 1, 'b': 2}
>>> placeholders = ', '.join(f':{p}' for p in params)
>>> statement = f"select {placeholders}"
>>> print(f"Running: {statement}")
Running: select :a, :b
>>> cur=conn.execute(statement, params.values())
>>> cur.fetchall()
[(1, 2)]
--
# This currently works, but is deprecated in 3.9
>>> import random
>>> dict(random.sample({'a': 1, 'b': 2}.items(), 2))
{'b': 2, 'a': 1}
--
>>> def min_max_keys(d):
...     min_key, min_val = d.items()[0]
...     max_key, max_val = min_key, min_val
...     for key, value in d.items():
...         if value < min_val:
...             min_key = key
...             min_val = value
...         if value > max_val:
...             max_key = key
...             max_val = value
...     return min_key, max_key
...
>>> min_max_keys({'a': 1, 'b': 2, 'c': -9999})
>>> min_max_keys({'a': 'x', 'b': 'y', 'c': 'z'})
--
>>> import os
>>> users = {'cups': 209, 'service': 991}
>>> os.setgroups(users.values())
---
Obviously, python is a general-purpose, Turing-complete language, so each
of these examples can be written in other ways. But it would be nice if the
simple, readable versions also worked :D
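For comparison, a sketch of how some of the cases above can be written so
that they run today. This is just one possible set of spellings, not the
only or the recommended one. The numpy and os.setgroups cases are left out
to keep it stdlib-only and runnable without privileges.

```python
import random
import sqlite3

d = {'a': 1, 'b': 2}

# random.choice needs a sequence, so copy the keys into a list first.
picked = random.choice(list(d))

# random.sample: pass a list instead of the set-like items view.
shuffled = dict(random.sample(list(d.items()), 2))

# sqlite3 accepts the dict itself for named placeholders.
conn = sqlite3.connect(":memory:")
placeholders = ', '.join(f':{p}' for p in d)
rows = conn.execute(f"select {placeholders}", d).fetchall()
conn.close()

def min_max_keys(mapping):
    """Return the keys holding the smallest and largest values."""
    # next(iter(...)) stands in for the wished-for mapping.items()[0].
    min_key, min_val = next(iter(mapping.items()))
    max_key, max_val = min_key, min_val
    for key, value in mapping.items():
        if value < min_val:
            min_key, min_val = key, value
        if value > max_val:
            max_key, max_val = key, value
    return min_key, max_key

print(picked, shuffled, rows, min_max_keys({'a': 1, 'b': 2, 'c': -9999}))
```

Each version works, at the cost of an extra list() or next(iter(...)) that
the reader has to parse.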
The idea that there are future, unspecified changes to dicts that may or
may not be hampered by allowing indexing sounds like FUD to me, unless
there are concrete references?
Steve
On Thu, Jul 9, 2020 at 3:04 PM Inada Naoki <[email protected]> wrote:
> On Thu, Jul 9, 2020 at 12:45 PM Christopher Barker <[email protected]>
> wrote:
> >
> > On Wed, Jul 8, 2020 at 7:13 PM Inada Naoki <[email protected]>
> wrote:
> >>
> >> I think this comparison is unfair.
> >
> > well, benchmarks always lie ....
> >
> >> > d.items()[0] vs list(d.items())[0]
> >>
> >> Should be compared with `next(iter(d.items()))`
> >
> > why? the entire point of this idea is to have indexing syntax -- we can
> > already use the iteration protocol to do this. Not that it's a bad idea to
> > time that too, but since under the hood it's doing the same or less work,
> > I'm not sure what the point is.
> >
>
> Because this code solves "take the first item in the dict".
>
> If you need to benchmark index access, you should compare your
> dict.items()[0] and list index.
> You shouldn't create list from d.items() every loop.
>
> >> > d.keys()[-1] vs list(d.keys())[-1]
> >>
> >> Should be compared with `next(reversed(d.keys()))`, or
> `next(reversed(d))`.
> >
> >
> > Same point - the idea is to have indexing syntax. Though yes, it would
> be good to see how it compares. But I know predicting performance is
> usually wrong, but this is going to require a full traversal of the
> underlying keys in either case.
> >
>
> Same here. And note that dict and dict views now support reversed().
>
> >>
> >> > random.choice(d.items()) vs random.choice(list(d.items()))
> >>
> >> Should be compared with `random.choice(items_list)` with `items_list =
> >> list(d.items())` setup too.
> >
> > I don't follow this one -- could you explain? what is items_list ?
>
> I explained `items_list = list(d.items())`. Do it in setup (e.g. before
> loop.)
> ("setup" is the term used by the timeit module.)
>
> >
> > But what this didn't check is how bad the performance could be for what
> > I expect would be a bad performance case -- indexing the keys repeatedly:
> >
> > for i in lots_of_indexes:
> >     a_dict.keys()[i]
> >
> > vs:
> >
> > keys_list = list(a_dict.keys())
> > for i in lots_of_indexes:
> >     keys_list[i]
> >
>
> You should do this.
>
> > I suspect it wouldn't take all that many indexes for making a list a
> better option.
> >
>
> If you need to index access many times, creating list is the recommended
> way.
> You shouldn't ignore it. That's why I said it's an unfair comparison.
> You should compare "current recommended way" vs "proposed way".
>
> > But again, we are back to use cases. As Stephen pointed out no one has
> > produced an actual production code use case.
>
> I agree.
>
> Regards,
>
> --
> Inada Naoki <[email protected]>
>
_______________________________________________
Python-ideas mailing list -- [email protected]
Message archived at
https://mail.python.org/archives/list/[email protected]/message/NKLBMZIWXQEOSFRJ3W3VHF3RZ6XLCKGB/