On Thu, Aug 27, 2020 at 03:28:07AM +1200, Greg Ewing wrote:
> On 27/08/20 12:53 am, Steven D'Aprano wrote:
>
> >Presumably the below method is provided by `object`. If not, what
> >provides it?
> >
> >> def __getindex__(self, *args, **kwds):
> >> if kwds:
> >> raise TypeError("Object does not support keyword indexes")
> >> if not args:
> >> raise TypeError("Object does not accept empty indexes")
>
> It's not literally a method, I just wrote it like that to
> illustrate the semantics. It would be done by the interpreter
> as part of the process of translating indexing operations into
> dunder calls.
Okay, so that's similar to my suggestion that this would be better
implemented in the byte-code rather than as a method of object.
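To make sure we're talking about the same thing, here's a rough Python
sketch of the dispatch as I understand it (my illustration, not part of
your proposal; the real logic would live inside the interpreter, in C):

    # Hypothetical sketch only -- not real CPython code.
    def subscript(obj, *args, **kwds):
        cls = type(obj)
        getindex = getattr(cls, "__getindex__", None)
        if getindex is not None:
            # New protocol: subscripts behave like call arguments.
            return getindex(obj, *args, **kwds)
        # Fall back to the classic item protocol.
        if kwds:
            raise TypeError("object does not support keyword indexes")
        if not args:
            raise TypeError("object does not accept empty indexes")
        # Classic __getitem__ takes exactly one index; multiple
        # subscripts are packed back into a tuple.
        index = args[0] if len(args) == 1 else args
        return cls.__getitem__(obj, index)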
> >What is your reasoning behind prohibiting keywords, when we're right in
> >the middle of a discussion over PEP 637 which aims to allow keywords?
>
> We're falling back to __getitem__ here, which doesn't currently allow
> keywords, and would stay that way. The point of this proposal is to
> not change __getitem__. If you want to get keywords, you provide
> __getindex__.
Point of order: the `__getitem__` dunder already allows keywords, and
always has, and always will. It's just a method.
It's the *subscript (pseudo-)operator* which doesn't support keywords.
This is a syntax limitation, not a limitation of the dunder method. If
the interpreter supports the syntax, it's neither here nor there to the
interpreter whether it calls `__getitem__` or `__getindex__` or
`__my_hovercraft_is_full_of_eels__` for that matter.
So if you want to accept keywords, you just add keywords to your
existing dunder method. If you don't want them, don't add them. We don't
need a new dunder just for the sake of keywords.
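To illustrate with a toy example of my own, nothing stops you doing
this today:

    class Grid:
        def __getitem__(self, index, *, reverse=False):
            # Keyword parameters are already perfectly legal here,
            # because this is just an ordinary method.
            return (index, reverse)

    g = Grid()
    g[5]                            # -> (5, False)
    g.__getitem__(5, reverse=True)  # -> (5, True): a plain method call
    # g[5, reverse=True]            # SyntaxError: the syntax is the limit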
> >This is going to slow down the most common cases of subscripting: the
> >interpreter has to follow the entire MRO to find `__getindex__` in
> >object, which then dispatches to the `__getitem__` method.
>
> No, it would be done by checking type slots, no MRO search involved.
Okay, I didn't think of type slots.
But type slots are expensive in other ways. Every new type slot
increases the size of type objects, and I've seen proposals for new
dunders knocked back for that reason, so presumably the people who care
about the C level care about the increase in memory and complexity from
adding new type slots.
Looking here:
https://docs.python.org/3/c-api/typeobj.html
I see that `__setitem__` and `__delitem__` are handled by the same type
slot (`mp_ass_subscript`), so presumably the triplet of get-, set- and
del-index slots would pair up the same way. Still, that means adding
two new type slots to both the sequence and mapping method tables.
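As I understand it, the set/del pairing works because at the C level a
deletion is dispatched as an assignment with no value. In Python terms:

    class M:
        def __setitem__(self, key, value):
            print("set", key, value)
        def __delitem__(self, key):
            print("del", key)

    m = M()
    m[1] = "a"   # prints: set 1 a
    del m[1]     # prints: del 1
    # Both statements go through the single mp_ass_subscript slot;
    # CPython distinguishes them by whether a value is supplied.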
(I assume that it's not *all* types that carry these slots. If it is
all types, that makes the cost of this proposal correspondingly
higher.)
So if I understand it correctly, we have some choices when it comes to
sequence/mapping types:
1. the existing `__*item__` methods keep their slots, and new `__*index__`
slots are created, which makes both the item and index dunders fast
but increases the size of every type which provides any of those
methods;
2. the existing item slots stay as they are, there are no new index
slots, which keeps objects the same size but the new index protocol
will be slow;
3. the existing item slots are repurposed for index, which keeps objects
the same size, and the new protocol fast, but makes calling item
dunders slow;
4. and just for completion because of course this is not going to
happen, we could remove the existing item slots and not add index
slots so that both protocols are equally slow;
5. alternatively, we could leave the existing C-level sequence and
mapping objects alone, and create *four* brand new C-level objects:
- a sequence object that supports only the new index protocol;
- a sequence object that supports both index and item protocols;
- and likewise two new mapping objects.
Do I understand this correctly? Have I missed any options?
Assuming I do, 4 is never going to happen, and each of the others has
some fairly large disadvantages and costs in speed, memory, and
complexity. Without a correspondingly large advantage to this new
`__*index__` protocol, I don't see this going anywhere.
> >In your earlier statement, you said that it would be possible for
> >subscripting to mean something different depending on whether the
> >comma-separated subscripts had parentheses around them or not:
> >
> > obj[(2, 3)]
> > obj[2, 3]
> >
> >How does that happen?
>
> If the object has a __getindex__ method, it gets whatever is between
> the [] the same way as a normal function call, so comma-separated
> expressions become separate positional arguments.
The compiler doesn't know whether the object has the `__getindex__`
method at compile time, so any process that relies on that knowledge
isn't going to work. There can only be one set of parsing rules that
applies regardless of whether the object defines the item dunders or the
index dunders or neither.
Right now, if you call `obj[1,]` the dunder receives the tuple `(1,)`
as the index. If it were treated as function call syntax, the dunder
would receive a single argument 1 instead. If it keeps being treated as
a tuple, as required by backwards compatibility, that's an
inconsistency between subscripts and function calls, and the whole
point of your proposal is to remove that inconsistency.
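To spell out the status quo with another toy class of my own:

    class Spy:
        def __getitem__(self, index):
            return index

    s = Spy()
    s[1]     # 1
    s[1,]    # (1,) -- the subscript is a one-element tuple
    s[1, 2]  # (1, 2)

    def f(x):
        return x

    f(1,)    # 1 -- a trailing comma in a call changes nothing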
Rock (you are here) Hard Place.
Do you break existing code, or fail in your effort to remove the
inconsistencies?
I don't care two hoots about the inconsistencies, I just want to use
keywords in my subscripts, so for me the answer is obvious: keep
backwards compatibility, and there is no need to add new dunders to only
partially fix something which isn't a problem.
Another inconsistency: function call syntax looks like this:
call ::= primary "(" [argument_list [","] | comprehension] ")"
which means we can write generator comprehensions inside function
calls without additional parentheses:
func(expr for x in items) # unambiguously a generator comprehension
This is nice because the round brackets of the function call match the
round brackets used in generator comprehensions, so it is perfectly
consistent and unambiguous.
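A quick sanity check in current Python:

    total = sum(x*x for x in range(5))   # 30: the bare comprehension is
                                         # passed as a generator argument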
But if you do that in a subscript, we currently get a syntax error. If
we allowed it, it would be pretty weird for the square brackets of the
subscript to create a *generator* comprehension rather than a list
comprehension. But we surely don't want a list comprehension by default:
obj[(expr for x in items)] # unambiguously a generator comprehension
obj[[expr for x in items]] # unambiguously a list comprehension
obj[expr for x in items] # and this is... what?
It looks like it should be a list comprehension (it has square brackets,
right?) but we probably don't want it to be a list comp, we'd prefer it
to be a generator comp because they are more flexible. Only that would
look weird and would lead to all sorts of questions about why list
comprehension syntax sometimes gives a list and sometimes a generator.
But if we leave it out, we have an inconsistency between subscripting
and function calls, and for those who are motivated by removing that
inconsistency, that's a Bad Thing.
For me, again, the answer is obvious: we don't have to support this for
the sake of consistency, because consistency isn't the motivation. I
just want keywords.
--
Steve