On Thu, Jul 25, 2013 at 7:44 PM, Steven D'Aprano
<steve+comp.lang.pyt...@pearwood.info> wrote:
> On Thu, 25 Jul 2013 18:15:22 +1000, Chris Angelico wrote:
>> That's true, but we already have that issue with sets. What's the union
>> of {0} and {0.0}? Python's answer: It depends on the order of the
>> operands.
>
> That's a side-effect of how numeric equality works in Python. Since 0 ==
> 0.0, you can't have both as keys in the same dict, or set. Indeed, the
> same numeric equality issue occurs here:
>
> py> from fractions import Fraction
> py> [0, 2.5] == [0.0, Fraction(5, 2)]
> True
>
> So nothing really to do with sets or dicts specifically.

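For concreteness, here is a small sketch of the effects described in the quoted text: the order dependence of the set union, and the numeric equality behind it. It reflects behaviour as observed in CPython, and the variable names are only illustrative.

```python
from fractions import Fraction

# 0 == 0.0, so a set can hold only one of them.
left = {0} | {0.0}    # the surviving element comes from the left operand
right = {0.0} | {0}   # same union with the operands swapped

left_type = type(next(iter(left)))
right_type = type(next(iter(right)))
print(left_type.__name__, right_type.__name__)

# The two results still compare equal to each other.
print(left == right)

# A dict behaves similarly: the existing key object stays,
# and only the value is replaced.
d = {0: "stored under int"}
d[0.0] = "stored under float"
print(d)

# The same equality applies outside sets and dicts.
lists_equal = [0, 2.5] == [0.0, Fraction(5, 2)]
print(lists_equal)
```

The element that survives is whichever one was already in the result, which is why swapping the operands changes the type but not the equality of the two sets.
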
Here's how I imagine set/dict union:
1) Take a copy of the first object
2) Iterate through the second. If the key doesn't exist in the result, add it.

This works just fine even when "add it" means "store this value
against this key". The dict's value and the object's identity are both
ignored, and you simply take the first one you find.

> Aside: I think the contrary behaviour is, well, contrary. It would be
> strange and disturbing to do this:
>
> for key in some_dict:
>     if key == 0:
>         print("found")
>         print(some_dict[key])
>
> and have the loop print "found" and then have the key lookup fail, but
> apparently that's how things work in Pike :-(

I agree, that would be very strange and disturbing. I mentioned that
aspect merely in passing, but the reason for the difference is not an
oddity of key lookup, but a different decision about float and int: in
Pike, 0 and 0.0 are not equal. (Nor are 1 and 1.0, in case you thought
this was a weirdness of zero.) It's a debatable point; are we trying
to say that all numeric types represent real numbers, and are equal if
they represent the same real number? Or are different representations
distinct, just as much as the string "0" is different from the integer
0? Pike took the latter approach. PHP took the former approach to its
illogical extreme, that the string "0001E1" is equal to "000010" (both
strings).

No, the dictionary definitely needs to use object equality to do its
lookup, although I could well imagine an implementation that runs
orders of magnitude faster when object identity can be used.

>> I would say that Python can freely pick from the first two options you
>> offered (either keep-first or keep-last), most likely the first one, and
>> it'd make good sense. Your third option would be good for a few specific
>> circumstances, but then you probably would also want the combination of
>> {1:'a'} and {1:'a'} to be {1:['a','a']} for consistency.
>
> Okay, that's six variations.
> And no, I don't think the "consistency"
> argument is right -- the idea is that you can have multiple values per
> key. Since 'a' == 'a', that's only one value, not two.

Well, it depends what you're doing with the merging of the dicts. But
all of these extra ways to do things would be explicitly-named
functions with much rarer usage (and quite possibly not part of the
standard library, they'd be snippets shared around and put directly in
application code).

>> Raising an error would work, but is IMO unnecessary.
>
> I believe that's the only reasonable way for a dict union method to work.
> As the Zen says:
>
>     In the face of ambiguity, refuse the temptation to guess.
>
> Since there is ambiguity which value should be associated with the key,
> don't guess.

There's already ambiguity as to which of two equal values should be
retained by the set. Python takes the first. Is that guessing? Is that
violating the zen? I don't see a problem with the current set
implementation, and I also don't see a problem with using that for
dict merging.

> Object identity is a red herring. It would be perfectly valid for a
> Python implementation to create new instances of each element in the set
> union, assuming such creation was free of side-effects (apart from memory
> usage and time, naturally). set.union() makes no promise about the
> identity of elements, and it is defined the same way for languages where
> object identity does not exist (say, old-school Pascal).

That still doesn't deal with the "which type should the new object
be". We're back to this question: What is the union of 0 and 0.0?

>>> {0} | {0.0}
{0}
>>> {0.0} | {0}
{0.0}

Maybe Python could create a brand new object, but would it be an int
or a float? The only way I could imagine this working is with a
modified-set class that takes an object constructor, and passes every
object through it.
That way, you could have set(float) that coerces
everything to float on entry, which would enforce what you're saying
(even down to potentially creating a new object with a new id, though
float() seems to return a float argument unchanged in CPython 3.3).
Would that really help anything, though? Do we gain anything by not
simply accepting, in the manner of Colonel Fairfax, the first that
comes?

ChrisA
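
A rough sketch of the two ideas in this message, in case it helps to see them side by side: the two-step keep-first union, and a set that coerces everything on entry. Both names (union_keep_first, CoercingSet) are hypothetical and exist nowhere in the standard library; this is an illustration under those assumptions, not a proposed implementation.

```python
def union_keep_first(first, second):
    """Keep-first dict union: copy `first`, then add only the missing keys."""
    result = dict(first)                # 1) take a copy of the first object
    for key, value in second.items():   # 2) iterate through the second
        if key not in result:           # lookup is by equality, so 0.0 finds 0
            result[key] = value
    return result


class CoercingSet:
    """Hypothetical set-like wrapper that passes every element
    through a constructor before storing it."""

    def __init__(self, constructor, iterable=()):
        self._constructor = constructor
        self._items = set()
        for item in iterable:
            self.add(item)

    def add(self, item):
        self._items.add(self._constructor(item))

    def __or__(self, other):
        # Union with any iterable; the result coerces with the same constructor.
        merged = CoercingSet(self._constructor, self._items)
        for item in other:
            merged.add(item)
        return merged

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __contains__(self, item):
        return self._constructor(item) in self._items


merged = union_keep_first({0: "a"}, {0.0: "b", 1: "c"})
print(merged)

a = CoercingSet(float, [0]) | [0.0]
b = CoercingSet(float, [0.0]) | [0]
print(list(a), list(b))
```

With the coercing version, both operand orders end up holding a float, at the cost of deciding the type up front. The keep-first union simply retains whatever key and value were there first.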