Re: [Python-Dev] For Python 3k, drop default/implicit hash, and comparison
On 11/27/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Noam Raphael wrote:
> > I would greatly appreciate repliers that find a tiny bit of reason in
> > what I said (even if they don't agree), and not deny it all as a
> > complete load of rubbish.
>
> I don't understand what your message is. With this posting, did you
> suggest that somebody does something specific? If so, who is that one,
> and what should he do?
Perhaps I felt a bit attacked. It was probably my fault, and anyway, a
general message like this is not the proper way - I'm sorry.
>
> Anyway, a lot of your posting is what I thought was common knowledge;
> and with some of it, I disagree.
This is fine, of course.
> > We may want to compare wheels based on value, for example to make sure
> > that all the car's wheels fit together nicely: assert car.wheel1 ==
> > car.wheel2 == car.wheel3 == car.wheel4.
>
> I would never write it that way. This would suggest that the wheels
> have to be "the same". However, this is certainly not true for wheels:
> they have to be of the same make. Now, you write that wheels
> only carry manufacturer and diameter. However, I would expect that
> wheels grow additional attributes over time, like whether they are
> left or right, and what their wear level is. So to write your property,
> I would write
>
> car.wheel1.manufacturer_and_make() ==
> car.wheel2.manufacturer_and_make() ==
> car.wheel3.manufacturer_and_make() ==
> car.wheel4.manufacturer_and_make()
>
You may be right in the case of wheels. From time to time, in the real
(programming) world, I encounter objects that I wish to compare by
value - this is certainly the case for built-in objects, but is
sometimes the case for more complex objects.
> > We may want to associate values with wheels based on their values. For
> > example, it's reasonable to suppose that the price of every wheel of
> > the same model is the same. In that case, we'll write: price[wheel] =
> > 25.
>
> Again, I would not write it this way. I would find
>
> wheel.price()
Many times the objects are not yours to add attributes, or may have
__slots__ defined. The truth is that I prefer not to add attributes to
external objects even when it's possible.
>
> most natural. If I have the notion of a price list, then I would
> try to understand what the price list is keyed-by, e.g. model number:
>
> price[wheel.model] = 25
>
Sometimes there's no "key" - it's just the state of the object (what
if wheels don't have a model number?)
> > Now again, how will we say that a specific wheel is broken? Like this:
> >
> > broken[Ref(wheel)] = True
>
> If I want things to be keyed by identity, I would write
>
> broken = IdentityDictionary()
> ...
> broken[wheel] = True
>
> although I would prefer to write
>
> wheel.broken = True
>
I personally prefer the first method, but the second one is ok too.
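An identity-keyed dictionary of the kind mentioned above could be sketched roughly as follows. The IdentityDictionary class here is hypothetical (it is not a standard library type), and it assumes that keying on id() while holding a reference to each key is an acceptable way to get identity-based lookup.

```python
from collections.abc import MutableMapping


class IdentityDictionary(MutableMapping):
    """Hypothetical mapping keyed by object identity, ignoring __eq__/__hash__."""

    def __init__(self):
        # Maps id(key) -> (key, value). Storing the key keeps it alive,
        # so its id() cannot be reused by another object in the meantime.
        self._items = {}

    def __getitem__(self, key):
        try:
            return self._items[id(key)][1]
        except KeyError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        self._items[id(key)] = (key, value)

    def __delitem__(self, key):
        try:
            del self._items[id(key)]
        except KeyError:
            raise KeyError(key) from None

    def __iter__(self):
        return (key for key, _ in self._items.values())

    def __len__(self):
        return len(self._items)


# Lists stand in for wheels here: they compare by value and are unhashable,
# so a plain dict could not use them as keys at all.
wheel_a = ["Acme", 16]
wheel_b = ["Acme", 16]

broken = IdentityDictionary()
broken[wheel_a] = True
```

With this sketch, wheel_a is found in broken while the equal-valued wheel_b is not, which is the identity-based behaviour being discussed.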
> > I think that most objects, especially most user-defined objects, have
> > a *value*. I don't have an exact definition, but a hint is that two
> > objects that were created in the same way have the same value.
>
> Here I disagree. Consider the wheel example. I would expect that
> a wheel has a "wear level" or some such, and that this changes over
> time, and that it belongs to the "value" of the wheel ("value"
> being synonym to "state"). As this changes over time, it is certainly
> not that the object is created with that value.
>
> Think of lists: what is their value? Are they created with it?
>
My tongue failed me. I meant: created in the same way = have gone
through the same series of actions. That is:
a = []; a.append(5); a.extend([2,1]); a.pop()
b = []; b.append(5); b.extend([2,1]); b.pop()
a == b
> > Sometimes we wish to use the
> > identity of objects as a dictionary key or as a set member - and I
> > claim that we should do that by using the Ref class, whose *value* is
> > the object's *identity*, or by using a dict/set subclass, and not by
> > misusing the __hash__ and __eq__ methods.
>
> I think we should use a specific type of dictionary then.
That's OK too. My point was that the one who uses the objects should
explicitly specify whether he means value-based or identity-based
lookup. This means that if an object has a "value", it should not make
__eq__ and __hash__ be identity-based just to make identity-based
lookup easier and implicit.
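The distinction argued for here can be sketched as below. Both classes are illustrative assumptions: Wheel is a made-up value object with only the two attributes mentioned in the thread, and Ref is one possible reading of the Ref class proposed earlier, whose value is the identity of the object it wraps.

```python
class Wheel:
    """Illustrative value object: equal when manufacturer and diameter match."""

    def __init__(self, manufacturer, diameter):
        self.manufacturer = manufacturer
        self.diameter = diameter

    def _value(self):
        return (self.manufacturer, self.diameter)

    def __eq__(self, other):
        if not isinstance(other, Wheel):
            return NotImplemented
        return self._value() == other._value()

    def __hash__(self):
        # Only safe while the attributes are not mutated during use as a key.
        return hash(self._value())


class Ref:
    """Hypothetical wrapper whose *value* is the wrapped object's identity."""

    def __init__(self, obj):
        self.obj = obj  # hold a reference so id(obj) stays valid

    def __eq__(self, other):
        if not isinstance(other, Ref):
            return NotImplemented
        return self.obj is other.obj

    def __hash__(self):
        return hash(id(self.obj))


w1 = Wheel("Acme", 16)
w2 = Wheel("Acme", 16)

price = {w1: 25}          # value-based lookup: any equal wheel finds the price
broken = {Ref(w1): True}  # identity-based lookup, requested explicitly by the user
```

Here price[w2] finds the entry stored under w1 because the wheels are equal by value, while Ref(w2) is not a key of broken because w2 is a different object.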
>
> > I think that whenever value-based comparison is meaningful, the __eq__
> > and __hash__ should be value-based. Treating objects by identity
> > should be done explicitly, by the one who uses the objects, by using
> > the "is" operator or the Ref class. It should not be the job of the
> > object to decide which method (value or identity) is more useful - it
> > should allow the user to use both methods, by defining __eq__ and
> > __hash__ based on value.
>
> If objects are compared for value equality, the object should decide
> which part of its state goes into that comparison. It may be that
> two objects compare equal even though their state i
Re: [Python-Dev] For Python 3k, drop default/implicit hash, and comparison
On 11/27/05, Samuele Pedroni <[EMAIL PROTECTED]> wrote:
> well, this still belongs to comp.lang.python.
...
> not if you think python-dev is a forum for such discussions
> on OO thinking vs other paradigms.
Perhaps my style made it look like a discussion on OO thinking vs
other paradigms, but my conclusion is exactly about the issue of this
thread - Jim suggested dropping the default __hash__ and __eq__ for
Python 3K. Guido decided not to, because it's useful to use them for
identity-based comparison and lookup. I say that I disagree, because I
think that __hash__ and __eq__ should be used for value-based
comparison and lookup, and because if the user of the object does
explicit identity-based comparison/lookup, it doesn't matter to him
whether __hash__ and __eq__ are defined or not. I also suggested, in a
way, that it's OK to define a default value-based __eq__ method.
Noam
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] For Python 3k, drop default/implicit hash, and comparison
Hi Noam,
On Sun, Nov 27, 2005 at 09:04:25PM +0200, Noam Raphael wrote:
> No, I meant real programming examples. My theory is that most
> user-defined classes have a "value", and those that don't are related
> to I/O, in some sort of a broad definition of the term. I may be
> wrong, so I ask for counter-examples.
In the source code base of PyPy, trying to count only what we really
wrote and not external tools, I found 19 classes defining __eq__ out of
a total of 1413. There must be close to zero classes that have anything
to do with I/O in there. If anything, this proves that the default
comparison for classes is absolutely fine and nothing needs to be
fixed in the Python language. Please move this discussion outside
python-dev.
Armin
Re: [Python-Dev] urlparse brokenness
On 11/22/05, Paul Jimenez <[EMAIL PROTECTED]> wrote:
>
> It is my assertion that urlparse is currently broken. Specifically, I
> think that urlparse breaks an abstraction boundary with ill effect.
IIRC I did it this way because the RFC about parsing urls specifically
prescribed it had to be done this way. Maybe there's a newer RFC with
different rules?
> In writing a mailclient, I wished to allow my users to specify their
> imap server as a url, such as 'imap://user:[EMAIL PROTECTED]:port/'. Which
> worked fine. I then thought that the natural extension to support
> configuration of imapssl would be 'imaps://user:[EMAIL PROTECTED]:port/'
> which failed - user:[EMAIL PROTECTED]:port got parsed as the *path* of
> the URL instead of the network location. It turns out that urlparse
> keeps a table of url schemes that 'use netloc'... that is to say,
> that have a 'user:[EMAIL PROTECTED]:port' part to their URL. I think this
> 'special knowledge' about particular schemes 1) breaks an abstraction
> boundary by having a function whose charter is to pull apart a
> particularly-formatted string behave differently based on the meaning of
> the string instead of the structure of it
I disagree. You have to know what the scheme means before you can
parse the rest -- there is (by design!) no standard parsing for
anything that follows the scheme and the colon. I don't even think
that you can trust that if the colon is followed by two slashes that
what follows is a netloc for all schemes.
But if there's an RFC that says otherwise I'll gladly concede;
urlparse's main goal in life is to be RFC compliant. Is your opinion
based on an RFC?
> and 2) fails to be extensible
> or forward compatible due to hardcoded 'magic' strings - if schemes were
> somehow 'registerable' as 'netloc using' or not, then this objection
> might be nullified, but the previous objection would still stand.
I think it is reasonable to propose an extension whereby one can
register a parser (or parsing flags like uses_netloc) for a specific
scheme, presuming there won't be conflicting registrations (which
should only happen if two independently developed libraries have a
different use for the same scheme -- a failure of standardization).
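A registration mechanism along these lines might look like the sketch below. Everything in it is hypothetical: register_scheme, split_netloc and the _scheme_flags table are names invented for illustration and are not part of urlparse; the only assumption taken from the discussion is a per-scheme uses_netloc flag and the rule that conflicting registrations are an error.

```python
# Hypothetical registry: scheme name -> whether the scheme uses a netloc.
_scheme_flags = {}


def register_scheme(scheme, *, uses_netloc):
    """Record a parsing flag for a scheme; refuse conflicting registrations."""
    scheme = scheme.lower()
    existing = _scheme_flags.get(scheme)
    if existing is not None and existing != uses_netloc:
        raise ValueError(f"conflicting registration for scheme {scheme!r}")
    _scheme_flags[scheme] = uses_netloc


def split_netloc(url):
    """Return (scheme, netloc, rest).

    The netloc is extracted only when the scheme has been registered as
    using one; otherwise everything after the colon is returned as rest.
    """
    scheme, sep, rest = url.partition(":")
    if not sep:
        return "", "", url
    scheme = scheme.lower()
    netloc = ""
    if _scheme_flags.get(scheme, False) and rest.startswith("//"):
        # The netloc runs from after '//' to the first '/', '?' or '#'.
        end = len(rest)
        for delim in "/?#":
            pos = rest.find(delim, 2)
            if pos != -1:
                end = min(end, pos)
        netloc, rest = rest[2:end], rest[end:]
    return scheme, netloc, rest


register_scheme("imap", uses_netloc=True)
```

Under this sketch an application that needs imaps would call register_scheme("imaps", uses_netloc=True) once at start-up; until then the scheme is treated as opaque, which mirrors the behaviour reported at the top of the thread.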
> So I propose that urlsplit, the main offender, be replaced with something
> that looks like:
>
> def urlsplit(url, scheme='', allow_fragments=1, default=('','','','','')):
Since you don't present your new code in diff format, could you
explain in English how what it does differs from the original? Or
perhaps you could present some unit tests (doctest would be ideal)
showing the desired behavior of the proposed code (I understand from
later posts that it may have some bugs). (For example, why add the
default parameter?)
> Note that I'm not sold on the _parse_cache, but I'm assuming it was there
> for a reason so I'm leaving that functionality as-is.
There's also a special case for http; given that the code is rather
general and hence slow, it makes sense that it attempts some
optimizations, and removing these might cause a nasty surprise for
some users.
> If this isn't the right forum for this discussion, or the right place to
> submit code, please let me know.
Please do submit patches to SF if you want them to be discussed.
> Also, please cc: me directly on responses
> as I'm not subscribed to the firehose that is python-dev.
ACK.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] urlparse brokenness
Guido van Rossum wrote:
> IIRC I did it this way because the RFC about parsing urls specifically
> prescribed it had to be done this way.
That was true as of RFC 1808 (1995-1998), although the grammar actually
allowed for a more generic interpretation. Such an interpretation was
suggested in RFC 2396 (1998-2004) via a regular expression for parsing
URI 'references' (a formal abstraction introduced in 2396) into 5
components (not six, since 'params' were moved into 'path' and
eventually became an option on every path segment, not just the end of
the path). The 5 components are: scheme, authority (formerly netloc),
path, query, fragment. Parsing could result in some components being
undefined, which is distinct from being empty (e.g.,
'mailto:[EMAIL PROTECTED]' would have an undefined authority and
fragment, and a defined, but empty, query).
RFC 3986 / STD 66 (2005-) did not change the regular expression, but
makes several references to these '5 major components' of a URI, and
says that these components are scheme-independent; parsers that operate
at the generic syntax level "can parse any URI reference into its major
components. Once the scheme is determined, further scheme-specific
parsing can be performed on the components."
> You have to know what the scheme means before you can
> parse the rest -- there is (by design!) no standard parsing for
> anything that follows the scheme and the colon.
Not since 1998, IMHO. It was implicit, at least since RFC 2396, that all
URI references can be interpreted as having the 5 components; it was
made explicit in RFC 3986 / STD 66.
> I don't even think
> that you can trust that if the colon is followed by two slashes that
> what follows is a netloc for all schemes.
You can.
> But if there's an RFC that says otherwise I'll gladly concede;
> urlparse's main goal in life is to be RFC compliant.
Its intent seems to be to split a URI into its major components, which
are now by definition scheme-independent (and have been, implicitly, for
a long time), so the function shouldn't distinguish between schemes.
Do you want to keep returning that 6-tuple, or can we make it return a
5-tuple? If we keep returning 'params' for backward compatibility, then
that means the 'path' we are returning is not the 'path' that people
would expect (they'll have to concatenate path+params to get what the
generic syntax calls a 'path' nowadays). It's also deceptive because
params are now allowed on all path segments, and the current function
only takes them from the last segment.
Also for backward compatibility, should an absent component continue to
manifest in the result as an empty string? I think a compliant parser
should make a distinction between absent and empty (it could make a
difference, in theory).
If a regular expression were used for parsing, it would produce None for
absent components and empty-string for empty ones. I implemented it this
way in 4Suite's Ft.Lib.Uri and it works nicely.
Mike
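The regular-expression approach described in this message can be sketched as follows, using the expression given in RFC 3986, Appendix B. The names UriParts and split_uri_ref are made up for this illustration, and the code is not the 4Suite implementation; it only shows how absent components come out as None while empty ones come out as ''.

```python
import re
from typing import NamedTuple, Optional

# Regular expression from RFC 3986, Appendix B. Groups 2, 4, 5, 7 and 9
# hold scheme, authority, path, query and fragment respectively.
_URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


class UriParts(NamedTuple):
    """Hypothetical 5-tuple result; None means the component is absent."""

    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split_uri_ref(uri_ref):
    """Split a URI reference into its five generic components."""
    # Every part of the pattern is optional, so it matches any string.
    m = _URI_RE.match(uri_ref)
    return UriParts(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


# Example addresses below are placeholders chosen for illustration.
mail = split_uri_ref("mailto:someone@example.org?")
imaps = split_uri_ref("imaps://user:secret@mail.example.org:993/")
```

In this sketch mail.query is the empty string (a '?' was present with nothing after it) while mail.authority and mail.fragment are None, and imaps.authority is found without any scheme-specific table.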
