On Fri, 02 Dec 2011 17:02:01 +1100, Chris Angelico wrote:

> On Fri, Dec 2, 2011 at 4:34 PM, Steven D'Aprano
> <steve+comp.lang.pyt...@pearwood.info> wrote:
>> On Fri, 02 Dec 2011 13:07:57 +1100, Chris Angelico wrote:
>>> I would consider integer representations of ASCII to be code smell.
>>> It's not instantly obvious that 45 means '-', even if you happen to
>>> know the ASCII table by heart (which most people won't).
>
> Note, I'm not saying that C's way is perfect; merely that using the
> integer 45 to represent a hyphen is worse.

Dude, it was deliberately obfuscated code. I even named the function
"obfuscated_prefixes". I thought that would have been a hint <wink>

It's kinda scary that of all the sins against readability committed in my
function, including isinstance(type(c), type(type)) which I was
particularly proud of, the only criticism you came up with was that
chr(45) is hard to read. I'm impressed <grins like a mad thing>

[...]
>> Note that this still doesn't work the way we might like in EBCDIC, but
>> the very fact that you are forced to think about explicit conversion
>> steps means you are less likely to make unwarranted assumptions about
>> what characters convert to.
>
> I don't know about that. Anyone brought up on ASCII and moving to EBCDIC
> will likely have trouble with this, no matter how many function calls it
> takes.

Of course you will, because EBCDIC is a pile of festering garbage :)

But IMAO you're less likely to have trouble with Unicode if you haven't
been trained to treat characters as synonymous with integers. And
besides, given how rare such byte-manipulations on ASCII characters are
in Python, it would be a shame to lose the ability to use '' and "" for
strings just to avoid calling the ord and chr functions.

>> Better than both, I would say, would be for string objects to have
>> successor and predecessor methods, that skip ahead (or back) the
>> specified number of code points (defaulting to 1):
>>
>> 'A'.succ() => 'B'
>> 'A'.succ(5) => 'F'
>>
>> with appropriate exceptions if you try to go below 0 or above the
>> largest code point.
>
> ... and this still has that same issue. Arithmetic on codepoints depends
> on that.

We shouldn't be doing arithmetic on code points. Or at least we shouldn't
unless we are writing a Unicode library that *needs* to care about the
implementation. We should only care about the interface, that the
character after 'A' is 'B'.

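To make the proposed interface concrete, here is a minimal sketch of
succ and pred as plain functions rather than str methods. The function
names, the choice of exceptions and the use of sys.maxunicode as the
upper bound are all assumptions for illustration; nothing like this
exists on str today. The ord/chr arithmetic is confined to the helper,
which is the one place that is allowed to care about the implementation.

```python
import sys


def succ(ch, n=1):
    """Return the character n code points after ch (hypothetical helper)."""
    if len(ch) != 1:
        raise TypeError("succ() expects a single character")
    code = ord(ch) + n
    # Assumed behaviour: refuse to step below 0 or past the largest code point.
    if not 0 <= code <= sys.maxunicode:
        raise ValueError("code point out of range")
    return chr(code)


def pred(ch, n=1):
    """Return the character n code points before ch (hypothetical helper)."""
    return succ(ch, -n)
```

With these, succ('A') gives 'B' and succ('A', 5) gives 'F', matching the
examples quoted above, and callers never touch an integer.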
Implementation-wise, we shouldn't care whether A and B are represented in
memory by 0x0041 and 0x0042, or by 0x14AF and 0x9B30. All we really need
to know is that B comes immediately after A. Everything else is
implementation.

But I fear that the idea of working with chr and ord is far too ingrained
now to get rid of it.

-- 
Steven