On Saturday, July 15, 2017 at 8:54:40 PM UTC-5, MRAB wrote:

> You need to be careful about the terminology.
You are correct. I admit I was a little loose with my terms there.

> Is linefeed a character?

Since LineFeed is the same as NewLine, then yes, IMO, linefeed is a
character.

> You might call [linefeed] a "control character", but it's
> not really a _character_, it's control/format _code_.

True. Allow me to try and define some concrete terms that we can use.

In the old days, long before I was born, and even long before I
downloaded my first compiler (ah, the memories!), the concept of
strings was so much simpler. Yep, back in those days all you had was,
basically, two discrete subcomponents of a string: the "actual chars"
and the "virtual chars".

(Disambiguation)

The "actual chars"[1] are any chars that a programmer can insert by
pressing a single key on the keyboard, such as: "1", "2", "3", "a",
"b", "c", "!", "@", "#" -- etc.

The "virtual chars" -- or the "control codes", as you put it (the
ones that start with a "\") -- are the chars that represent
"structural elements" of the string (f.i. \n, \t, etc.).

But in reality, the implementation of strings has complicated the
idea of "virtual chars as solely structural elements" of the display,
by including such absurdities as:

    (1) Sounds ("\a")
    (2) Virtual interactions such as: BackSpace ("\b"),
        CarriageReturn ("\r") and FormFeed ("\f")

intermixed with control codes that constitute _actual_ structural
elements, such as:

    (1) LineFeed or NewLine ("\n")
    (2) HorizontalTab ("\t")
    (3) VerticalTab ("\v")

And a few other non-structural codes that allow embedding delimiters
or hex or octal escapes. And furthermore, there are two distinct
"realms", if I may, in which a string can exist: the "virtual
character realm" and the "display realm".
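For what it's worth, the "actual" versus "virtual" distinction can be
poked at directly in any Python 3 interpreter: every escape sequence
listed above, sound effects and all, is stored as exactly one
character, while a raw literal keeps the backslash as a literal char.
A minimal sketch:

```python
# Each escape sequence, structural or not, is a single character.
for escaped, name in [("\a", "Bell"), ("\b", "BackSpace"),
                      ("\r", "CarriageReturn"), ("\f", "FormFeed"),
                      ("\n", "LineFeed"), ("\t", "HorizontalTab"),
                      ("\v", "VerticalTab")]:
    assert len(escaped) == 1
    print(f"{name}: U+{ord(escaped):04X}")

# A raw literal, by contrast, keeps the backslash itself:
assert len("\n") == 1   # one char: the LineFeed control code
assert len(r"\n") == 2  # two chars: a backslash and the letter "n"
```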
(Disambiguation)

The "virtual character realm" is sort of like an operating room where
a doctor (aka: programmer) performs operations on the patient (aka:
string), or, if you like, a castle where a mad scientist builds a
Unicode monster from a hodgepodge of body parts he stole from local
graveyards, and is later lynched by a mob of angry peasants for his
perceived sins against nature. But I digress...

Whereas the "display realm" is sort of like an awards ceremony for
celebrities, except here, strings take the place of strung-out celebs
and characters are dressed in the over-hyped rags (aka: font) of an
overpaid fashion designer.

But the two "realms" and two "character types" are only a small
sample of the syntactical complexity of Python strings. For we
haven't even discussed the many types of string literals that Python
defines. Some include:

    (1) "Normal strings"
    (2) r"Raw strings"
    (3) b"Byte strings"
    (4) u"Unicode strings"
    (5) rb"Raw byte strings"
    (6) ur"Raw Unicode strings" (Python 2 only; rejected in Python 3)
    (7) f"Format literals"
    ...

Whew! IMO, the reason the implementation of strings has been such a
tough nut to crack (Python3000 notwithstanding) is due very much to
what I call a "syntactical circus".

> Is an acute accent a character? No, it's a diacritic mark
> that's added to a character.

And I agree. Chris was arguing that zero-width spaces should not be
counted as characters when the `len()` function is applied to the
string, with which I disagree on the basis of consistency. My first
reaction is: "Why would you inject a char into a string -- even a
zero-width char! -- and then expect that the char should not affect
the length of the string as returned by `len()`?" Given that strings
(at the highest level) are merely linear arrays of chars, such an
assumption defies all logic. Furthermore, the length of a string (in
chars) and the "perceived" length of a string (when rendered on a
screen, or printed on paper) are in no way relevant to one another.
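Both the zero-width-space point and the acute-accent point can be
demonstrated in a few lines of Python 3, using only the stdlib
`unicodedata` module:

```python
import unicodedata

# A ZERO WIDTH SPACE (U+200B) occupies no space when rendered, yet
# it is still one codepoint: it counts toward len() and is indexable.
s = "spam\u200beggs"
assert len(s) == 9             # not 8: the invisible char counts
assert s.index("\u200b") == 4  # addressable like any other char

# Likewise, a combining acute accent is its own codepoint, not part
# of the base character it decorates:
composed = "\u00e9"     # "é" as one precomposed codepoint
decomposed = "e\u0301"  # "e" + COMBINING ACUTE ACCENT
assert len(composed) == 1
assert len(decomposed) == 2

# Both render identically and compare equal after NFC normalization:
assert unicodedata.normalize("NFC", decomposed) == composed
print("all assertions passed")
```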
When we, as programmers, are manipulating strings (slicing, munging,
concatenating, etc.), our only concern should be that _every_ char is
accessible, indexable, quantifiable, and will maintain its order. And
whether or not a char will be visible when rendered on a screen or on
paper is irrelevant to these "programmer-centric" operations.
Rendering is the domain of graphic designers, not software
developers.

> When you're working with Unicode strings, you're not
> working with strings of characters as such, but with
> strings of 'codepoints', some of which are characters,
> others combining marks, yet others format codes, and so on.

Which is unfortunate for the programmer, who would like to get things
done without a viscous implementation mucking up the gears.

[1] Of course, even in the realm of ASCII, there are chars that
cannot be inserted by the programmer _simply_ by pressing a single
key on the keyboard. But most of these chars were useless anyway, so
we will ignore this small detail for now. One point to mention is
that Unicode greatly increased the number of useless chars.

-- 
https://mail.python.org/mailman/listinfo/python-list