On Tuesday, March 25, 2014 11:42:50 AM UTC+5:30, Chris Angelico wrote:
> On Tue, Mar 25, 2014 at 4:47 PM, Steven D'Aprano wrote:
> > On Tue, 25 Mar 2014 14:57:02 +1100, Chris Angelico wrote:
> >> No, I'm not missing that. But the human brain is a tokenizer, just as
> >> Python is. Once you know what a token means, you comprehend it as that
> >> token, and it takes up space in your mind as a single unit. There's not
> >> a lot of readability difference between a one-symbol token and a
> >> one-word token.
> > Hmmm, I don't know about that. Mathematicians are heavy users of symbols.
> > Why do they write ∀ instead of "for all", or ⊂ instead of "subset"?
> > Why do we write "40" instead of "forty"?
> Because the shorter symbols lend themselves better to the
> "super-tokenization" where you don't read the individual parts but the
> whole. The difference between "40" and "forty" is minimal, but the
> difference between "86400" and "eighty-six thousand [and] four
> hundred" is significant; the first is a single token, which you could
> then instantly recognize as the number of seconds in a day (leap
> seconds aside), but the second is a lengthy expression.
>
> There's also ease of writing. On paper or blackboard, it's really easy
> to write little strokes and curvy lines to mean things, and to write a
> bolded letter R to mean "Real numbers". In Python, it's much easier to
> use a few more ASCII letters than to write ⊂ ℝ.
>
> >> Also, since the human brain works largely with words,
> > I think that's a fairly controversial opinion. The Chinese might have
> > something to say about that.
> Well, all the people I interviewed (three of them: me, myself, and I)
> agree that the human brain works with words. My research is 100%
> scientific, and is therefore unassailable. So there. :)
>
> > I think that heavy use of symbols is a form of Huffman coding -- common
> > things should be short, and uncommon things longer. Mathematicians tend
> > to be *extremely* specialised, so they're all inventing their own Huffman
> > codings, and the end result is a huge number of (often ambiguous) symbols.
> Yeah. That's about the size of it. Usually, each symbol has some read
> form; "ℕ ⊂ ℝ" would be read as "Naturals are a subset of Reals" (or
> maybe "Naturals is a subset of Reals"?), and in program code, using
> the word "subset" or "issubset" wouldn't be much worse. It would be
> some worse, and the exact cost depends on how frequently your code
> does subset comparisons; my view is that the worseness of words is
> less than the worseness of untypable symbols. (And I'm about to be
> arrested for murdering the English language.)
>
> > Personally, I think that it would be good to start accepting, but not
> > requiring, Unicode in programming languages. We can already write:
> > from math import pi as π
> > Perhaps we should be able to write:
> > setA ⊂ setB
> It would be nice, if subset testing is considered common enough to
> warrant it. (I'm not sure it is, but I'm certainly not sure it isn't.)
> But it violates "one obvious way". Python doesn't, as a general rule,
> offer us two ways of spelling the exact same thing. So the bar for
> inclusion would be quite high: it has to be so much better than the
> alternative that it justifies the creation of a duplicate notation.

I don't think we are anywhere near making real suggestions for real
changes, which would need to address compatibility, portability, editor
support and all such other good stuff. This is just a bit of
brainstorming to see what an alternative Python might look like.

Here's a quickly made list of symbols that it might be nice to have
support for:

× ÷ ≤ ≥ ∧ ∨ ¬ π λ ∈ ∉ ⊂ ⊃ ⊆ ⊇ ∅ ∩ ∪ ← … (ellipsis instead of range)
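For comparison, here is a rough sketch of how that list maps onto
today's spellings -- stock Python 3 only, with made-up names (setA,
setB) purely for illustration. The one piece that already works is the
identifier side: Unicode names such as π are legal under PEP 3131; the
operators all have ASCII spellings.

    # A sketch only: stock Python 3, illustrative names.
    from math import pi as π          # PEP 3131: Unicode identifiers already work

    setA = {1, 2}
    setB = {1, 2, 3}

    print(setA < setB)                # ⊂  proper subset         -> True
    print(setA <= setB)               # ⊆  subset                -> True
    print(setB > setA, setB >= setA)  # ⊃ ⊇  superset            -> True True
    print(setA & setB)                # ∩  intersection          -> {1, 2}
    print(setA | setB)                # ∪  union                 -> {1, 2, 3}
    print(1 in setA, 4 not in setA)   # ∈ ∉  membership          -> True True
    print(set())                      # ∅  empty set ({} is an empty dict)
    print(2 * π, 10 / 4)              # × ÷                      -> 6.28..., 2.5
    print(1 <= 2, 2 >= 3)             # ≤ ≥                      -> True False
    print(True and not False)         # ∧ ¬ (and ∨ is "or")      -> True
    square = lambda x: x * x          # λ                        -> lambda
    print(list(range(0, 10, 2)))      # …  (the proposed range)  -> [0, 2, 4, 6, 8]

Which circles back to Chris's "one obvious way" point: each of these
would be a second spelling for something Python can already write in
ASCII.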