On Friday, January 22, 2016 at 6:05:02 PM UTC+5:30, Chris Angelico wrote:
> On Fri, Jan 22, 2016 at 11:04 PM, Rustom Mody wrote:
> > 2. My students trying to work inside the lexer made a mess because the
> > extant lexer is a mess.
> > I.e. while python(3) *claims* to accept Unicode input, the actual lexer is
> > an ASCII lexer special-cased for unicode rather than pre-lexing utf8 to
> > unicode
> >
> > These are just specific examples that I am familiar with
> >
> Regarding lexers specifically, I have never seen any full-size
> language parser that I've wanted to tinker with. They're always highly
> optimized pieces of code, dealing with innumerable edge and corner
> cases, and exploring them is always like dipping my toe into something
> that's either ice-cold water or highly caustic acid, and I can't tell
> which.
>
You just gave a vivid description... of the same thing Marko is describing ;-)
viz. a full-size language parser is something that you, an experienced
developer, make a point of avoiding.

So the question comes down to this:
Is this the order of nature? Or is it man-made disorder?

The jury's out on that one for lexers/parsers specifically.
For arbitrary code in general, the problem that it may be arbitrarily and
unboundedly complex/complicated is the oldest problem in computer science:
the halting problem.

IOW anyone who thinks that *arbitrary* complexity can *always* be tamed
either has a visa to utopia or needs to re-evaluate (or get) a CS degree.