On Sat, Jan 23, 2016 at 12:30 AM, Rustom Mody <rustompm...@gmail.com> wrote:
> You just gave a graphic vivid description...
> of the same thing Marko is describing: ;-) viz.
> A full-size language parser is something that you - an experienced developer -
> make a point of avoiding.

It's worth noting that "experienced developer" covers a huge range of skills. There are quite a few other areas that I do not tinker with (crypto, CPU-level optimizations, and such), not because they're impossible to understand, but because *I* have not the skill to understand and improve them. This does mean they're complicated (they're beyond the "one weekend of tinkering" barrier that any serious geek should be able to invest), but I'm sure there are language nerds out there who are so familiar with the grammar of <insert language here> that they'll pick up CPython's grammar and make a change with confidence that it'll do what they expect.

> So then the question comes down to this: Is this the order of nature?
> Or is it man-made disorder?
> Jury's out on that one for lexers/parsers specifically.

Lexers/parsers are as complicated as the grammars they parse. A lexer for a simple structured text file can be pretty easy to implement; for instance, JSON is pretty straightforward, with only a handful of cases: insignificant whitespace, three keywords, two recursive structures that start with specific characters ('{' and '['), strings (which start with '"'), and numbers (which start with a digit or a hyphen). A parser need only look for those few possibilities, and it knows exactly what else to fetch. I could probably write a JSON parser in a fairly short space of time, and wouldn't be scared of digging into the internals of someone else's.

It's when the grammar adds complexities to deal with the real-world issues of full-size programming languages that it becomes hairier. The CPython grammar is only ~150 lines of fairly readable directives, but the parser that implements it is ~3500 lines of C code. Pike merges the two into a YACC file of nearly 5000 lines of highly optimized code (it has different grammar paths for things a human would consider the same, in order to produce distinct code). That's where I'm ubercautious.

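To make the JSON case above concrete, here's a rough sketch of the "look at the first character, then you know what to fetch" idea. All the function names are made up for illustration, and this is not how the stdlib json module is built. It also cuts corners: it doesn't reject raw control characters inside strings and doesn't combine \uXXXX surrogate pairs.

```python
import re

# Hypothetical names throughout; not how the stdlib json module is built.
WS = re.compile(r"[ \t\n\r]*")
NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")
KEYWORDS = {"true": True, "false": False, "null": None}
ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b", "f": "\f",
           "n": "\n", "r": "\r", "t": "\t"}


def skip_ws(text, pos):
    return WS.match(text, pos).end()


def expect(text, pos, ch):
    if text[pos:pos + 1] != ch:
        raise ValueError("expected %r at offset %d" % (ch, pos))
    return pos + 1


def parse_string(text, pos):
    # text[pos] is the opening quote.
    out, pos = [], pos + 1
    while pos < len(text):
        ch = text[pos]
        if ch == '"':
            return "".join(out), pos + 1
        if ch != "\\":
            out.append(ch)
            pos += 1
        elif text[pos + 1:pos + 2] == "u":
            out.append(chr(int(text[pos + 2:pos + 6], 16)))
            pos += 6
        elif text[pos + 1:pos + 2] in ESCAPES:
            out.append(ESCAPES[text[pos + 1]])
            pos += 2
        else:
            raise ValueError("bad escape at offset %d" % pos)
    raise ValueError("unterminated string")


def parse_array(text, pos):
    items, pos = [], skip_ws(text, pos + 1)
    if text[pos:pos + 1] == "]":
        return items, pos + 1
    while True:
        value, pos = parse_value(text, pos)
        items.append(value)
        pos = skip_ws(text, pos)
        if text[pos:pos + 1] == "]":
            return items, pos + 1
        pos = expect(text, pos, ",")


def parse_object(text, pos):
    result, pos = {}, skip_ws(text, pos + 1)
    if text[pos:pos + 1] == "}":
        return result, pos + 1
    while True:
        pos = skip_ws(text, pos)
        expect(text, pos, '"')
        key, pos = parse_string(text, pos)
        pos = expect(text, skip_ws(text, pos), ":")
        value, pos = parse_value(text, pos)
        result[key] = value
        pos = skip_ws(text, pos)
        if text[pos:pos + 1] == "}":
            return result, pos + 1
        pos = expect(text, pos, ",")


def parse_value(text, pos):
    # Dispatch on the first significant character.
    pos = skip_ws(text, pos)
    ch = text[pos:pos + 1]
    if ch == "{":
        return parse_object(text, pos)
    if ch == "[":
        return parse_array(text, pos)
    if ch == '"':
        return parse_string(text, pos)
    if ch and ch in "-0123456789":
        m = NUMBER.match(text, pos)
        if m is None:
            raise ValueError("bad number at offset %d" % pos)
        s = m.group()
        return (float(s) if set(s) & set(".eE") else int(s)), m.end()
    for word, value in KEYWORDS.items():
        if text.startswith(word, pos):
            return value, pos + len(word)
    raise ValueError("unexpected input at offset %d" % pos)


def loads(text):
    value, pos = parse_value(text, 0)
    if skip_ws(text, pos) != len(text):
        raise ValueError("trailing data at offset %d" % pos)
    return value
```

With that, loads('{"a": [1, 2.5, null]}') gives {'a': [1, 2.5, None]}. The whole thing fits on a couple of screens precisely because every construct announces itself in its first character; a full programming language grammar doesn't give you that luxury.
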
> For arbitrary code in general, the problem that it may be arbitrarily and
> unboundedly complex/complicated is the oldest problem in computer science:
> the halting problem.
>
> IOW anyone who thinks that *arbitrary* complexity can *always* be tamed either
> has a visa to utopia or needs to re-evaluate (or get) a CS degree

Exactly.

ChrisA