John Nagle schrieb: > > Very true. HTML is LALR(0), that is, you can parse it without > looking ahead. Parsers for LALR(0) languages are easy, and > work by repeatedly getting the next character and using that to > drive a single state machine. The first character-level parser > yields tokens, which are then processed by a grammar-level parser. > Any compiler book will cover this. > > Using regular expressions for LALR(0) parsing is a vice inherited >>from Perl, in which regular expressions are easy and "get next > character from string" is unreasonably expensive. In Python, at least > you can index through a string. > > John Nagle
I take it with LALR(0) you mean that HTML is a language created by a Chomsky-0 (regular language) Grammar? Thomas -- http://mail.python.org/mailman/listinfo/python-list