Thomas Ploch schrieb: > John Nagle schrieb: >> Very true. HTML is LALR(0), that is, you can parse it without >> looking ahead. Parsers for LALR(0) languages are easy, and >> work by repeatedly getting the next character and using that to >> drive a single state machine. The first character-level parser >> yields tokens, which are then processed by a grammar-level parser. >> Any compiler book will cover this. >> >> Using regular expressions for LALR(0) parsing is a vice inherited >> >from Perl, in which regular expressions are easy and "get next >> character from string" is unreasonably expensive. In Python, at least >> you can index through a string. >> >> John Nagle > > I take it with LALR(0) you mean that HTML is a language created by a > Chomsky-0 (regular language) Grammar?
Nope. LALR is a context free grammar parsing technique. Regular expressions can't express languages like a^n b^n but something like <div><div></div></div> is <div>^2</div>^2 Diez -- http://mail.python.org/mailman/listinfo/python-list