DESCRIPTION

HTML parsing library based on the WHATWG Web Applications 1.0 "HTML5" specification [1]. The parser is designed to work with all existing flavors of HTML and implements well-defined error recovery that has been specified through analysis of the behavior of modern desktop web browsers.

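As a rough illustration of the error recovery described above, the sketch below feeds unclosed markup to the parser and reads the result back as an ElementTree. It assumes the top-level `html5lib.parse` function and its `treebuilder` and `namespaceHTMLElements` arguments as found in later html5lib releases; the API of this release may differ. The helper name `parse_tag_soup` is hypothetical.

```python
try:
    import html5lib  # third-party package; assumed to be installed
except ImportError:
    html5lib = None


def parse_tag_soup(markup):
    """Parse possibly malformed HTML and return the root <html> element.

    Assumption: html5lib.parse() with the "etree" treebuilder, as in
    later html5lib releases; this may not match the release announced here.
    """
    if html5lib is None:
        raise RuntimeError("html5lib is not installed")
    # namespaceHTMLElements=False keeps tag names plain ("body", not "{ns}body")
    return html5lib.parse(markup, treebuilder="etree",
                          namespaceHTMLElements=False)


if __name__ == "__main__" and html5lib is not None:
    # Neither <p> nor <b> is closed; the parser still builds a full tree.
    root = parse_tag_soup("<p>unclosed <b>bold")
    for element in root.iter():
        print(element.tag, repr(element.text))
```

The missing html, head and body elements are supplied by the parser, so the input does not need to be a complete document.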
html5lib currently allows parsing to both a custom "simpletree" format and to an ElementTree, if available. Future releases will include support for at least one DOM implementation. It is also possible to implement custom treebuilders, although the API should not yet be considered stable.

DOWNLOAD

http://html5lib.googlecode.com/files/html5lib-0.2.zip

BUGS

This is the first release of html5lib and it is considered alpha-quality software. However, it ships with over 230 passing unit tests covering most of the specified behavior. Bugs should be reported on the issue tracker [2].

KNOWN ISSUES

Error handling does not yet conform to the specification; not all errors are reported and the error messages are not informative.

PROJECT PAGE

More information about the project, including documentation and information on getting involved, is available on the project page:

http://code.google.com/p/html5lib/

[1] http://whatwg.org/specs/web-apps/current-work/
[2] http://code.google.com/p/html5lib/issues/list