Ezio Melotti added the comment:

I did some macro-benchmarks and the proposed changes don't seem to affect the result (most likely because they are in _parse_doctype_element and _parse_doctype_attlist, which should be called only once per document).
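A macro-benchmark of this kind could be sketched roughly as follows. This is only an illustration: the synthetic SAMPLE input and the helper names parse_document and benchmark are assumptions, not the script actually used for the numbers in this comment.

```python
import timeit
from html.parser import HTMLParser

# Hypothetical input: a small synthetic document, repeated to get a
# measurable run time. The real benchmark input is not specified.
SAMPLE = (
    "<!DOCTYPE html>\n<html><body>\n"
    + '<div class="a" id="b"><p>text &amp; more</p><br/></div>\n' * 2000
    + "</body></html>\n"
)


def parse_document(html):
    """Feed a whole document through a plain HTMLParser."""
    parser = HTMLParser()
    parser.feed(html)
    parser.close()


def benchmark(html, repeat=5, number=3):
    """Return the best per-parse time in seconds over several runs."""
    timings = timeit.repeat(
        lambda: parse_document(html), repeat=repeat, number=number
    )
    # The minimum is the least noisy estimate of the achievable time.
    return min(timings) / number


if __name__ == "__main__":
    print("best time per parse: %.4f s" % benchmark(SAMPLE))
```

Running it before and after applying a patch gives a rough comparison. Since the doctype declaration appears once per document, changes to the doctype helpers are unlikely to show up in such a measurement.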
I did some profiling, and this is the result:

         4437196 function calls (4436748 primitive calls) in 36.582 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    92931    7.400    0.000   17.082    0.000 parser.py:320(parse_starttag)
      202    6.363    0.032   36.281    0.180 parser.py:171(goahead)
   673285    5.302    0.000    5.302    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
   369418    3.272    0.000    4.554    0.000 _markupbase.py:48(updatepos)
    83243    2.698    0.000    4.639    0.000 parser.py:421(parse_endtag)
   308882    2.006    0.000    2.006    0.000 {method 'group' of '_sre.SRE_Match' objects}
   270074    1.521    0.000    1.521    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
    92931    1.150    0.000    2.643    0.000 parser.py:378(check_for_whole_start_tag)
   291079    1.028    0.000    1.028    0.000 {method 'count' of 'str' objects}
   295892    0.883    0.000    0.883    0.000 {method 'startswith' of 'str' objects}
   387439    0.733    0.000    0.733    0.000 {method 'lower' of 'str' objects}
   403922    0.642    0.000    0.642    0.000 {method 'end' of '_sre.SRE_Match' objects}
   124512    0.406    0.000    1.156    0.000 parser.py:504(unescape)
   186775    0.326    0.000    0.326    0.000 {method 'start' of '_sre.SRE_Match' objects}
    96213    0.255    0.000    0.255    0.000 {method 'endswith' of 'str' objects}
    59522    0.253    0.000    0.253    0.000 {method 'rindex' of 'str' objects}
    83226    0.215    0.000    0.215    0.000 parser.py:164(clear_cdata_mode)
     6428    0.194    0.000    0.337    0.000 parser.py:507(replaceEntities)
   106487    0.183    0.000    0.183    0.000 parser.py:484(handle_data)

Excluding string and regex methods, the three slowest functions by internal time are parse_starttag, goahead, and updatepos. The attached patch adds a couple of simple optimizations to the first two; I couldn't think of a way to optimize updatepos. The resulting speedup is fairly small, however, so I'm not sure it's worth applying the patch. I might try doing other benchmarks in the future (should I add them somewhere in Tools?).
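For anyone who wants to reproduce a profile like this one, here is a minimal sketch using cProfile and pstats, sorted by internal time. The SAMPLE input and the helper names parse_document and profile_parse are assumptions for illustration; the document used for the numbers in this comment is not specified, so the counts and timings will differ.

```python
import cProfile
import io
import pstats
from html.parser import HTMLParser

# Hypothetical input: a synthetic document with many start and end tags.
SAMPLE = '<div class="a" id="b"><p>text &amp; more</p><br/></div>\n' * 2000


def parse_document(html):
    """Feed a whole document through a plain HTMLParser."""
    parser = HTMLParser()
    parser.feed(html)
    parser.close()


def profile_parse(html, limit=20):
    """Profile one parse and return the report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    parse_document(html)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # "tottime" sorts by time spent in the function itself, which
    # pstats labels as "Ordered by: internal time" in the report.
    stats.sort_stats("tottime").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_parse(SAMPLE))
```

Sorting by "tottime" rather than "cumtime" makes functions that are expensive themselves stand out, instead of those that merely call expensive functions.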
----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17183>
_______________________________________