New submission from Ezio Melotti:

Currently test_htmlparser feeds the HTML source to the parser one char at a 
time (except for a couple of buffering-specific tests that feed the parser with 
chunks of text).  This ensures that the parser doesn't break when the source is 
fed in smaller chunks (that might end in the middle of a tag).  However, #20288 
revealed a bug that doesn't happen when feeding the parser char by char.

In order to avoid similar problems, all the tests should feed the source to the 
parser both char by char and as a single string.
So my plan is:
1) wait until #15114 is resolved and the strict mode and the strict tests are 
removed;
2) either change TestCaseBase._run_check() to run every test twice (possibly by 
using subTest), or use a subclass-based approach with a different _run_check in 
the two subclasses.
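
A minimal sketch of the subTest variant of option 2, to make the idea concrete. 
It is not the actual test_htmlparser code: the EventCollector here is a 
simplified stand-in written for illustration, and the mode names and the 
ExampleTest class are assumptions.

```python
import unittest
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Illustrative stand-in: records parser events in order."""

    def __init__(self):
        # convert_charrefs=False keeps the example independent of charref handling.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def get_events(self):
        # Merge adjacent data events: feeding char by char may split one
        # run of text into several handle_data() calls.
        merged = []
        for event in self.events:
            if merged and event[0] == "data" and merged[-1][0] == "data":
                merged[-1] = ("data", merged[-1][1] + event[1])
            else:
                merged.append(event)
        return merged


def feed_char_by_char(parser, source):
    for char in source:
        parser.feed(char)


def feed_single_string(parser, source):
    parser.feed(source)


class TestCaseBase(unittest.TestCase):
    # Assumed mode names; each one maps to a way of feeding the parser.
    feed_modes = {
        "char by char": feed_char_by_char,
        "single string": feed_single_string,
    }

    def _run_check(self, source, expected_events):
        # Run the same check once per feeding mode, reported as subtests.
        for mode, feed in self.feed_modes.items():
            with self.subTest(mode=mode):
                parser = EventCollector()
                feed(parser, source)
                parser.close()
                self.assertEqual(parser.get_events(), expected_events)


class ExampleTest(TestCaseBase):
    def test_simple_tag(self):
        self._run_check('<p class="x">Hello</p>', [
            ("starttag", "p", [("class", "x")]),
            ("data", "Hello"),
            ("endtag", "p"),
        ])
```

With this layout a failure is reported together with the mode that triggered 
it, so a bug that only shows up in one of the two modes is easy to spot.  The 
subclass-based approach would instead override _run_check in each subclass.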

A few notes about this:
* a third kind of test that feeds the parser with chunks of an arbitrary fixed 
length (e.g. 5 chars) could be added as well;
* the increase in run-time shouldn't matter, since all the tests take very 
little time to run;
* I don't think it's necessary to backport this to 2.7/3.3/3.4 because it's a 
somewhat major refactoring, and if a bug is introduced by other changes the 
tests in 3.5 will find it (I expect all the bug fixes and new features to land 
in 3.5 too).
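
The fixed-length chunk mode from the first note could look roughly like the 
following.  This is only a sketch: iter_chunks, feed_in_chunks and 
StartTagRecorder are hypothetical names that do not exist in test_htmlparser, 
and the chunk size of 5 is just the example value mentioned above.

```python
from html.parser import HTMLParser


def iter_chunks(source, size):
    """Yield consecutive slices of source, each at most size chars long."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(source), size):
        yield source[start:start + size]


def feed_in_chunks(parser, source, size=5):
    """Feed source to parser in fixed-size chunks, then close the parser."""
    for chunk in iter_chunks(source, size):
        parser.feed(chunk)
    parser.close()


class StartTagRecorder(HTMLParser):
    """Hypothetical helper: records only start tags and their attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))


recorder = StartTagRecorder()
# With size=5 the first chunk is "<a hr", so the start tag is split mid-tag.
feed_in_chunks(recorder, "<a href='x'>text</a>", size=5)
```

A chunk boundary that falls inside a tag or an attribute value exercises the 
same buffering paths as char-by-char feeding, but with different split points.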

----------
assignee: ezio.melotti
components: Tests
messages: 211201
nosy: ezio.melotti, r.david.murray
priority: normal
severity: normal
stage: needs patch
status: open
title: Run test_htmlparser with unbuffered source
type: behavior
versions: Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20623>
_______________________________________