[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2011-04-21 Thread Paweł Widera
Paweł Widera added the comment: No. As the value of the href attribute is not supposed to contain spaces, I'd rather expect the parser to assume that there is an ending " missing before the space.

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2011-04-14 Thread Paweł Widera
Paweł Widera added the comment: Great! With one "but"... the second case *is* handled by browsers. Browsers do not throw an exception on it as HTMLParser does. So improvement is definitely possible here. If it is worth the effort, it is not for m

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2009-06-04 Thread Paweł Widera
Paweł Widera added the comment: It depends on whether you want HTMLParser to be a useful tool that can deal with real-world HTML or just a toy without practical meaning. Crashing on every little deviation from the standard, where a more relaxed approach is possible, doesn't sound to me

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2009-06-04 Thread Paweł Widera
New submission from Paweł Widera: Of course both are not correct HTML, but the intent is easy to guess, so I believe the parser should not give up too quickly here. 1) extra comma between attributes 2) missing closing quotation mark for the first attribute http://xxx.org/xxx.php?a=1 target="_blank&q
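The two cases named in the submission can be tried with a short script. This is only a sketch: the original markup is cut off in this listing, so the sample strings below are hypothetical reconstructions of "extra comma" and "missing closing quote", and `StartTagCollector` and `collect_start_tags` are illustrative names, not part of the report. It uses the Python 3 `html.parser` module; whether the malformed inputs raise an exception or are parsed leniently depends on the interpreter version, so no output is claimed for them.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Records every start tag and its attributes as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs supplied by HTMLParser
        self.tags.append((tag, attrs))


def collect_start_tags(markup):
    """Feed markup to a fresh parser and return the start tags it reported."""
    parser = StartTagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags


# Hypothetical inputs (assumptions, not the reporter's exact test cases).
SAMPLES = {
    "well formed": '<a href="http://xxx.org/xxx.php?a=1" target="_blank">link</a>',
    "extra comma": '<a href="http://xxx.org/xxx.php?a=1", target="_blank">link</a>',
    "missing quote": '<a href="http://xxx.org/xxx.php?a=1 target="_blank">link</a>',
}

if __name__ == "__main__":
    for name, markup in SAMPLES.items():
        print(name, collect_start_tags(markup))
```

Comparing the attribute lists printed for the malformed samples against the well-formed one shows how the installed parser guesses at the intended attributes, if it accepts the input at all.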

[issue670664] HTMLParser.py - more robust SCRIPT tag parsing

2009-06-03 Thread Paweł Widera
Paweł Widera added the comment: A simple workaround for BeautifulSoup is the following wrapper. It sanitizes the JavaScript code before passing it to the parser by joining the disjoint strings, so that "" becomes "". def bs(input): pattern = re.compile(&#x

[issue670664] HTMLParser.py - more robust SCRIPT tag parsing

2009-06-03 Thread Paweł Widera
Changes by Paweł Widera: nosy: +momat. Python tracker <http://bugs.python.org/issue670664>