Paweł Widera added the comment:
No. As the value of the href attribute is not supposed to contain spaces, I'd
rather expect the parser to assume that an ending " is missing before the
space.
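The recovery rule suggested above can be sketched as a pre-processing pass in Python. This is a minimal sketch, not HTMLParser's own behavior: `close_href_quote` and `StartTagCollector` are hypothetical names, and the regex assumes a double-quoted `href` value that runs straight into another `name=` attribute, as in `<a href="http://xxx.org/xxx.php?a=1 target="_blank">`.

```python
import re
from html.parser import HTMLParser

# Hypothetical heuristic (not part of HTMLParser): an href value should not
# contain spaces, so when a quoted href value is followed by whitespace and
# another "name=" before any closing quote, assume the closing quote is
# missing right before that whitespace.
_UNCLOSED_HREF = re.compile(r'(href="[^"\s>]*)(\s+[\w-]+=)')


def close_href_quote(html: str) -> str:
    """Insert the presumed missing closing quote of an href value."""
    return _UNCLOSED_HREF.sub(r'\1"\2', html)


class StartTagCollector(HTMLParser):
    """Records (tag, attrs) for every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))


if __name__ == "__main__":
    broken = '<a href="http://xxx.org/xxx.php?a=1 target="_blank">link</a>'
    fixed = close_href_quote(broken)
    collector = StartTagCollector()
    collector.feed(fixed)
    collector.close()
    print(fixed)
    print(collector.tags)
```

Well-formed markup such as `<a href="http://x.org/" target="_blank">` does not match the pattern, so it passes through unchanged.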
Paweł Widera added the comment:
Great! With one "but"... the second case *is* handled by browsers. Unlike
HTMLParser, browsers do not throw an exception on it. So improvement is definitely
possible here. If it is worth the effort, it is not for m
Paweł Widera added the comment:
It depends on whether you want HTMLParser to be a useful tool that can
deal with real-world HTML or just a toy without practical value.
Crashing on every little deviation from the standard, where a more relaxed
approach is possible, doesn't sound to me
New submission from Paweł Widera:
Of course, neither is correct HTML, but both are easy to guess, so I believe
the parser should not give up so quickly here.
1) extra comma between attributes
2) missing closing quotation mark for the first attribute
http://xxx.org/xxx.php?a=1 target="_blank&q
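For the first case, an extra comma between attributes, a similar pass could drop the stray comma before the markup reaches the parser. This is a minimal sketch under stated assumptions: `drop_attribute_commas` is a hypothetical helper, attribute values are assumed to be double-quoted, and the naive `[^<>]*` tag pattern will misfire on attribute values that themselves contain `<` or `>`.

```python
import re

# Hypothetical pre-processing step, not part of HTMLParser.
# Only start tags are rewritten, so commas in ordinary text survive.
_START_TAG = re.compile(r'<[A-Za-z][^<>]*>')
# A comma right after a double-quoted value and before the next "name=".
_STRAY_COMMA = re.compile(r'"\s*,\s*(?=[\w-]+=)')


def drop_attribute_commas(html: str) -> str:
    """Turn <a href="x", target="y"> into <a href="x" target="y">."""
    return _START_TAG.sub(lambda m: _STRAY_COMMA.sub('" ', m.group(0)), html)


if __name__ == "__main__":
    print(drop_attribute_commas('<a href="x.html", target="_blank">a, b="c"</a>'))
```

Restricting the rewrite to start tags keeps text such as `a, b="c"` between tags untouched.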
Paweł Widera added the comment:
A simple workaround for BeautifulSoup is the following wrapper. It
sanitizes the JavaScript code before passing it to the parser by joining
the disjoint strings, so that "" becomes "".
def bs(input):
    pattern = re.compile(
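The wrapper's regular expression is cut off above, so the following is only a guess at the idea, not the original code. It assumes the "disjoint strings" are double-quoted JavaScript literals glued together with `+` (for example `"ab" + "cd"`) and joins them only inside `<script>` elements; `join_js_strings` is a hypothetical name. The sanitized string would then be handed to the parser, as the `bs` wrapper does.

```python
import re

# Hypothetical reconstruction; the original wrapper's regex is not shown.
# Assumes the "disjoint strings" are double-quoted JavaScript literals
# joined with "+", e.g. "ab" + "cd".
_SCRIPT = re.compile(r'(<script\b[^>]*>)(.*?)(</script\s*>)',
                     re.IGNORECASE | re.DOTALL)
# Closing quote, "+", opening quote: deleting this span merges two literals.
_CONCAT = re.compile(r'"\s*\+\s*"')


def join_js_strings(html: str) -> str:
    """Merge "ab" + "cd" into "abcd", but only inside <script> elements."""
    def fix(m):
        return m.group(1) + _CONCAT.sub('', m.group(2)) + m.group(3)
    return _SCRIPT.sub(fix, html)


if __name__ == "__main__":
    print(join_js_strings('<script>var s = "ab" + "cd";</script>'))
```

A regex cannot see JavaScript comments, regex literals, or single-quoted strings, so this stays a heuristic rather than a real tokenizer.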
Changes by Paweł Widera:
--
nosy: +momat
___
Python tracker
<http://bugs.python.org/issue670664>
___
Python-bugs-list mailing list