Re: Beautiful Soup - close tags more promptly?

Chris Angelico Mon, 24 Oct 2022 03:59:24 -0700

On Mon, 24 Oct 2022 at 21:33, Peter J. Holzer <[email protected]> wrote:
> Ron has already noted that the lxml and html5lib parsers do the right thing,
> so just for the record:
>
> The HTML fragment above is well-formed and contains a number of li
> elements at the same level directly below the ol element, not lots of
> nested li elements. The end tag of the li element is optional (except in
> XHTML) and li elements don't nest.


That's correct. However, parsing it with html.parser and then
reconstituting it as shown in the example code results in all the
</li> tags coming up right before the </ol>, indicating that the <li>
tags were parsed as deeply nested rather than as siblings.
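
To illustrate how that kind of nesting can arise, here is a minimal
sketch using only the standard library's html.parser.HTMLParser. It is
not Beautiful Soup's actual tree builder; the DepthTracker class, the
li_depths helper, the close_open_li flag and the FRAGMENT string are all
made up for this example, and FRAGMENT is only a stand-in for the markup
discussed in the thread. The point is that HTMLParser reports no end-tag
event for an omitted </li>, so a naive stack never pops and every new
<li> lands inside the previous one.

```python
from html.parser import HTMLParser


class DepthTracker(HTMLParser):
    """Hypothetical helper: record how many <li> elements are open
    at the moment each <li> start tag is seen."""

    def __init__(self, close_open_li=False):
        super().__init__()
        self.close_open_li = close_open_li
        self.stack = []      # names of the elements currently open
        self.li_depths = []  # open <li> ancestors for each <li> seen

    def handle_starttag(self, tag, attrs):
        if tag == "li" and self.close_open_li and "li" in self.stack:
            # Treat the new <li> as an implied </li>: pop back to and
            # including the nearest open <li>. Simplification: this
            # ignores list scope, so nested lists are mishandled.
            while self.stack.pop() != "li":
                pass
        if tag == "li":
            self.li_depths.append(self.stack.count("li"))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Close everything up to and including the matching open tag.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def li_depths(markup, close_open_li=False):
    tracker = DepthTracker(close_open_li)
    tracker.feed(markup)
    tracker.close()
    return tracker.li_depths


# Stand-in fragment (an assumption, not the markup from the thread).
FRAGMENT = "<ol><li>one<li>two<li>three</ol>"

print(li_depths(FRAGMENT))                      # [0, 1, 2]
print(li_depths(FRAGMENT, close_open_li=True))  # [0, 0, 0]
```

With the naive stack each <li> is one level deeper than the last, which
matches all the </li> tags piling up before the </ol>. Closing an open
<li> when the next one starts gives siblings instead.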

To get a usable parse out of this, I need a parser that sees them as
siblings, which html5lib seems to be doing fine. Whether it has other
issues, I don't know, but I guess I'll find out. It's currently running
against the live site and taking several hours, due to network delays
and the server being slow, so I don't really want to parallelize and
overload the thing.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
