Pieter van Oostrum wrote at 2021-12-8 11:00 +0100:
> ...
>bs4 can do it, but lxml wants correct XML.
Use `lxml`'s `HTMLParser` to parse HTML
(see "https://lxml.de/parsing.html#parsing-html").
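
A minimal sketch of that suggestion, assuming `lxml` is installed. The input string is a hypothetical stand-in for non-XML-compliant HTML (unclosed `<p>` and `<br>`), not the fragment from the thread:

```python
from lxml import etree

# Hypothetical sample: unclosed <p> and <br>, so this is not well-formed XML.
broken_html = "<div><p>A<br>B</div>"

# The default XML parser rejects the input.
try:
    etree.fromstring(broken_html)
    strict_ok = True
except etree.XMLSyntaxError:
    strict_ok = False

# The HTML parser shipped with lxml accepts it.
parser = etree.HTMLParser()
root = etree.fromstring(broken_html, parser)

# itertext() yields every text node in document order; joining them
# gives the text with all tags stripped.
text = "".join(root.itertext())
print(strict_ok, repr(text))
```

Whether the resulting whitespace is what you want depends on the input; `itertext()` returns text nodes exactly as they appear in the parsed tree.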
--
https://mail.python.org/mailman/listinfo/python-list
Roland Mueller writes:
> But isn't bs4 only for SOAP content?
> Can bs4 or lxml cope with HTML code that does not comply with XML as the
> following fragment?
>
> A
> B
>
>
bs4 can do it, but lxml wants correct XML.
Jupyter console 6.4.0
Python 3.9.9 (main, Nov 16 2021, 07:21:43)
Type 'copyr
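
A minimal sketch of the bs4 approach mentioned above, assuming `beautifulsoup4` is installed and using the standard library's `html.parser` backend. The input is a hypothetical example of HTML that is not valid XML:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: unclosed <p> and <br>, so this is not well-formed XML.
broken_html = "<div><p>A<br>B</div>"

# "html.parser" selects Python's built-in HTML parser as the backend,
# so no extra parser library is required.
soup = BeautifulSoup(broken_html, "html.parser")

# get_text() concatenates all text nodes, dropping the tags.
text = soup.get_text()
print(repr(text))
```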
Roland Mueller wrote at 2021-12-7 22:55 +0200:
> ...
>Can bs4 or lxml cope with HTML code that does not comply with XML as the
>following fragment?
`lxml` comes with an HTML parser that can be configured to parse leniently.
On Wed, Dec 8, 2021 at 7:55 AM Roland Mueller
wrote:
>
> Hello,
>
> On Tue, Dec 7, 2021 at 20:08, Chris Angelico (ros...@gmail.com) wrote:
>>
>> On Wed, Dec 8, 2021 at 4:55 AM Julius Hamilton
>> wrote:
>> >
>> > Hey,
>> >
>> > Could anyone please comment on the purest way simply to strip HTML
Hello,
On Tue, Dec 7, 2021 at 20:08, Chris Angelico (ros...@gmail.com) wrote:
> On Wed, Dec 8, 2021 at 4:55 AM Julius Hamilton
> wrote:
> >
> > Hey,
> >
> > Could anyone please comment on the purest way simply to strip HTML tags
> > from the internal text they surround?
> >
> > I know Beautif
On Wed, Dec 8, 2021 at 4:55 AM Julius Hamilton
wrote:
>
> Hey,
>
> Could anyone please comment on the purest way simply to strip HTML tags
> from the internal text they surround?
>
> I know Beautiful Soup is a convenient tool, but I’m interested to know what
> the most minimal way to do it would b