Re: HTML extraction

2021-12-09 Thread Dieter Maurer
Pieter van Oostrum wrote at 2021-12-8 11:00 +0100:
> ...
> bs4 can do it, but lxml wants correct XML.

Use `lxml`'s `HTMLParser` to parse HTML (see https://lxml.de/parsing.html#parsing-html).

-- 
https://mail.python.org/mailman/listinfo/python-list
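A minimal sketch of that suggestion, assuming `lxml` is installed. The sample fragment and the `extract_text` helper are made up for illustration and do not come from the thread. `etree.HTMLParser` recovers from markup that is not well-formed XML by default:

```python
from lxml import etree


def extract_text(html: str) -> str:
    """Parse possibly malformed HTML and return the concatenated text."""
    # HTMLParser is lenient by default (recover=True), unlike the XML parser.
    parser = etree.HTMLParser()
    root = etree.fromstring(html, parser)
    # itertext() walks the tree and yields every text node in document order.
    return "".join(root.itertext())


# Hypothetical fragment: the unclosed <p> elements are not well-formed XML.
print(extract_text("<p>A<p>B"))
```

Passing the same fragment to `etree.fromstring` without the HTML parser would raise an `XMLSyntaxError`, which is presumably the behaviour described as "lxml wants correct XML".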

Re: HTML extraction

2021-12-08 Thread Pieter van Oostrum
Roland Mueller writes:
> But isn't bs4 only for SOAP content?
> Can bs4 or lxml cope with HTML code that does not comply with XML as the
> following fragment?
>
> A
> B
>
>
bs4 can do it, but lxml wants correct XML.

Jupyter console 6.4.0
Python 3.9.9 (main, Nov 16 2021, 07:21:43)
Type 'copyr

Re: HTML extraction

2021-12-08 Thread Dieter Maurer
Roland Mueller wrote at 2021-12-7 22:55 +0200:
> ...
> Can bs4 or lxml cope with HTML code that does not comply with XML as the
> following fragment?

`lxml` comes with an HTML parser that can be configured to check loosely.

Re: HTML extraction

2021-12-07 Thread Chris Angelico
On Wed, Dec 8, 2021 at 7:55 AM Roland Mueller wrote:
>
> Hello,
>
> On Tue, Dec 7, 2021 at 20:08, Chris Angelico (ros...@gmail.com) wrote:
>>
>> On Wed, Dec 8, 2021 at 4:55 AM Julius Hamilton
>> wrote:
>> >
>> > Hey,
>> >
>> > Could anyone please comment on the purest way simply to strip HTML

Re: HTML extraction

2021-12-07 Thread Roland Mueller via Python-list
Hello,

On Tue, Dec 7, 2021 at 20:08, Chris Angelico (ros...@gmail.com) wrote:
> On Wed, Dec 8, 2021 at 4:55 AM Julius Hamilton
> wrote:
> >
> > Hey,
> >
> > Could anyone please comment on the purest way simply to strip HTML tags
> > from the internal text they surround?
> >
> > I know Beautif

Re: HTML extraction

2021-12-07 Thread Chris Angelico
On Wed, Dec 8, 2021 at 4:55 AM Julius Hamilton wrote:
>
> Hey,
>
> Could anyone please comment on the purest way simply to strip HTML tags
> from the internal text they surround?
>
> I know Beautiful Soup is a convenient tool, but I’m interested to know what
> the most minimal way to do it would b
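
For the question quoted above, one fairly minimal approach uses only the standard library's `html.parser` module. This is a sketch, not an answer taken from the thread; the `TextExtractor` class and `strip_tags` function are names invented here:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags and ignore the tags themselves."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; arrives as "&"
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text outside of tags.
        self.chunks.append(data)


def strip_tags(html: str) -> str:
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()  # flush any text still held in the parser's buffer
    return "".join(extractor.chunks)


print(strip_tags("<p>Hello <b>world</b>!</p>"))
```

Note that this keeps the contents of `<script>` and `<style>` elements as text; filtering those out would need `handle_starttag` and `handle_endtag` overrides.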