On Mon, Mar 16, 2020 at 02:40:45PM +0100, Michal Herko wrote:
> Dear maintainer of guile-lib.
> I believe the special handling of <p> elements in (htmlprag) module
> to be a bug.
> For example:
>
> (use-modules (htmlprag))
> (html->shtml "<html><body><div><p>text</p></div></body></html>")
> ; expected result (*TOP* (html (body (div (p "text")))))
> ; actual (*TOP* (html (body (div) (p "text"))))
>
> Note that the <p> element is parsed outside the <div> element.
> I attach the simple patch to remove the special case for <p> elements.
>
> diff --git a/src/htmlprag.scm b/src/htmlprag.scm
> index 3bd352b..df99612 100644
> --- a/src/htmlprag.scm
> +++ b/src/htmlprag.scm
> @@ -1099,7 +1099,6 @@
>      (meta . (head))
>      (noframes . (frameset))
>      (option . (select))
> -    (p . (body td th))
>      (param . (applet))
>      (tbody . (table))
>      (td . (tr))

Where did you get htmlprag from? I'll guess it's from the Debian
package guile-lib.

It seems the upstream isn't maintained anymore [1]. The Debian
package page [2] lists a maintainer you might want to contact.

That said, you are modifying the parser's "parent constraints";
I would go the other direction and add <div> to the set of <p>'s
possible parents:

>      (option . (select))
> -    (p . (body td th))
> +    (p . (body td th div))
>      (param . (applet))

HTML has changed a lot since htmlprag saw its heyday.

Cheers

[1] https://planet.racket-lang.org/package-source/neil/htmlprag.plt/1/7/planet-docs/htmlprag/index.html
[2] https://packages.debian.org/buster/guile-library

-- tomás