Hi Laura!

On Thu, 21 Apr 2022 15:53:03 +0200 Laura Arjona Reina <larj...@debian.org> wrote:
> Package: www.debian.org
> User: www.debian....@packages.debian.org
> Usertag: scripts
> Severity: important
>
> Hi all
> I'm starting to work in the bug #980921 (Pages in HTML5) and, as it is
> mentioned there, we need to adapt our "validate" script so it correctly
> processes the pages declared as HTML5 (currently, only the homepage in the
> different languages).
>
> The current status is following:
>
> Related scripts:
>
> https://salsa.debian.org/webmaster-team/cron/-/blob/master/lessoften executed
> once a day, calling (via run-parts) the following script:
> https://salsa.debian.org/webmaster-team/cron/-/blob/master/scripts/999Xvalidate
> which gets the list of languages and folders to process and then calls:
>
> https://salsa.debian.org/webmaster-team/cron/-/blob/master/scripts/validate
>
> Which is the actual script doing the HTML validation, using the onsgmls
> command (part of opensp package).
>
> This command validates a SGML file based on a DTD. The issue (as far as I
> know) is that there is no "official" SGML DTD template to use when parsing
> HTML5 files.
>
> I have tried adapting the "validate" script to be able to recognize the
> DOCTYPE header used for html5 files, and then tried to pass a DTD (I tried
> downloading the ones here http://sgmljs.net/docs/w3c-html5-dtd.html and here
> http://sgmljs.net/docs/w3c-html52-dtd.html and also here
> https://jkorpela.fi/html5-dtd.html ) but couldn't make it work, and also was
> not convinced it is the better approach.
>
> I've tried to look at what w3c validator uses and they use Nu.checker:
>
> https://validator.w3.org/nu/about.html
> https://github.com/validator/validator/releases/latest
>
> But I'm not sure if this is packaged in Debian in any of its flavours.
>
> I have searched https://packages.debian.org/search?keywords=html5 but none of
> the results looks like a commandline tool that we could call instead of
> onsgmls
>
> So I don't know what to do at this point.
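
On recognizing the DOCTYPE: one way to keep onsgmls for the old pages and send only the HTML5 ones to the Nu checker is to look at the start of each file. Below is a minimal Python sketch, assuming an HTML5 page is one whose first non-blank content is the short `<!DOCTYPE html>` form (no comment before it). The names `is_html5` and `validator_command` and the `vnu.jar` path are made up for illustration; none of this is taken from the existing "validate" script.

```python
import re

# Matches the short HTML5 doctype, e.g. <!DOCTYPE html>, case-insensitively.
# Legacy doctypes carry a PUBLIC identifier and therefore do not match.
HTML5_DOCTYPE = re.compile(r"<!DOCTYPE\s+html\s*>", re.IGNORECASE)


def is_html5(text):
    """Return True if the document starts with the HTML5 doctype."""
    # Skip an optional BOM and leading whitespace, then look at the start only.
    head = text.lstrip("\ufeff \t\r\n")[:100]
    return HTML5_DOCTYPE.match(head) is not None


def validator_command(path, text, vnu_jar="vnu.jar"):
    """Pick a validator command line for one page (hypothetical helper)."""
    if is_html5(text):
        # Nu checker jar; the location of vnu.jar is an assumption.
        return ["java", "-jar", vnu_jar, path]
    # onsgmls -s suppresses normal output, so only errors are printed.
    return ["onsgmls", "-s", path]
```

The returned list could then be handed to `subprocess.run`; which extra options the real script passes to onsgmls is not something this sketch tries to reproduce.
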
>
> In my local machine, I have downloaded the vnu.jar file from the latest Nu
> checker release and tried to validate files and it works. But I don't know
> if asking DSA to install openjdk in www-master and include a copy of vnu.jar
> in our cron scripts is good and/or elegant.
>
> Opinions, advice and patches are very welcome.
>
> Meanwhile, I guess we can modify 999Xvalidate to add file exclusions, and
> exclude, for now, /index.*.html and later the few other files we have with
> html5 tags for now. I don't know how to exclude the index.*.html files on top
> folder only and not in subfolders but I guess playing with find -wholename
> and prune will do the trick (if you know, please go ahead).
>
> Kind regards,

Perhaps my vnu wrapper will prove of use:

* https://github.com/shlomif/python-vnu_validator
* https://pypi.org/project/vnu-validator/
* https://github.com/shlomif/shlomi-fish-homepage/blob/master/Tests/validate-html-using-vnu.py

--
Shlomi Fish       https://www.shlomifish.org/
What Makes Software Apps High Quality - https://shlom.in/sw-quality

<rindolf> Underscores are the most nutritious punctuation. But you also need
to eat letters, digits and whitespace for a balanced diet.
    — https://is.gd/pHLcFq

Please reply to list if it's a mailing list post - https://shlom.in/reply .
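
P.S. On excluding the index.*.html files in the top folder only: here is a rough Python sketch of that selection, as an alternative to the find invocation you mention. It assumes the pages end in ".html" and that "top folder" means the directory passed in as `root`. The function name `files_to_validate` is invented for this example and is not part of the cron scripts.

```python
import fnmatch
import os


def files_to_validate(root):
    """Yield *.html files under root, skipping index.*.html in root itself."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk reports the starting directory exactly as it was passed in,
        # so this comparison is true only for the top folder.
        at_top = dirpath == root
        for name in sorted(filenames):
            if not name.endswith(".html"):
                continue
            if at_top and fnmatch.fnmatch(name, "index.*.html"):
                # Top-level homepage translations: excluded for now.
                continue
            yield os.path.join(dirpath, name)
```

With this, a subfolder's index.en.html is still yielded, while index.en.html directly under `root` is not.
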