Quoting Bastien Roucariès (2021-09-02 17:53:18)
> A few years ago I created the privacy-breach lintian checks in
> order to detect trackers in our docs.
>
> I think we are losing the battle here.
>
> I believe that we need better tools than sed in order to fix this kind
> of problem.
>
> I have some ideas, like:
> - read the html tree
> - convert the html tree dom representation to xml serialization (so called
>   XHTML5 or polyglot)
> - apply to this xhtml5 xslt2 rules for fixing the privacy breach
>
> The problem is which tools to use...
>
> I would like to use JavaScript for this kind of transformation, but
> nodejs does not compile on armel, and for saxon-ce I need gwt, which is
> not in Debian...
>
> I could use saxon2, but it would need Java.
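
To make the three quoted steps concrete, here is a minimal sketch, not a proposal for the actual tool. It uses only Python's standard library, purely for illustration: it reads sloppy HTML, re-serializes it as well-formed XML, and applies one fixing rule. The names `PrivacyFilter`, `to_xhtml` and `AUTOLOAD`, and the rule itself (strip any remote `src`/`href` that a browser would fetch automatically), are assumptions of mine; plain Python stands in for the XSLT rules, and comments and the doctype are dropped for brevity.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never have a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

# Hypothetical rule set: (tag, attribute) pairs fetched on page load.
AUTOLOAD = {("script", "src"), ("img", "src"), ("iframe", "src"),
            ("link", "href")}


def is_remote(url):
    """True for absolute or protocol-relative http(s) URLs."""
    return url.lower().startswith(("http://", "https://", "//"))


class PrivacyFilter(HTMLParser):
    """Re-serialize tag soup as well-formed XML, minus remote autoloads."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # serialized output fragments
        self.open_tags = []  # stack of elements still open

    def handle_starttag(self, tag, attrs):
        kept = {}  # a dict also removes duplicate attribute names
        for name, value in attrs:
            if (tag, name) in AUTOLOAD and value and is_remote(value):
                continue  # the "fix": drop the attribute that phones home
            # Boolean attributes (value None) become name="name" in XML.
            kept[name] = name if value is None else value
        rendered = "".join(f" {n}={quoteattr(v)}" for n, v in kept.items())
        if tag in VOID:
            self.out.append(f"<{tag}{rendered}/>")
        else:
            self.out.append(f"<{tag}{rendered}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID or tag not in self.open_tags:
            return  # stray end tag: ignore it
        # Close unclosed inner elements first, then the requested one.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))


def to_xhtml(html):
    """Return a well-formed XML serialization of a sloppy HTML fragment."""
    parser = PrivacyFilter()
    parser.feed(html)
    parser.close()
    while parser.open_tags:  # close whatever the input left open
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)
```

For example, `to_xhtml('<p>Hi<img src="https://tracker.example/p.gif">')` would keep the paragraph and the image element but drop the remote `src`. A real tool would more likely replace the reference with a locally packaged copy than strip it.
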

Perl is famous for its text juggling features, and sloppy parsing of
HTML can be done e.g. with HTML::HTML5::Parser (i.e. Debian package
libhtml-html5-parser-perl).

Also, debhelper itself is written in Perl, so it is likely easier to
integrate plugins written in Perl as well.

If Perl is an option at all, obviously...

I am sure Python/Ruby/PHP/Haskell/Scheme/Rust/etc. folks will argue that
their pet language is the right one for the task as well: I think it
will help the conversation if you clarify what you are open to and what
your constraints are.

E.g. do you mean that it *must* be JavaScript when you mention that? Or
are you perhaps asking if someone else wants to take over the challenge
from you, so that it does not matter how it is done?


 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private