X-debbugs-Cc: [email protected]
Package: basex
Version: 7.1.1-2
Severity: wishlist

In the changelog we read:

 basex (7.1.1-2) unstable; urgency=low

  * Allow non well-formed HTML to be parsed if libtagsoup-java is installed.
  * Updated man page with an example on how to parse HTML. 

But we find no such example on the man page.

Please also add something to
http://docs.basex.org/wiki/Parsers (OK, I have already added a minimal section myself:
http://docs.basex.org/wiki/Parsers#HTML_Parsers ).

By the way, the TagSoup home page,
http://home.ccil.org/~cowan/XML/tagsoup/ ,
says:

--files
    Output into individual files, with html extensions changed to xhtml.
    Otherwise, all output is sent to the standard output.
--html
    Output is in clean HTML: the XML declaration is suppressed, as are
    end-tags for the known empty elements.
--omit-xml-declaration
    The XML declaration is suppressed.

and so on. Please document how these can be set via "declare option ...".

Also document how to set the "SAX features and properties" described on that page.

Let us attempt a round trip:

declare option db:parser "html";
declare option output:method "html";
declare option output:version "4.01";
declare option output:doctype-public "-//W3C//DTD HTML 4.01//EN";
declare option output:doctype-system "http://www.w3.org/TR/html4/strict.dtd";
doc("http://jidanni.org/index.html")

Alas, I would somehow need the equivalent of --html. Also, what is inserting
those shape="rect" attributes into my <a> links?
Perhaps --html would fix that too; see http://www.xmlplease.com/shaperect .

I can accept that comments are stripped, but there should be a way to
adjust the parser and serializer settings to get a closer HTML round trip.


