Re: [XXE] styled XHTML h1, h2, ..., h5 instead of styled XHTML p.p-HeadingN, how?

Hussein Shafie Fri, 12 Mar 2021 02:34:31 -0800

Robert B. wrote:

I'm evaluating XMLMind Editor and w2x for my company, mainly for MS-Word to "styled" XHTML conversion. It's a great product, but I've run into a couple of questions.

XMLmind XML Editor and XMLmind Word To XML are two separate products. Yes, XMLmind XML Editor Professional Edition includes a fully functional Word To XML in the form of a plug-in. But XMLmind XML Editor Professional Edition certainly does not come with full support for XMLmind Word To XML.

If you were an XMLmind XML Editor Professional Edition customer, I would not have answered your questions (well, at least not the "advanced" ones).

--> Now let's use this sample w2x command-line to answer your questions (you'll find a command very close to this one in W2X_install_dir/doc/manual/conv_manual.bat and conv_manual.sh):


---
../../bin/w2x -v \
   -p convert.charset UTF-8 \
   -p split.split-before-level 1 \
   -p split.use-id-as-filename yes \
   -o frameset manual.docx out/frameset/manual.html
---

All three questions relate to the editor's "Import DOCX" feature:

*Question 1:*
Is it possible to output STYLED XHTML headings, for example <h1> instead of <p class="p-Heading1"> ?
I have tried adding the following parameters in the options file with no success.
-p headings.convert ""

-p headings.convert yes
-p edit.blocks.convert "p-Heading1 h1 ! p-Heading2 h2 ! p-Heading3 h3 ! p-Heading4 h4 ! p-Heading5 h5"

None of these parameters apply to the "xhtml_css" (XHTML 1.0 Transitional + CSS) conversion. They apply to the semantic XML (including various flavors of XHTML) conversions. See http://www.xmlmind.com/w2x/_distrib/doc/manual/index.html#edit_step



--> Yes, this is possible using:
---
../../bin/w2x -v \
   -p convert.charset UTF-8 \
   -pu edit.before.finish-styles p2h.xed \
   -p split.split-before-level 1 \
   -p split.use-id-as-filename yes \
   -o frameset manual.docx out/frameset/manual.html
---

Notice:

-pu edit.before.finish-styles p2h.xed

where attached XED script "p2h.xed" is:
---
namespace "http://www.w3.org/1999/xhtml";
namespace html = "http://www.w3.org/1999/xhtml";

for-each /html/body//p[get-class("^p-Heading\d$")] {
    set-variable("level",
                 substring-after(get-class("^p-Heading\d$"), "p-Heading"));

    if ($level >= 1 and $level <= 5) {
        set-element-name(concat("h", $level));
    }
}
---
XED reference: http://www.xmlmind.com/w2x/_distrib/doc/xedscript/index.html
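
For readers who do not know XED, here is a rough Python sketch of the same renaming logic. It is only an illustration, not part of W2X: the function name promote_headings and the sample document are made up for this example, and it only approximates what get-class() and set-element-name() do in the script above.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Same pattern as in p2h.xed: "p-Heading" followed by exactly one digit.
HEADING_CLASS = re.compile(r"p-Heading(\d)")


def promote_headings(root):
    """Rename <p class="p-HeadingN ..."> to <hN> for N in 1..5 (hypothetical helper).

    Works in place on an XHTML tree and returns the number of renamed elements.
    """
    body = root.find(f"{{{XHTML_NS}}}body")
    if body is None:
        return 0
    renamed = 0
    # list() so that changing tags cannot disturb the traversal.
    for p in list(body.iter(f"{{{XHTML_NS}}}p")):
        for token in (p.get("class") or "").split():
            match = HEADING_CLASS.fullmatch(token)
            if match:
                level = int(match.group(1))
                if 1 <= level <= 5:
                    # Only the element name changes; class and id are kept.
                    p.tag = f"{{{XHTML_NS}}}h{level}"
                    renamed += 1
                break
    return renamed


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p class="p-Heading3 n-1-2" id="general_customize_semantic_xml">'
    'The general case</p>'
    '<p class="p-Normal">Body text.</p>'
    '<p class="p-Heading6">Too deep.</p>'
    '</body></html>'
)
root = ET.fromstring(sample)
count = promote_headings(root)
```

As in the XED script, the class attribute is left untouched, so the generated CSS class names still apply to the renamed elements.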

This certainly works, but it also breaks the automatic numbering of headings using CSS counters. For example, you get "0.1.3 The general case" instead of the expected "6.2.3 The general case".

This happens because our code does not expect our customers to mix semantic tags (h1, h2, ..., h5, ol, ul, li, etc) with the bunch of cleanly styled paragraphs generated by the "xhtml_css" conversion.


[[[Note for the developer:

The "Split" step expects:

<p class="p-Heading3 n-1-2" id="general_customize_semantic_xml"><?counter label="6.2.3" name="n-1-2" value="3"?>The general case</p>

NOT:

<h3 class="p-Heading3 n-1-2" id="general_customize_semantic_xml"><?counter label="6.2.3" name="n-1-2" value="3"?>The general case</h3>


in SplitStep#collectCounters]]]

*Question 2:*
Is it possible to generate a styled XHTML frameset without splitting the output files at all? The output would be just the TOC in the left frame and the complete content in the right frame.


No.

-p split.split-before-level 0

currently specifies the largest chunks you can get. See http://www.xmlmind.com/w2x/_distrib/doc/manual/index.html#split_step

*Question 3:*
When converting docx files that contain images stored as VML shapes (docs that have been converted from an older version of Word), "alt text" on the VML images is not recognized. Instead of using the v:shape's alt attribute, the converter is using the v:imagedata o:title attribute.
For example, this produces an alt attribute of "figure-2".

<v:shape alt="CORRECT ALT TEXT"><v:imagedata o:title="figure-2"/>
Is this configurable, or can it be fixed? I know the workaround is to "convert" the file so it's not in "compatibility mode" in Word, but this is not a practical solution in my case.

This looks like a bug. We'll check this one and do our best to fix it in the next release of W2X.
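
To make the expected behavior concrete, here is a small Python sketch of the lookup order described in the question: use the alt attribute of v:shape first. This is not W2X code; the function name vml_alt_text is made up, and falling back to o:title when alt is missing is an assumption added for this example.

```python
import xml.etree.ElementTree as ET

V_NS = "urn:schemas-microsoft-com:vml"
O_NS = "urn:schemas-microsoft-com:office:office"


def vml_alt_text(shape):
    """Return the alternate text for a v:shape element (hypothetical helper)."""
    # Preferred source: the alt attribute of v:shape itself.
    alt = shape.get("alt")
    if alt:
        return alt
    # Assumed fallback: the o:title attribute of the child v:imagedata.
    imagedata = shape.find(f"{{{V_NS}}}imagedata")
    if imagedata is not None:
        return imagedata.get(f"{{{O_NS}}}title")
    return None


sample = (
    f'<v:shape xmlns:v="{V_NS}" xmlns:o="{O_NS}" alt="CORRECT ALT TEXT">'
    '<v:imagedata o:title="figure-2"/>'
    '</v:shape>'
)
shape = ET.fromstring(sample)
alt_text = vml_alt_text(shape)
```

With the sample from the question, this picks the alt attribute of v:shape rather than "figure-2".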

--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support
