Robert B. wrote:

I'm evaluating XMLMind Editor and w2x for my company, mainly for MS-Word to "styled" XHTML conversion. It's a great product, but I've run into a couple of questions.

XMLmind XML Editor and XMLmind Word To XML are two separate products. Yes, XMLmind XML Editor Professional Edition includes a fully functional Word To XML in the form of a plug-in. But XMLmind XML Editor Professional Edition certainly does not come with full support for XMLmind Word To XML.

If you were an XMLmind XML Editor Professional Edition customer, I would not have answered your questions (well, at least not the "advanced" ones).



--> Now let's use this sample w2x command-line to answer your questions (you'll find a command very close to this one in W2X_install_dir/doc/manual/conv_manual.bat and conv_manual.sh):

---
../../bin/w2x -v \
   -p convert.charset UTF-8 \
   -p split.split-before-level 1 \
   -p split.use-id-as-filename yes \
   -o frameset manual.docx out/frameset/manual.html
---



All three questions relate to the editor's "Import DOCX" feature:

*Question 1:*

Is it possible to output STYLED XHTML headings, for example <h1> instead of <p class="p-Heading1"> ?

I have tried adding the following parameters in the options file with no success.

-p headings.convert ""

-p headings.convert yes

-p edit.blocks.convert "p-Heading1 h1 ! p-Heading2 h2 ! p-Heading3 h3 ! p-Heading4 h4 ! p-Heading5 h5"

None of these parameters applies to the "xhtml_css" (XHTML 1.0 Transitional + CSS) conversion. They apply only to the semantic XML conversions, which include the various flavors of XHTML. See http://www.xmlmind.com/w2x/_distrib/doc/manual/index.html#edit_step


--> Yes, this is possible using:
---
../../bin/w2x -v \
   -p convert.charset UTF-8 \
   -pu edit.before.finish-styles p2h.xed \
   -p split.split-before-level 1 \
   -p split.use-id-as-filename yes \
   -o frameset manual.docx out/frameset/manual.html
---

Notice:

-pu edit.before.finish-styles p2h.xed

where attached XED script "p2h.xed" is:
---
namespace "http://www.w3.org/1999/xhtml";
namespace html = "http://www.w3.org/1999/xhtml";

for-each /html/body//p[get-class("^p-Heading\d$")] {
    set-variable("level",
                 substring-after(get-class("^p-Heading\d$"), "p-Heading"));

    if ($level >= 1 and $level <= 5) {
        set-element-name(concat("h", $level));
    }
}
---
XED reference: http://www.xmlmind.com/w2x/_distrib/doc/xedscript/index.html
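
For readers who do not know XED, the logic of p2h.xed can be sketched in Python. This is only an illustration of what the script does, written with the standard xml.etree.ElementTree module; the function name promote_headings is made up for this example, and W2X itself does not run Python.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Same test as get-class("^p-Heading\d$"): one class token, one digit.
HEADING_CLASS = re.compile(r"^p-Heading(\d)$")


def promote_headings(root):
    """Rename <p class="p-HeadingN ..."> to <hN> for N in 1..5, in place.

    Returns the number of elements renamed. (Hypothetical helper, not W2X.)
    """
    body = root.find(f"{{{XHTML_NS}}}body")
    if body is None:
        return 0
    renamed = 0
    # Take a snapshot of the paragraphs before changing any tag.
    for p in list(body.iter(f"{{{XHTML_NS}}}p")):
        for token in p.get("class", "").split():
            match = HEADING_CLASS.match(token)
            if match and 1 <= int(match.group(1)) <= 5:
                # Only the element name changes; class and id are kept.
                p.tag = f"{{{XHTML_NS}}}h{match.group(1)}"
                renamed += 1
                break
    return renamed
```

As in the XED script, the class and id attributes are left untouched, so the CSS rules keyed on "p-Heading3" and similar classes still apply to the renamed elements.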

This certainly works, but it also breaks the automatic numbering of headings by CSS counters. For example, you get "0.1.3 The general case" instead of the expected "6.2.3 The general case".

This happens because our code does not expect customers to mix semantic tags (h1, h2, ..., h5, ol, ul, li, etc.) with the cleanly styled paragraphs generated by the "xhtml_css" conversion.

[[[Note for the developer:

The "Split" step expects:

<p class="p-Heading3 n-1-2"
id="general_customize_semantic_xml"><?counter label="6.2.3" name="n-1-2" value="3"?>The general case</p>

NOT:

<h3 class="p-Heading3 n-1-2"
id="general_customize_semantic_xml"><?counter label="6.2.3" name="n-1-2" value="3"?>The general case</h3>

in SplitStep#collectCounters]]]
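
The failure mode described in the note can be illustrated with a toy Python sketch. It is NOT the real SplitStep#collectCounters code, which is not shown in this thread; the pattern and the function name collect_counter_labels are assumptions made up for the example. It only shows how a collector that looks for counters inside p elements misses the same counter once the element has been renamed to h3.

```python
import re

# Toy pattern (assumption): a counter processing instruction is picked up
# only when it directly follows a <p ...> start tag.
COUNTER_IN_P = re.compile(r'<p\b[^>]*>\s*<\?counter\s+label="([^"]*)"')


def collect_counter_labels(markup):
    """Return the counter labels found at the start of <p> elements."""
    return COUNTER_IN_P.findall(markup)


p_form = (
    '<p class="p-Heading3 n-1-2"\n'
    'id="general_customize_semantic_xml">'
    '<?counter label="6.2.3" name="n-1-2" value="3"?>The general case</p>'
)
h3_form = p_form.replace("<p ", "<h3 ").replace("</p>", "</h3>")

print(collect_counter_labels(p_form))   # the label is found
print(collect_counter_labels(h3_form))  # nothing is found
```

With the p form the label "6.2.3" is collected; with the h3 form the list is empty, which is consistent with the numbering restarting from 0 in the generated pages.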



*Question 2:*

Is it possible to generate a styled XHTML frameset without splitting the output files at all? The output would be just the TOC in the left frame and the complete content in the right frame.

No.

-p split.split-before-level 0

currently specifies the largest chunks you can get. See http://www.xmlmind.com/w2x/_distrib/doc/manual/index.html#split_step





*Question 3:*

When converting docx files that contain images stored as VML shapes (docs that have been converted from an older version of Word), "alt text" on the VML images is not recognized. Instead of using the v:shape's alt attribute, the converter is using the v:imagedata o:title attribute.

For example, this produces an alt attribute of "figure-2".

<v:shape alt="CORRECT ALT TEXT"><v:imagedata o:title="figure-2"/>

Is this configurable, or can it be fixed? I know the workaround is to "convert" the file so it's not in "compatibility mode" in Word, but this is not a practical solution in my case.

This looks like a bug. We'll check this one and do our best to fix it in the next release of W2X.
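
The precedence the question asks for could be sketched as follows in Python. This only restates the expected behavior (use the v:shape alt attribute first, fall back to the v:imagedata o:title attribute); it says nothing about how W2X reads VML internally or how the fix will be implemented. The function name vml_alt_text is made up for the example.

```python
import xml.etree.ElementTree as ET

V_NS = "urn:schemas-microsoft-com:vml"
O_NS = "urn:schemas-microsoft-com:office:office"


def vml_alt_text(shape):
    """Return the alt text for a v:shape element, or None if there is none.

    Hypothetical helper: prefers the shape's own alt attribute and only
    falls back to the o:title attribute of its v:imagedata child.
    """
    alt = shape.get("alt")
    if alt:
        return alt
    imagedata = shape.find(f"{{{V_NS}}}imagedata")
    if imagedata is not None:
        return imagedata.get(f"{{{O_NS}}}title")
    return None


shape = ET.fromstring(
    f'<v:shape xmlns:v="{V_NS}" xmlns:o="{O_NS}" alt="CORRECT ALT TEXT">'
    '<v:imagedata o:title="figure-2"/></v:shape>'
)
print(vml_alt_text(shape))
```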


--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support