Robert B. wrote:
I'm evaluating XMLMind Editor and w2x for my company, mainly for MS-Word
to "styled" XHTML conversion. It's a great product, but I've run into a
couple of questions.
XMLmind XML Editor and XMLmind Word To XML are two separate products.
Yes, XMLmind XML Editor Professional Edition includes a fully functional
Word To XML in the form of a plug-in. But XMLmind XML Editor
Professional Edition certainly does not come with full support for
XMLmind Word To XML.
If you were an XMLmind XML Editor Professional Edition customer, I would
not have answered your questions (well, at least not the "advanced" ones).
--> Now let's use this sample w2x command-line to answer your questions
(you'll find a command very close to this one in
W2X_install_dir/doc/manual/conv_manual.bat and conv_manual.sh):
---
../../bin/w2x -v \
-p convert.charset UTF-8 \
-p split.split-before-level 1 \
-p split.use-id-as-filename yes \
-o frameset manual.docx out/frameset/manual.html
---
All three questions relate to the editor's "Import DOCX" feature:
*Question 1:*
Is it possible to output STYLED XHTML headings, for example <h1> instead
of <p class="p-Heading1"> ?
I have tried adding the following parameters in the options file with no
success.
-p headings.convert ""
-p headings.convert yes
-p edit.blocks.convert "p-Heading1 h1 ! p-Heading2 h2 ! p-Heading3 h3 !
p-Heading4 h4 ! p-Heading5 h5"
None of these parameters apply to the "xhtml_css" (XHTML 1.0
Transitional + CSS) conversion. They apply only to the semantic XML
conversions (including various flavors of XHTML). See
http://www.xmlmind.com/w2x/_distrib/doc/manual/index.html#edit_step
--> Yes, this is possible using:
---
../../bin/w2x -v \
-p convert.charset UTF-8 \
-pu edit.before.finish-styles p2h.xed \
-p split.split-before-level 1 \
-p split.use-id-as-filename yes \
-o frameset manual.docx out/frameset/manual.html
---
Notice the added parameter:
-pu edit.before.finish-styles p2h.xed
where the attached XED script "p2h.xed" is:
---
namespace "http://www.w3.org/1999/xhtml";
namespace html = "http://www.w3.org/1999/xhtml";
for-each /html/body//p[get-class("^p-Heading\d$")] {
set-variable("level",
substring-after(get-class("^p-Heading\d$"),
"p-Heading"));
if ($level >= 1 and $level <= 5) {
set-element-name(concat("h", $level));
}
}
---
XED reference: http://www.xmlmind.com/w2x/_distrib/doc/xedscript/index.html
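To make the logic of p2h.xed easier to follow, here is a rough Python
equivalent. It is only an illustration and not part of W2X: the
function name promote_headings and the max_level parameter are
invented for this sketch, and it assumes that get-class() matches the
pattern against each class token individually. It renames every
<p class="p-HeadingN ..."> found under <body> to <hN> for N between 1
and 5 and leaves all attributes untouched.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Same pattern as in p2h.xed: a class token "p-Heading" followed by one digit.
HEADING_CLASS = re.compile(r"^p-Heading(\d)$")


def promote_headings(root, max_level=5):
    """Rename <p class="p-HeadingN ..."> under <body> to <hN>, in place.

    Hypothetical helper mirroring p2h.xed; only levels 1..max_level
    are renamed, other paragraphs are left as they are.
    """
    body = root.find(f"{{{XHTML_NS}}}body")
    if body is None:
        return root
    # Take a snapshot of the paragraphs before changing any tag.
    for p in list(body.iter(f"{{{XHTML_NS}}}p")):
        for token in p.get("class", "").split():
            match = HEADING_CLASS.match(token)
            if match:
                level = int(match.group(1))
                if 1 <= level <= max_level:
                    p.tag = f"{{{XHTML_NS}}}h{level}"
                break
    return root
```

Like the XED script, this changes only the element name. The class
attribute (for example "p-Heading3 n-1-2") and the id are kept, so the
CSS rules generated for these classes still match the renamed elements.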
This certainly works, but it also breaks the automatic numbering of
headings, which relies on CSS counters. For example, you get "0.1.3
The general case" instead of the expected "6.2.3 The general case".
This happens because our code does not expect customers to mix
semantic tags (h1, h2, ..., h5, ol, ul, li, etc.) with the cleanly
styled paragraphs generated by the "xhtml_css" conversion.
[[[Note for the developer:
The "Split" step expects:
<p class="p-Heading3 n-1-2"
id="general_customize_semantic_xml"><?counter label="6.2.3" name="n-1-2"
value="3"?>The general case</p>
NOT:
<h3 class="p-Heading3 n-1-2"
id="general_customize_semantic_xml"><?counter label="6.2.3" name="n-1-2"
value="3"?>The general case</h3>
in SplitStep#collectCounters]]]
*Question 2:*
Is it possible to generate a styled XHTML frameset without splitting the
output files at all? The output would be just the TOC in the left frame
and the complete content in the right frame.
No.
-p split.split-before-level 0
currently specifies the largest chunks you can get. See
http://www.xmlmind.com/w2x/_distrib/doc/manual/index.html#split_step
*Question 3:*
When converting docx files that contain images stored as VML shapes
(docs that have been converted from an older version of Word), "alt
text" on the VML images is not recognized. Instead of using the
v:shape's alt attribute, the converter is using the v:imagedata o:title
attribute.
For example, this produces an alt attribute of "figure-2".
<v:shape alt="CORRECT ALT TEXT"><v:imagedata o:title="figure-2"/>
Is this configurable, or can it be fixed? I know the workaround is to
"convert" the file so it's not in "compatibility mode" in Word, but this
is not a practical solution in my case.
This looks like a bug. We'll check this one and do our best to fix it in
the next release of W2X.
--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support