Re: XeLaTeX to Word/OpenOffice - the state of the art?

BPJ Fri, 15 Mar 2019 06:11:49 -0700

On 2019-03-15 at 13:51, BPJ wrote:

On 2019-03-15 at 12:31, Zdenek Wagner wrote:
I am also interested in how you do it. I have tried with one of my
documents (I do not need this conversion, it was just a test). The
document contains 5 tables and 50 math equations. The first equation
is OK, the remaining equations are total garbage; they will have to
be entered manually from scratch. The tables are total garbage as
well, they do not even look like tables. The table of contents is
garbage but this is not a major issue. The problem is that in the
middle of the first page, probably as an effect of math, the text
becomes garbage as well. In this situation copy&paste and manual
conversion will be faster unless there is a special (hidden) trick
which I do not know.

As I said in my howto just posted, your best bet if you have the
original LaTeX file is to redefine commands etc. in *TeX so that
the results become less garbagey and easier to correct by hand.
I don't know about math because I don't do math, so for me
not-so-simple tables are the biggest problem. If you or anyone
else comes up with a *TeX hack which makes column boundaries
"visible", as in inserting pipe characters or some such, it will
be much easier to tidy things up after conversion to a text format
with Pandoc.

You may also want to try Pandoc's direct LaTeX-->Anything
conversion. Although it is rather lossy for more advanced stuff,
it does lists, tables, small caps and surely math quite OK.

I only use this PDF-->DOCX trick for PDFs I get from my clients
where the source is not included or may not exist.
I have yet to encounter a client handing me a *TeX file... :-(
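
To illustrate why "visible" column boundaries would help, here is a
minimal sketch in Python of the tidy-up step. It assumes, purely for
illustration, that some *TeX hack has already left each table row as
one text line with cells separated by pipe characters, and that the
first row is the header. The function name `pipes_to_table` is made
up for this example; it is not part of Pandoc.

```python
def pipes_to_table(lines):
    """Turn 'a | b | c' lines into a Markdown pipe table.

    Assumption: every table row is one line, cells are separated by '|',
    and the first row is the header.
    """
    # Keep only lines that contain a pipe, and split them into trimmed cells.
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in lines
        if "|" in line
    ]
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    rows = [row + [""] * (width - len(row)) for row in rows]
    header, body = rows[0], rows[1:]
    out = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join([" --- "] * width) + "|",
    ]
    out += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(out)


table = pipes_to_table(["Name | Qty", "apple | 3", "pear | 12"])
print(table)
```

The resulting pipe table can be read by Pandoc as Markdown, so
anything that survives the PDF conversion in this shape would only
need light hand correction.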

BTW you can "tame" recalcitrant LaTeX commands when converting
with Pandoc by including `\renewcommand`s restating them in terms
of simpler LaTeX constructs which Pandoc can handle, and Pandoc
will use them. IIRC there is a feature request out (or I'll make
one!) for getting some/all LaTeX commands "unknown" to Pandoc as
Pandoc's native Div/Span syntax with `custom-style` attributes,
which you could then hook into when converting to DOCX with
Pandoc. It will probably become reality sooner rather than later.
The problem is how to handle commands/environments with multiple
arguments (which argument is "the" text?).

You can already have Pandoc preserve "unknown" LaTeX as raw LaTeX,
which you can then massage into something suitable with a Pandoc
filter (written in Lua, Python, Perl, your language of choice).

UTF-8 is no problem as Pandoc uses it natively. It also understands
standard babel/polyglossia commands, giving you a native span or
div with a `lang` attribute which it then knows how to handle
correctly when converting to other formats. There are some warts,
like `\textgreek` giving `lang="el"` rather than `lang="grc"`, but
that can be fixed with a Pandoc filter. DOCX's (lack of) math
capabilities may be another story though, but Pandoc surely does
its best.
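
As a rough illustration of the filter idea, here is a minimal sketch
in Python of the `\textgreek` fix. It assumes the usual Pandoc JSON
AST layout, in which a Span or Div carries its attributes as
[identifier, classes, key-value pairs]. The function name
`retag_lang` is made up for this example, and the sample document is
hand-written rather than real Pandoc output.

```python
def retag_lang(node, old="el", new="grc"):
    """Recursively rewrite lang attributes on Span and Div nodes, in place."""
    if isinstance(node, list):
        for item in node:
            retag_lang(item, old, new)
    elif isinstance(node, dict):
        if node.get("t") in ("Span", "Div"):
            # Assumed layout: c = [[identifier, classes, key-value pairs], content]
            attr = node["c"][0]
            attr[2] = [
                [key, new if key == "lang" and value == old else value]
                for key, value in attr[2]
            ]
        # Descend into nested content (and metadata) as well.
        for value in node.values():
            retag_lang(value, old, new)
    return node


# Hand-written stand-in for what Pandoc might emit for \textgreek{...}.
sample = {
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [
        {
            "t": "Para",
            "c": [
                {
                    "t": "Span",
                    "c": [["", [], [["lang", "el"]]], [{"t": "Str", "c": "λόγος"}]],
                }
            ],
        }
    ],
}

converted = retag_lang(sample)
print(converted["blocks"][0]["c"][0]["c"][0][2])
```

To use this as a real filter you would read the JSON document from
stdin, pass it through `retag_lang`, write the result to stdout, and
run Pandoc with `--filter`. That wrapper is left out here so the
sketch runs on its own.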


/bpj
