On 2019-03-15 at 13:51, BPJ wrote:
On 2019-03-15 at 12:31, Zdenek Wagner wrote:
I am also interested in how you do it. I have tried with one of my
documents (I do not need this conversion, it was just a test). The
document contains 5 tables and 50 math equations. The first equation is
OK, the remaining equations are total garbage, they will have to be
entered manually from scratch. The tables are total garbage as well,
they do not even look like tables. The table of contents is garbage
but this is not a major issue. The problem is that in the middle of
the first page, probably as an effect of math, the text becomes
garbage as well. In this situation copy&paste and manual conversion
will be faster unless there is a special (hidden) trick which I do not
know.

As I said in the howto I just posted, your best bet if you have the original LaTeX file is to redefine commands etc. in *TeX so that the results become less garbagey and easier to correct by hand. I don't know about math because I don't do math, so for me not-so-simple tables are the biggest problem. If you or anyone else comes up with a *TeX hack which makes column boundaries "visible", for example by inserting pipe characters or some such, it will be much easier to tidy things up after conversion to a text format with Pandoc.

You may also want to try Pandoc's direct LaTeX-->Anything conversion. Although it is rather lossy for more advanced stuff,
it handles lists, tables, small caps and presumably math quite OK.
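
For reference, a minimal sketch of what such a direct conversion run could look like when driven from Python. The function name `pandoc_command` and the file names are made up for illustration; `--from`, `--to` and `--output` are ordinary Pandoc options, and DOCX is only an example target.

```python
import subprocess
import sys


def pandoc_command(source, target, to_format="docx"):
    """Build the argument list for a direct LaTeX-to-<to_format> Pandoc run."""
    return ["pandoc", "--from", "latex", "--to", to_format,
            "--output", target, source]


# Only run Pandoc when called with exactly two arguments, e.g.
#   python convert.py paper.tex paper.docx   (hypothetical file names)
if __name__ == "__main__" and len(sys.argv) == 3:
    subprocess.run(pandoc_command(sys.argv[1], sys.argv[2]), check=True)
```

How much survives depends entirely on the document, so expect to compare the result against the PDF by eye.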

I only use this PDF-->DOCX trick for PDFs I get from my clients where the source is not included or may not exist.
I have yet to encounter a client handing me a *TeX file... :-(

BTW you can "tame" recalcitrant LaTeX commands when converting with Pandoc by including `\renewcommand`s which restate them in terms of simpler LaTeX constructs that Pandoc can handle; Pandoc will then use those definitions.

IIRC there is a feature request out (or I'll make one!) for getting some/all LaTeX commands "unknown" to Pandoc as Pandoc's native Div/Span syntax with `custom-style` attributes, which you could then hook into when converting to DOCX with Pandoc. It will probably become reality sooner rather than later. The problem is how to handle commands/environments with multiple arguments (which argument is "the" text?). You can already have Pandoc preserve "unknown" LaTeX as raw LaTeX, which you can then massage into something suitable with a Pandoc filter (written in Lua, Python, Perl, your language of choice).

UTF-8 is no problem as Pandoc uses it natively. It also understands standard babel/polyglossia commands, giving you a native span or div with a `lang` attribute which it then handles correctly when converting to other formats. There are some warts, like `\textgreek` giving `lang="el"` rather than `lang="grc"`, but that can be fixed with a Pandoc filter. DOCX's (lack of) math capabilities may be another story, but Pandoc surely does its best.
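
To make the filter idea concrete, here is a rough sketch in Python that works directly on Pandoc's JSON AST, so it needs no extra libraries. Everything specific in it is an assumption for illustration: the regex only catches raw commands with a single brace-free argument (sidestepping the multiple-argument problem rather than solving it), the command name is reused as the `custom-style` name, and every `lang="el"` is rewritten to `grc`, which is only right if all the Greek in the document is ancient.

```python
import json
import re
import sys

# Raw LaTeX of the form \name{text}: one argument, no nested braces.
ONE_ARG = re.compile(r"^\\([A-Za-z]+)\{([^{}]*)\}$")


def text_to_inlines(text):
    """Turn plain text into a list of Pandoc Str/Space inlines."""
    inlines = []
    for i, word in enumerate(text.split()):
        if i:
            inlines.append({"t": "Space"})
        inlines.append({"t": "Str", "c": word})
    return inlines


def transform(node):
    """Recursively rewrite a Pandoc JSON AST node and return the result."""
    if isinstance(node, list):
        return [transform(item) for item in node]
    if not isinstance(node, dict):
        return node
    node = {key: transform(value) for key, value in node.items()}
    tag = node.get("t")
    if tag == "RawInline" and node["c"][0] in ("latex", "tex"):
        match = ONE_ARG.match(node["c"][1].strip())
        if match:
            # \name{text} becomes a Span with custom-style="name".
            name, text = match.groups()
            attr = ["", [], [["custom-style", name]]]
            return {"t": "Span", "c": [attr, text_to_inlines(text)]}
    elif tag in ("Span", "Div"):
        # attr is [identifier, classes, key-value pairs]; patch lang only.
        attr = node["c"][0]
        attr[2] = [[k, "grc" if (k, v) == ("lang", "el") else v]
                   for k, v in attr[2]]
    return node


# Pandoc passes the target format as the first argument to a JSON filter,
# so only read stdin when such an argument is present.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(transform(json.load(sys.stdin)), sys.stdout)
```

Saved as, say, `tame.py` (a made-up name), it could be tried with something like `pandoc -f latex+raw_tex --filter ./tame.py input.tex -o output.docx`. Raw commands that do not match the pattern are passed through untouched.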

/bpj
