Hi Sean,

Thanks for sharing your work with the list. As you may already know
from some recent exchanges on the devel list, we are exploring some options
that we hope will eventually take us to the promised land of a LyX<-->Word
roundtrip converter. I have started some very preliminary work in this
area in the last few weeks, mostly trying to plot a feasible strategy
for a rather complex problem.
I haven't done any concrete coding in the area yet, but my favorite strategy
for the Word-->LyX step leans toward directly parsing the
XML format produced either by Word (the docx format) or by
{Open|Libre}Office (the odt format), possibly straight into LyX (avoiding
the LaTeX intermediate step).
I would be very interested in knowing your thoughts on the issue and the
reasons that prompted you to choose an rtf2latex+Perl solution.
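
Just to illustrate what I mean by direct parsing, here is a minimal Python
sketch. It is not part of any existing converter: the helper name
docx_paragraphs is hypothetical, and it assumes only that a docx file is a
zip archive whose main body lives in word/document.xml. It pulls out plain
paragraph text and nothing else.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a docx file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the plain text of each paragraph in a docx file.

    `source` is a path or a binary file-like object. Hypothetical helper:
    it ignores styles, footnotes, tables, and everything else a real
    converter would have to handle.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over one or more <w:t> runs.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs
```

A real Word-->LyX step would of course have to map styles, footnotes, and
so on onto LyX layouts; the sketch only shows that the XML is easy to reach.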


Cheers,

Stefano




On Sat, Apr 5, 2014 at 3:28 PM, Sean Patrick Burke <seanb...@gmail.com> wrote:

> Hi everybody.
>
> I know we’ve been down this road a bunch of times, but I think a Word
> (doc/docx) to LyX converter would be really useful. I’ve written one in
> Perl and attached it for the list to tear apart.  If the group likes it,
> maybe we can stick it in a future version. Read on for details.
>
> Dependencies
> The script relies on LibreOffice and rtf2latex2e. I’ve fatpacked the
> script so it should run on any default Perl installation. I’ve tested it on
> Linux (Ubuntu 13.10) and OS X Mavericks. I don’t have access to a Windows
> box, but obviously that would be a useful extension.
>
>
> Usage
> At the command line:
> word2lyx.pl --in myfile.doc --out yourfile.lyx [--verbose --debug --ugly]
>
> In LyX:
> Set up a standard converter (“MS Word” and “LyX”). The call is:
> perl $$s/word2lyx.pl --in $$i --out $$o
>
> Note: “MS Word” only looks for .doc files by default, but the script
> works with both .docx and .doc.
>
> How to Convert Most Word Files to LyX
> This is convoluted, but it works in most cases. In brief:
>
> (Docx|Doc) -> Word 97 Doc -> Fix LibreOffice Footnote Bug -> RTF -> LaTeX
> -> More clean-up -> LyX
>
> rtf2latex2e seems to choke on modern Word files, so regardless of the
> input you need to convert the document to a Word 97 doc first. If you go
> from the input straight to RTF, rtf2latex2e will melt down in some cases.
>
> Generically, the commands in the chain are:
>
> Modern Doc -> Paleo Doc
> soffice --headless --convert-to doc:"MS Word 97" --outdir /tmp my_input_doc
>
> Paleo Doc -> RTF
> soffice --headless --convert-to rtf:"Rich Text Format" --outdir /tmp my_paleo_doc
>
> At this point, we have to go in and correct an unreported footnote bug in
> LibreOffice’s RTF converter. It’s not worth going into here. Email me
> if you want details.
>
> RTF -> LaTeX
> rtf2latex2e -n -p 33 -t 12 my_rtf
>
> Clean-up step. For some reason tex2lyx chokes on unescaped ampersands
> (the kind generated when one is typed into a Word file). I also
> usually cut out the well-meaning (but ugly) custom spacing. (You can turn
> this off by adding --ugly to the word2lyx.pl command line above.)
>
> LaTeX -> LyX
> tex2lyx my_tex
>
> Anyway, I’ve tested it on a dozen or so varied Word files with LyX 2.0.6.
> The output isn’t pretty, but footnotes and basic formatting are usually
> preserved.
>
> I’d love input if any is available.
>
> Attachments: word2lyx.pl  (without modules included),
> word2lyx.packed.0.99.pl (with modules).
>
> Best,
>
> Sean Burke
>
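
For anyone who wants to experiment with the chain quoted above without the
packed script, here is a rough Python sketch of it. It is not taken from
word2lyx.pl: the function names are hypothetical, the intermediate file
names are an assumption (each tool is assumed to write its output next to
its input with only the extension changed), and the LibreOffice footnote
fix is left out because its details are not given in the message. The
ampersand clean-up is a naive regular expression that would also escape
legitimate table-column separators.

```python
import re
from pathlib import Path


def build_commands(input_doc, workdir="/tmp"):
    """Return the external commands of the chain as argument lists.

    Assumption: every step leaves a file named after the input's stem
    in `workdir`, differing only in its extension.
    """
    stem = Path(input_doc).stem
    paleo_doc = f"{workdir}/{stem}.doc"
    rtf = f"{workdir}/{stem}.rtf"
    tex = f"{workdir}/{stem}.tex"
    return [
        # Modern Doc -> Paleo Doc
        ["soffice", "--headless", "--convert-to", "doc:MS Word 97",
         "--outdir", workdir, input_doc],
        # Paleo Doc -> RTF
        ["soffice", "--headless", "--convert-to", "rtf:Rich Text Format",
         "--outdir", workdir, paleo_doc],
        # (the footnote fix on the RTF file would go here)
        # RTF -> LaTeX
        ["rtf2latex2e", "-n", "-p", "33", "-t", "12", rtf],
        # (the ampersand clean-up on the .tex file would go here)
        # LaTeX -> LyX
        ["tex2lyx", tex],
    ]


def escape_ampersands(latex_source):
    """Escape every '&' that is not already preceded by a backslash."""
    return re.sub(r"(?<!\\)&", lambda match: "\\&", latex_source)
```

Each argument list can be handed to subprocess.run(cmd, check=True) in
order, applying escape_ampersands to the .tex file before the last step.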



-- 
__________________________________________________
Stefano Franchi
Associate Research Professor
Department of Hispanic Studies         Ph:   +1 (979) 845-2125
Texas A&M University                          Fax:  +1 (979) 845-6421
College Station, Texas, USA

stef...@tamu.edu
http://stefano.cleinias.org
