Re: [GSoC 2014]Interested in Round trip conversion between LyX and .docx formats

stefano franchi Sun, 23 Feb 2014 07:54:06 -0800

Hi Prannoy,

Welcome to LyX!

I am happy to hear you found LyX interesting and would like to
contribute to our project.
Let me remind you, though, that Google has not yet announced which
organizations it will accept this year (the announcement will be made
on Feb 24, i.e. tomorrow). We are hopeful we will be selected
again, but there is no certainty.

That being said, if you want to get a head start I would encourage you
to start getting familiar with LyX's code base. A good starting point is
our bug tracker [2]. Several bugs are marked "easyfix" and provide
excellent entry points to begin working on the code. Developer
documentation is available on our wiki as well at [3]. Ask questions on
the developers list about how to proceed, and be sure to check out the
beginner developers FAQ [3].

For the LyX<-->Word round-trip conversion project, check out this thread
on the devel list as a starting point:
https://www.mail-archive.com/lyx-devel%40lists.lyx.org/msg182083.html
The main goals of the project are discussed there. Notice that the main
goal of the conversion (either way) is the preservation of a document's
"semantic" information, not its formatting.

Thus, the first design choice is a careful definition of what counts as
"semantic" information in a generic LyX (and Word) document. The bullet
points in the project page provide a first definition. This list should
be formalized into its LyX and Word counterparts (i.e. LyX's paragraph
environments and character styles, and, similarly, styles of either kind
for Word). Most likely, it would be best to create a simple, special
LaTeX class/LyX layout that includes all and only the allowed styles,
and, similarly, a Word template that includes all and only the allowed
styles.
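
As a rough illustration of the "all and only the allowed styles" idea,
the sketch below keeps a whitelist that maps LyX layout names to Word
style names and flags any layout in a LyX file that falls outside it.
The mapping itself is an assumption made up for illustration (no such
list has been agreed on), and the parsing only looks at \begin_layout
lines, so layout names containing spaces are cut at the first word.

```python
import re

# Hypothetical whitelist: LyX paragraph layout -> Word paragraph style.
# These pairs are illustrative assumptions, not an agreed project list.
ALLOWED_STYLES = {
    "Title": "Title",
    "Section": "Heading 1",
    "Subsection": "Heading 2",
    "Standard": "Normal",
    "Quote": "Quote",
    "Itemize": "List Bullet",
    "Enumerate": "List Number",
}

# Matches the layout name that follows \begin_layout at the start of a line.
LAYOUT_RE = re.compile(r"^\\begin_layout\s+(\S+)", re.MULTILINE)


def check_layouts(lyx_text):
    """Return (mapped, rejected): Word styles for allowed layouts,
    and the names of layouts that are not in the whitelist."""
    mapped, rejected = [], []
    for name in LAYOUT_RE.findall(lyx_text):
        if name in ALLOWED_STYLES:
            mapped.append(ALLOWED_STYLES[name])
        else:
            rejected.append(name)
    return mapped, rejected


# A tiny hand-written fragment in the style of a .lyx file body.
SAMPLE = r"""
\begin_layout Section
Introduction
\end_layout

\begin_layout Standard
Some text.
\end_layout

\begin_layout Verse
Not in the whitelist.
\end_layout
"""

if __name__ == "__main__":
    print(check_layouts(SAMPLE))
```

A converter built this way could refuse, or fall back to a default
style, whenever the rejected list is not empty.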

Rob Oakes has been working on a Word-->LyX converter in Python [5] which
you may want to check out as well.

From a technical point of view, two early design choices are:

1. Whether to start the conversion from the LyX format or from the LaTeX
   format that LyX can output.

   This is a really tricky issue. On the one hand, working from LyX is
   much simpler, as we have direct access to the parsed data, or we can
   leverage other tools that parse LyX's file format (e.g. eLyXer). On
   the other hand, some crucially important information is actually
   absent from LyX and is actually *produced* by LaTeX. Bibliographic
   references are the most important example in this class: a fully
   "semantically" formatted reference is absent from LyX. It is
   bibtex|biblatex + LaTeX that actually produce the data. Index
   information probably falls into this category too.
   The difficult problem is how to extract information from LaTeX's
   output. There is an existing project, tex4ht [6], that pursues this
   approach. The project is not actively developed now, due to the
   untimely death of its founder, but it is still available, and it
   actually works. tex4ht runs LaTeX with a special style which inserts
   parsing commands into LaTeX's DVI output. A Java program then parses
   the special DVI output and produces HTML or ODF output. This approach
   allows tex4ht to exploit LaTeX's own processing (including the
   processing of index and bibliographic information), at the cost of
   increased complexity.
   One possibility would be to follow tex4ht's approach, while
   simplifying as much as possible the kind of LaTeX information
   actively supported.

   One important drawback of this second strategy
   (LyX-->LaTeX-->Word|ODF) is that LyX-only information is lost when
   converting to LaTeX. The most important example is tracked changes.
   Standard LaTeX has no notion of tracked changes. There are additional
   LaTeX packages that manage changes (e.g. [7]), and we would have to
   convert LyX's changes into that format. This of course adds an
   additional dependency, unless we somehow replicate the package's
   functionality ourselves.
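
To make the tracked-changes conversion concrete, here is a minimal
sketch that rewrites the text lines of a single LyX paragraph into the
\added{} and \deleted{} macros of the changes package [7]. It assumes
change markers of the form "\change_inserted <author> <time>",
"\change_deleted <author> <time>" and "\change_unchanged", and it
ignores author and timestamp. The function name is made up for
illustration; a real converter would also have to handle changes that
span paragraphs, insets, and LaTeX escaping.

```python
def lyx_changes_to_latex(lines):
    r"""Wrap text that follows LyX change markers in \added{} / \deleted{}.

    `lines` are the text lines of one paragraph, without trailing newlines.
    """
    macros = {
        "\\change_inserted": "\\added",
        "\\change_deleted": "\\deleted",
    }
    out = []
    open_macro = False
    for line in lines:
        keyword = line.split(" ", 1)[0]
        if keyword in macros or keyword == "\\change_unchanged":
            # Any change marker ends the currently open change, if there is one.
            if open_macro:
                out.append("}")
                open_macro = False
            # Insert/delete markers start a new macro; "unchanged" does not.
            if keyword in macros:
                out.append(macros[keyword] + "{")
                open_macro = True
        else:
            out.append(line)
    if open_macro:
        out.append("}")
    return "".join(out)


if __name__ == "__main__":
    paragraph = [
        "This is ",
        "\\change_inserted 0 1393170846",
        "new ",
        "\\change_deleted 0 1393170850",
        "old ",
        "\\change_unchanged",
        "text.",
    ]
    print(lyx_changes_to_latex(paragraph))
```

The author id and timestamp are dropped here; the changes package can
record an author id as an optional argument, which would be one way to
keep that information.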

2. Whether to target Microsoft's Word XML format or the Open Document
   Format (similarly XML-based).

   You may want to start learning about both formats. I haven't looked
   into either in any depth yet, but my first impression is that
   Microsoft's is more complex.
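
One way to start exploring both formats: a .docx and an .odt file are
both zip archives whose main text lives in an XML member
(word/document.xml and content.xml respectively). The sketch below uses
only the Python standard library to list the paragraph style attached
to each paragraph, which is the "semantic" information discussed above.
The function names are made up for illustration, and make_zip only
builds a bare test archive, not a valid document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by WordprocessingML and by ODF text elements.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def docx_paragraph_styles(fileobj):
    """Style id of each w:p in word/document.xml (None if no style is set)."""
    with zipfile.ZipFile(fileobj) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    styles = []
    for para in root.iter(f"{{{W_NS}}}p"):
        pstyle = para.find(f"{{{W_NS}}}pPr/{{{W_NS}}}pStyle")
        styles.append(None if pstyle is None else pstyle.get(f"{{{W_NS}}}val"))
    return styles


def odt_paragraph_styles(fileobj):
    """text:style-name of each text:p and text:h element in content.xml."""
    with zipfile.ZipFile(fileobj) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    attr = f"{{{TEXT_NS}}}style-name"
    wanted = (f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h")
    return [el.get(attr) for el in root.iter() if el.tag in wanted]


def make_zip(member, xml):
    """Build an in-memory zip holding one XML member (demo helper only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(member, xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    docx = make_zip(
        "word/document.xml",
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr></w:p><w:p/>'
        "</w:body></w:document>",
    )
    print(docx_paragraph_styles(docx))
```

Note that w:pStyle holds the style id (e.g. "Heading1"), which is
resolved to a display name through word/styles.xml.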

Feel free to ask more questions!

Cheers,

Stefano

[1] http://wiki.lyx.org/GSoC/GSoCProjectIdeasFor2014
[2] http://www.lyx.org/trac/
[3] http://www.lyx.org/DevFAQ
[4] http://www.lyx.org/trac/search?q=advanced+find
[5] http://blog.oak-tree.us/index.php/2012/03/08/word2lyx01-2
[6] https://www.tug.org/applications/tex4ht/mn.html
[7] http://texdoc.net/texmf-dist/doc/latex/changes/changes.english.pdf

On Sun, Feb 23, 2014 at 6:17 AM, Prannoy Pilligundla
<prannoy.b...@gmail.com> wrote:

> Hi Everyone,
>
> I am Prannoy Pilligundla, an undergraduate student at BITS-Pilani, India. I
> am proficient in C, Java, Python and RoR. Here is the link to my Bitbucket
> profile: https://bitbucket.org/prannoy1994
>
> I had a look at the 2014 ideas page
> (http://wiki.lyx.org/GSoC/GSoCProjectIdeasFor2014) and I am interested in
> working on round-trip conversion between LyX and .docx formats. It would be
> great if someone could guide me on how to start work on this. I want to get
> accustomed to the existing code base and start contributing before writing
> my application for GSoC 2014.
>
> Thanks and Regards
> Prannoy Pilligundla

-- 
__________________________________________________
Stefano Franchi
Associate Research Professor
Department of Hispanic Studies         Ph:   +1 (979) 845-2125
Texas A&M University                          Fax:  +1 (979) 845-6421
College Station, Texas, USA

stef...@tamu.edu
http://stefano.cleinias.org
