Hi Prannoy,

Welcome to LyX!

I am happy to hear you found LyX interesting and would like to contribute to our project. Let me remind you, though, that Google has not yet announced which organizations it will accept this year (the announcement will be made on Feb 24, i.e. tomorrow). We are hopeful we will be selected again, but there is no certainty.

That being said, if you want to get a head start, I would encourage you to start getting familiar with LyX's code base. A good starting point is our bug tracker [2]. Several bugs are marked "easyfix" and provide excellent entry points to begin working on the code. Developer documentation is available on our wiki as well [3]. Ask questions on the developers list about how to proceed, and be sure to check out the beginner developers' FAQ [3].

For the LyX<-->Word round-trip conversion project, check out this thread on the devel list as a starting point:

https://www.mail-archive.com/lyx-devel%40lists.lyx.org/msg182083.html

The main goals of the project are discussed there. Notice that the main goal of the conversion (either way) is the preservation of a document's "semantic" information, not its formatting. Thus, the first design choice is a careful definition of what counts as "semantic" information in a generic LyX (and Word) document. The bullet points on the project page provide a first definition. This list should be translated into its formal LyX and Word counterparts (i.e. LyX's paragraph environments and character styles and, similarly, styles of either kind for Word). Most likely, it would be best to create a simple, special LaTeX class/LyX layout that includes all and only the allowed styles and, similarly, a Word template that includes all and only the allowed styles.

Rob Oakes has been working on a Word-->LyX converter in Python [5], which you may want to check out as well.

From a technical point of view, two early design choices are:

1. Whether to start the conversion from the LyX format or from the LaTeX format that LyX can output.

This is a really tricky issue. On the one hand, working from LyX is much simpler, as we have direct access to the parsed data, or we can leverage other tools that parse LyX's file format (e.g. eLyXer). On the other hand, some crucially important information is absent from LyX and is actually *produced* by LaTeX. Bibliographic references are the most important example in this class: a fully "semantically" formatted reference is absent from LyX. It is bibtex|biblatex + LaTeX that actually produces the data. Index information probably falls into this category too.

The difficult problem is how to extract information from LaTeX's output. There is an existing project, tex4ht [6], that pursues this approach. The project is not actively developed now, due to the untimely death of its founder, but it is still available, and it actually works. tex4ht runs LaTeX with a special style which inserts parsing commands into LaTeX's DVI output. A Java program then parses the special DVI output and produces HTML or ODF output. This approach allows tex4ht to exploit LaTeX's own processing (including the processing of index and bibliographic information), at the cost of increased complexity. One possibility would be to follow tex4ht's approach, while simplifying as much as possible the kind of LaTeX information actively supported.

One important drawback of this second strategy (LyX-->LaTeX-->Word|ODF) is that LyX-only information is lost when converting to LaTeX. The most important example is tracked changes. Standard LaTeX has no notion of tracked changes. There are additional LaTeX packages that manage changes (e.g. [7]), and we would have to convert LyX's changes into that format. This of course adds a dependency, unless we somehow replicate the package's functionality ourselves.

2. Whether to target Microsoft's Word XML format or the Open Document Format (similarly XML-based).

You may want to start learning about both formats.
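
As a first taste of the two targets, here is a minimal Python sketch of how the same semantically tagged paragraph might be written in each format. The element and attribute names (w:p, w:pPr, w:pStyle, w:r, w:t for Word; text:p and text:style-name for ODF) are taken from the published schemas. The style name "LyXQuote" and the two helper functions are hypothetical, made up purely for illustration. The sketch builds isolated fragments only, not complete .docx or .odt files.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by WordprocessingML and by ODF text content.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

ET.register_namespace("w", W_NS)
ET.register_namespace("text", TEXT_NS)


def word_paragraph(style_id, content):
    """Hypothetical helper: a Word paragraph that refers to a named style."""
    p = ET.Element(f"{{{W_NS}}}p")
    # Paragraph properties hold the reference to the paragraph style.
    ppr = ET.SubElement(p, f"{{{W_NS}}}pPr")
    ET.SubElement(ppr, f"{{{W_NS}}}pStyle", {f"{{{W_NS}}}val": style_id})
    # The text itself lives inside a run (w:r) and a text element (w:t).
    run = ET.SubElement(p, f"{{{W_NS}}}r")
    ET.SubElement(run, f"{{{W_NS}}}t").text = content
    return p


def odf_paragraph(style_name, content):
    """Hypothetical helper: an ODF paragraph that refers to a named style."""
    # ODF carries the style reference as an attribute of the paragraph.
    p = ET.Element(f"{{{TEXT_NS}}}p", {f"{{{TEXT_NS}}}style-name": style_name})
    p.text = content
    return p


if __name__ == "__main__":
    # "LyXQuote" is an invented style name standing in for one semantic style.
    for fragment in (word_paragraph("LyXQuote", "A quoted passage."),
                     odf_paragraph("LyXQuote", "A quoted passage.")):
        print(ET.tostring(fragment, encoding="unicode"))
```

Even in this toy case the Word fragment needs a properties element and a run around the text, while ODF attaches the style directly to the paragraph. Either way, the semantic information travels as a style name, which is why agreeing on the list of allowed styles comes first.
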
I haven't looked into either in any depth yet, but my first impression is that Microsoft's is more complex.

Feel free to ask more questions!

Cheers,

Stefano

[1] http://wiki.lyx.org/GSoC/GSoCProjectIdeasFor2014
[2] http://www.lyx.org/trac/
[3] http://www.lyx.org/DevFAQ
[4] http://www.lyx.org/trac/search?q=advanced+find
[5] http://blog.oak-tree.us/index.php/2012/03/08/word2lyx01-2
[6] https://www.tug.org/applications/tex4ht/mn.html
[7] http://texdoc.net/texmf-dist/doc/latex/changes/changes.english.pdf

On Sun, Feb 23, 2014 at 6:17 AM, Prannoy Pilligundla <prannoy.b...@gmail.com> wrote:

> Hi Everyone,
>
> I am Prannoy Pilligundla pursuing undergraduation in BITS-Pilani,India.I
> am proficient in C,Java,Python and RoR. Here is the link to my bitbucket
> profile https://bitbucket.org/prannoy1994
>
> I had a look at 2014 ideas page(
> http://wiki.lyx.org/GSoC/GSoCProjectIdeasFor2014) and i am interested to
> work on Round trip conversion between LyX and .docx formats. It would be
> great if someone can guide me on how to start work on this.I want to get
> accustomed to the existing code base and start contributing before writing
> my application for GSoC 2014
>
> Thanks and Regards
> Prannoy Pilligundla

--
__________________________________________________
Stefano Franchi
Associate Research Professor
Department of Hispanic Studies      Ph:  +1 (979) 845-2125
Texas A&M University                Fax: +1 (979) 845-6421
College Station, Texas, USA

stef...@tamu.edu
http://stefano.cleinias.org