stefano franchi <stefano.fran...@gmail.com> writes:

> Hi Prannoy,
>
> Welcome to LyX!

Welcome, Prannoy.

> I am happy to hear you found LyX interesting and would like to
> contribute to our project.
> Let me remind you, though, that Google has not yet announced which
> organizations it will accept this year (the announcement will be made
> on Feb 24, i.e. tomorrow). We are hopeful we will be selected
> again, but there is no certainty.
>
> That being said, if you want to get a head start I would encourage you
> to start getting familiar with LyX's code base. A good starting point is
> our bug tracker [2]. Several bugs are marked "easyfix" and provide
> excellent entry points to begin working on the code. Developer
> documentation is available on our wiki as well at [3]. Ask questions on
> the developers' list on how to proceed and be sure to check out the
> beginner developers' FAQ [3].

I have nothing to add here and second everything Stefano said.

> For the LyX<-->Word round-trip conversion project, check out this thread
> on the devel list as a starting point:
> https://www.mail-archive.com/lyx-devel%40lists.lyx.org/msg182083.html
> The main goals of the project are discussed there. Notice that the main
> goal of the conversion (either way) is the preservation of a document's
> "semantic" information, not its formatting.
>
> Thus, the first design choice is a careful definition of what counts as
> "semantic" information in a generic LyX (and Word) document. The bullet
> points on the project page provide a first definition. This list should
> be formalized into its LyX and Word counterparts (i.e. LyX's paragraph
> environments and character styles, and, similarly, styles of either kind
> for Word). Most likely, it would be best to create a simple, special
> LaTeX class/LyX layout that includes all and only the allowed styles,
> and, similarly, a Word template that includes all and only the allowed
> styles.
>
> Rob Oakes has been working on a Word-->LyX converter in Python [5] which
> you may want to check out as well.
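
The "all and only the allowed styles" idea above could start out as a plain two-way table. What follows is only a minimal sketch in Python: the particular pairs in LYX_TO_WORD and the helper name unsupported_layouts are illustrative assumptions, not the list agreed for the project. It relies on the fact that paragraphs in a .lyx file open with a "\begin_layout <name>" line.

```python
import re

# Illustrative whitelist (an assumption, not the project's agreed list):
# LyX paragraph layout -> Word paragraph style name.
LYX_TO_WORD = {
    "Title": "Title",
    "Section": "Heading 1",
    "Subsection": "Heading 2",
    "Standard": "Normal",
    "Quote": "Quote",
}

# The reverse table is derived, so the two directions cannot drift apart.
WORD_TO_LYX = {word: lyx for lyx, word in LYX_TO_WORD.items()}

# Paragraphs in a .lyx file open with a line such as "\begin_layout Section".
LAYOUT_LINE = re.compile(r"^\\begin_layout[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def unsupported_layouts(lyx_source):
    """Return, sorted, the layouts used in lyx_source that are not whitelisted."""
    used = LAYOUT_LINE.findall(lyx_source)
    return sorted({name for name in used if name not in LYX_TO_WORD})
```

Running unsupported_layouts over a document before conversion gives an early report of everything that would not survive the round trip, which seems preferable to dropping it silently.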
>
> From a technical point of view, two early design choices are:
>
> 1. Whether to start the conversion from the LyX format or from the LaTeX
> format that LyX can output.
>
> This is a really tricky issue. On the one hand, working from LyX is much
> simpler, as we have direct access to the parsed data, or we can leverage
> other tools that parse LyX's file format (e.g. eLyXer). On the other
> hand, some crucially important information is absent from LyX and is
> actually *produced* by LaTeX. Bibliographic references are the most
> important example in this class: a fully "semantically" formatted
> reference is absent from LyX. It is bibtex|biblatex + LaTeX that
> actually produces the data. Index information is probably in this
> category too.
>
> The difficult problem is how to extract information from LaTeX's output.
> There is an existing project, tex4ht [6], that pursues this approach.
> The project is not actively developed now, due to the untimely death of
> its founder, but it is still available, and it actually works. tex4ht
> runs LaTeX with a special style which inserts parsing commands into
> LaTeX's DVI output. A Java program then parses the special DVI output
> and produces HTML or ODF output. This approach allows tex4ht to exploit
> LaTeX's own processing (including the processing of index and
> bibliographic information), at the cost of increased complexity.
> One possibility would be to follow tex4ht's approach, while simplifying
> as much as possible the kind of LaTeX information actively supported.
>
> One important drawback of this second strategy (LyX-->LaTeX-->Word|ODF)
> is that LyX-only information is lost when converting to LaTeX. The most
> important example is tracked changes. Standard LaTeX has no concept of
> tracked changes. There are additional LaTeX packages that manage changes
> (e.g. [7]), and we would have to convert LyX's changes into that format.
> This of course adds an additional dependency, unless the package's
> functionality is somehow replicated by us.

I would like to add that there is a third option (and I am not talking
about XHTML): use the LyX format for what it can do, and use the LaTeX
document to supplement the output created from LyX. That way, one can
use the simple approach for basic features and supplement it with the
LaTeX route when needed (I guess the most important case is the
bibliography).

> 2. Whether to target Microsoft's Word XML format or the Open Document
> Format (similarly XML-based)

I would strongly argue for the Microsoft Word XML format, as each
additional conversion creates problems and inconsistencies. That said,
if the conversion from MS Word XML to ODF and back can be done without
causing problems in the round trip (i.e. the round trip would then be
LyX - ODF XML - MS XML - ODF XML - LyX), I would argue for the more
"open" format, which can be used on more operating systems.

But there is also the (often raised) question of moving the .lyx format
to an XML-based format... I have no idea what stage the new format is
at, but one should keep this likely change to the .lyx format in mind.

> You may want to start learning about both formats. I haven't looked into
> either in any depth yet, but my first impression is that Microsoft's is
> more complex.
>
> Feel free to ask more questions!
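
For a first look at the Word side of choice 2: a .docx file is a ZIP archive whose main text lives in word/document.xml, and each paragraph (w:p) may name its style in w:pPr/w:pStyle. The Python sketch below uses only the standard library to list (style ID, text) pairs. The function names paragraph_styles and sample_docx are made up for illustration, and sample_docx builds a deliberately incomplete archive that is enough to exercise the parser but is not a file Word would open.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace used in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def paragraph_styles(docx_file):
    """Return a (style_id, text) pair for each paragraph in a .docx file.

    style_id is the value of w:pStyle (a style ID such as "Heading1",
    not the display name), or None when the paragraph names no style.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    result = []
    for para in root.iter(f"{{{W}}}p"):
        style = para.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W}}}val") if style is not None else None
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        result.append((style_id, text))
    return result


def sample_docx():
    """Build a tiny in-memory archive holding only word/document.xml.

    Hypothetical fixture for experiments; a real .docx needs more parts.
    """
    xml = (
        f'<w:document xmlns:w="{W}"><w:body>'
        '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
        "<w:r><w:t>Intro</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Plain text</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer
```

Calling paragraph_styles(sample_docx()) walks the two sample paragraphs. Pointing the same function at a document saved from the restricted Word template would show which styles actually occur in practice.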

Cheers,

Rainer

> Cheers,
>
> Stefano
>
> [1] http://wiki.lyx.org/GSoC/GSoCProjectIdeasFor2014
> [2] http://www.lyx.org/trac/
> [3] http://www.lyx.org/DevFAQ
> [4] http://www.lyx.org/trac/search?q=advanced+find
> [5] http://blog.oak-tree.us/index.php/2012/03/08/word2lyx01-2
> [6] https://www.tug.org/applications/tex4ht/mn.html
> [7] http://texdoc.net/texmf-dist/doc/latex/changes/changes.english.pdf
>
> On Sun, Feb 23, 2014 at 6:17 AM, Prannoy Pilligundla
> <prannoy.b...@gmail.com> wrote:
>
>> Hi Everyone,
>>
>> I am Prannoy Pilligundla, an undergraduate at BITS-Pilani, India. I
>> am proficient in C, Java, Python and RoR. Here is the link to my
>> Bitbucket profile: https://bitbucket.org/prannoy1994
>>
>> I had a look at the 2014 ideas page
>> (http://wiki.lyx.org/GSoC/GSoCProjectIdeasFor2014) and I am interested
>> in working on round-trip conversion between LyX and .docx formats. It
>> would be great if someone could guide me on how to start work on this.
>> I want to get accustomed to the existing code base and start
>> contributing before writing my application for GSoC 2014.
>>
>> Thanks and Regards
>> Prannoy Pilligundla

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :    +33 - (0)9 53 10 27 44
Cell:    +33 - (0)6 85 62 59 98
Fax :    +33 - (0)9 58 10 27 44
Fax (D): +49 - (0)3 21 21 25 22 44

email:   rai...@krugs.de
Skype:   RMkrug