On 2017-03-07, Enrico Forestieri wrote:

> The attached patch fixes the regression introduced in 2.2 about the
> output of en- and em-dashes. 
...
> With this patch, documents produced with older versions work again
> as intended 

Not always:

The proposed patch restores the previous behaviour for the subset of
pre-2.2 documents that used ligature dashes (see Details below).

OTOH, the patch will change the output of 2.2 documents as well as of older
documents using literal EM DASH and EN DASH characters (which were not
affected by the changes in 2.2).

If we go this way, I propose making the "ligature dash" output opt-in.
Otherwise, we replace one evil with another, with the LyX update again
causing unwanted changes for existing documents.
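
A minimal C++ sketch of such an opt-in, assuming a hypothetical document setting `use_ligature_dashes` (not an existing LyX parameter) that defaults to off. The LaTeX strings used for literal dashes (`\textendash{}`, `\textemdash{}`) are also an assumption about the exporter, not taken from the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-document setting (not an existing LyX parameter).
struct DashSettings {
    bool use_ligature_dashes = false;  // opt-in, so off by default
};

// LaTeX output for an EN DASH (U+2013) or EM DASH (U+2014).
// Assumption: literal dashes are exported as \textendash{} / \textemdash{}.
std::string dashToLatex(char32_t dash, const DashSettings& settings) {
    assert(dash == U'\u2013' || dash == U'\u2014');
    const bool em = (dash == U'\u2014');
    if (settings.use_ligature_dashes)
        return em ? "---" : "--";  // TeX ligature input (--, ---)
    return em ? "\\textemdash{}" : "\\textendash{}";  // current output kept
}
```

With the default, existing documents keep their current export; only documents that explicitly enable the setting get `--`/`---`.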


More problems with the proposed patch:

a) The setting is lost when converting to 2.2.

b) There are older documents that use a literal dash followed by ZWSP (U+200B)
   to mark an optional line break point after the dash
   (https://marc.info/?l=lyx-users&m=140982011101908&w=2).
   With the attached patch, the ZWSP will be removed, leading to an unwelcome
   surprise. 
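
To catch this kind of surprise, a conversion step could check that no dash + ZWSP pair is dropped. A minimal C++ sketch, assuming the paragraph text is available as a `std::u32string` (LyX's own `docstring` is not used here); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for EN DASH (U+2013) and EM DASH (U+2014).
bool isDash(char32_t c) {
    return c == U'\u2013' || c == U'\u2014';
}

// Count "dash + ZERO WIDTH SPACE (U+200B)" pairs, i.e. dashes that the
// author marked with an optional line break point.
std::size_t countDashBreakPoints(const std::u32string& text) {
    std::size_t count = 0;
    for (std::size_t i = 0; i + 1 < text.size(); ++i)
        if (isDash(text[i]) && text[i + 1] == U'\u200B')
            ++count;
    return count;
}

// Sanity check for a conversion step: no dash may lose its break point.
void checkConversionKeepsBreakPoints(const std::u32string& before,
                                     const std::u32string& after) {
    assert(countDashBreakPoints(after) == countDashBreakPoints(before));
}
```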



Details
=======

In versions < 2.2, there were two methods to input and store em- and
en-dashes:

a) ligature input (--- and --)
b) literal EM DASH and EN DASH characters (0x2014 and 0x2013).

While both methods produce the same characters in the output, they behave
differently regarding possible hyphenation of the preceding word and a line
break after the dash. Depending on the use case, each method has its
advantages and problems (see
http://www.lyx.org/trac/raw-attachment/ticket/10543/dash-problems.lyx).

Conversion from 2.1 to 2.2 eliminates the difference between ligature dashes
and literal dashes. In 2.2, dashes are always stored as EM DASH and EN DASH
characters.

So we have two problems:

a) Changed LaTeX export for documents using ligature dashes, leading to
   different output in some cases.
   
b) Loss of information during the conversion process.
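
The information loss of problem b) can be illustrated with a stand-alone C++ sketch that mirrors the removed (`-`) lines of the Text.cpp diff further down. `readDash` and `indistinguishable` are hypothetical names, and `std::u32string` stands in for LyX's `docstring`.

```cpp
#include <cassert>
#include <string>

// Reading a legacy ligature token: "\twohyphens" and "\threehyphens"
// become plain EN DASH / EM DASH characters.
std::u32string readDash(const std::string& token) {
    if (token == "\\twohyphens")
        return U"\u2013";  // EN DASH
    if (token == "\\threehyphens")
        return U"\u2014";  // EM DASH
    return std::u32string();  // not a dash token
}

// True if reading `token` yields the same text as a literal dash character,
// i.e. the two input methods can no longer be told apart afterwards.
bool indistinguishable(const std::string& token, char32_t literal) {
    return readDash(token) == std::u32string(1, literal);
}
```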


Alternatives
============

Further information loss (problem b) can be avoided with a change to
Text.cpp that keeps the distinction between ligature and literal dashes
during the conversion. Of course, this cannot restore information that is
already lost in converted documents that have been modified and saved with 2.2.

--- a/src/Text.cpp
+++ b/src/Text.cpp
@@ -506,9 +506,11 @@ void Text::readParToken(Paragraph & par, Lexer & lex,
                                par.insert(par.size(), from_ascii("---"), font, change);
                } else {
                        if (token == "\\twohyphens")
-                               par.insertChar(par.size(), 0x2013, font, change);
-                       else
-                               par.insertChar(par.size(), 0x2014, font, change);
+                               par.insertChar(par.size(), 0x2013, font, change); // EN DASH
+                       else {
+                               par.insertChar(par.size(), 0x2014, font, change); // EM DASH
+                               par.insertChar(par.size(), 0x200b, font, change); // ZWSP
+                       }
                }
        } else if (token == "\\backslash") {
                par.appendChar('\\', font, change);
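
A stand-alone C++ sketch of what the patched reading step does, using a hypothetical `readDashPatched` function and `std::u32string` in place of LyX's `docstring`:

```cpp
#include <cassert>
#include <string>

// The EM DASH from the "\threehyphens" token is followed by ZERO WIDTH
// SPACE (U+200B), so it stays distinguishable from a literal EM DASH;
// the EN DASH from "\twohyphens" gets no marker, as in the diff.
std::u32string readDashPatched(const std::string& token) {
    if (token == "\\twohyphens")
        return U"\u2013";        // EN DASH
    if (token == "\\threehyphens")
        return U"\u2014\u200B";  // EM DASH + ZWSP
    return std::u32string();     // not a dash token
}
```

Note that, as in the diff, only the em dash gets a marker; a converted `\twohyphens` still yields a plain EN DASH.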



As an alternative to a buffer setting, we could also take up the suggestion
to define "ligature dashes" as "special characters":

+1 similar to the current support for typographical quotes (special char
   parallel to literal Unicode)
   
+1 enables use of ligature dashes and literal dashes in one document

+1 lyx2lyx conversion of 2.1 and 2.2 documents without behaviour change:
   * replace \twohyphens and \threehyphens with
     dash-special-chars in Text.cpp.
   * keep literal dashes.
   
-1 two competing ways to represent dashes   
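
A rough C++ sketch of the special-character variant, with hypothetical names throughout: ligature dashes become special-char elements next to literal characters, and the exporter treats the two kinds differently. The LaTeX strings are assumptions, not LyX's actual output.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical special chars for ligature dashes, analogous to the
// special-char handling of typographical quotes.
enum class DashSpecialChar { LigatureEnDash, LigatureEmDash };

// A paragraph element: either a literal Unicode character or a special char.
struct Element {
    bool is_special;
    char32_t ch;           // used if !is_special
    DashSpecialChar dash;  // used if is_special
};

Element literalChar(char32_t c) { return {false, c, DashSpecialChar::LigatureEnDash}; }
Element specialChar(DashSpecialChar d) { return {true, U'\0', d}; }

// Assumed export: special chars as "--"/"---", literal dashes as
// \textendash{}/\textemdash{}; otherwise only ASCII is passed through.
std::string toLatex(const std::vector<Element>& par) {
    std::string out;
    for (const Element& e : par) {
        if (e.is_special)
            out += (e.dash == DashSpecialChar::LigatureEmDash) ? "---" : "--";
        else if (e.ch == U'\u2013')
            out += "\\textendash{}";
        else if (e.ch == U'\u2014')
            out += "\\textemdash{}";
        else if (e.ch < 0x80)
            out += static_cast<char>(e.ch);
    }
    return out;
}
```

Since ligature and literal dashes are separate element kinds, one paragraph can mix both and each keeps its own export.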


Günter
