On 2017-03-07, Enrico Forestieri wrote:

> The attached patch fixes the regression introduced in 2.2 about the
> output of en- and em-dashes.
...
> With this patch, documents produced with older versions work again
> as intended

Not always: The proposed patch restores the previous behaviour for the
subset of pre-2.2 documents that used ligature dashes (see Details
below).

OTOH, the patch will lead to changed output for older documents using
literal EM DASH and EN DASH characters (that were not affected by the
changes in 2.2) as well as for 2.2 documents.

If we are going this way, I propose to make the "ligature dash" output
an opt-in. Otherwise, we replace one evil with another - the LyX update
again causing unwanted changes for existing documents.

More problems with the proposed patch:

a) The setting is lost when converting to 2.2.

b) There are older documents using literal dash + ZWSP (U+200b) for a
   dash with an optional line break point
   (https://marc.info/?l=lyx-users&m=140982011101908&w=2).
   With the attached patch, the ZWSP will be removed, leading to an
   unwelcome surprise.

Details
=======

In versions < 2.2, there were two methods to input and store em- and
en-dashes:

a) ligature input (--- and --)
b) literal EM DASH and EN DASH characters (0x2014 and 0x2013).

While both methods produce the same characters in the output, they
behave differently regarding possible hyphenation of the preceding word
and line break after the dash. Depending on the use case, both methods
have advantages and problems (see
http://www.lyx.org/trac/raw-attachment/ticket/10543/dash-problems.lyx).

Conversion from 2.1 to 2.2 eliminates the difference between ligature
dashes and literal dashes. In 2.2, dashes are always stored as EM DASH
and EN DASH characters.

So, we have 2 problems:

a) Changed LaTeX export for documents using ligature dashes, leading to
   different output in some cases.
b) Loss of information during the conversion process.

Alternatives
============

Further information loss (problem b) can be avoided with a change to
Text.cpp which ensures the distinction of ligature vs. literal dash is
kept during the conversion.
Of course, this cannot restore lost information if converted documents
are already modified and saved with 2.2.

--- a/src/Text.cpp
+++ b/src/Text.cpp
@@ -506,9 +506,11 @@ void Text::readParToken(Paragraph & par, Lexer & lex,
 				par.insert(par.size(), from_ascii("---"), font, change);
 		} else {
 			if (token == "\\twohyphens")
-				par.insertChar(par.size(), 0x2013, font, change);
-			else
-				par.insertChar(par.size(), 0x2014, font, change);
+				par.insertChar(par.size(), 0x2013, font, change); // EN DASH
+			else {
+				par.insertChar(par.size(), 0x2014, font, change); // EM DASH
+				par.insertChar(par.size(), 0x200b, font, change); // ZWSP
+			}
 		}
 	} else if (token == "\\backslash") {
 		par.appendChar('\\', font, change);

As an alternative to a buffer setting, we could also take up the
suggestion to define the "ligature dashes" as "special characters":

+1 similar to current support for typographical quotes
   (special char parallel to literal Unicode)
+1 enables use of ligature dashes and literal dashes in one document
+1 lyx2lyx conversion of 2.1 and 2.2 documents without behaviour change:
   * replace \twohyphens and \threehyphens with dash-special-chars
     in Text.cpp.
   * keep literal dashes.
-1 two competing ways to represent dashes

Günter
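
P.S. To make problem b) concrete, here is a minimal stand-alone sketch
in C++. It is not LyX code: DashInput, storeAs22(), storeWithMarker()
and checkDistinction() are names made up for this mail, and the ZWSP
marker only mirrors what the Text.cpp change above does for the
em-dash case. It assumes nothing about the rest of the reader.

```cpp
#include <cassert>
#include <string>

// Illustrative only: none of these names exist in LyX.
enum class DashInput { Ligature, Literal };

const char32_t EM_DASH = 0x2014;
const char32_t ZWSP = 0x200b;

// Storage as described for 2.2: both input methods end up as a bare
// EM DASH, so the input method can no longer be told apart.
std::u32string storeAs22(DashInput)
{
	return std::u32string(1, EM_DASH);
}

// Variant that keeps the distinction: a ligature em-dash gets a
// trailing ZWSP, a literal em-dash is stored unchanged.
std::u32string storeWithMarker(DashInput in)
{
	std::u32string out(1, EM_DASH);
	if (in == DashInput::Ligature)
		out += ZWSP;
	return out;
}

// Small self-check: the first mapping collapses the two inputs,
// the second one does not.
void checkDistinction()
{
	assert(storeAs22(DashInput::Ligature) == storeAs22(DashInput::Literal));
	assert(storeWithMarker(DashInput::Ligature)
	       != storeWithMarker(DashInput::Literal));
}
```

Calling checkDistinction() passes silently. The point is only that the
marker variant stays reversible; whether ZWSP is the right marker is a
separate question, given that some documents already use it on purpose.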