The attached patch fixes the regression introduced in 2.2 in the
output of en- and em-dashes. In 2.2, en- and em-dashes are output as
the \textendash and \textemdash macros, which changes the output of
old documents and also causes bugs (for example, #10490).

With this patch, documents produced with older versions work again
as intended, while documents produced with 2.2 can be made to produce
the exact same output by simply checking "Don't use ligatures for en-
and em-dashes" in Document->Settings->Fonts.
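
As a rough illustration of the two output modes, here is a simplified
Python sketch. It is not the C++ code from the patch: the helper name
latex_dash is made up for this example, and termination of the macros
(a following space or {}) is deliberately left out.

```python
EN_DASH = u"\u2013"
EM_DASH = u"\u2014"


def latex_dash(c, use_dash_ligatures=True):
    """Return a LaTeX representation of an en- or em-dash.

    Hypothetical helper for illustration only. The default mirrors the
    new document setting, which is on unless the checkbox is ticked.
    """
    if c not in (EN_DASH, EM_DASH):
        raise ValueError("not an en- or em-dash")
    if use_dash_ligatures:
        # Font ligatures, as produced by versions before 2.2
        return "--" if c == EN_DASH else "---"
    # Macros, as produced by 2.2 (macro termination not handled here)
    return "\\textendash" if c == EN_DASH else "\\textemdash"
```

Calling latex_dash(EN_DASH, use_dash_ligatures=False) corresponds to
checking the new box in the Fonts settings.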

Actually, I am attaching two patches. They differ only in the way
documents are exported to earlier versions. If ligatures are used
for en/em-dashes, then, in order not to change the output, the first
patch inserts a zero-width space inset after each en/em-dash, while
the second patch inserts a zero-width space character (U+200B). Both
are removed when reloading documents with 2.3, so that they don't
accumulate.
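
The round trip used by the second patch can be sketched on a single
line of text as follows. This is a simplified stand-alone example, not
the lyx2lyx code: the function names add_zwsp and strip_zwsp are made
up, and it ignores both the insets that the real routines skip and the
case where the U+200B ends up at the start of the following line.

```python
import re

ZWSP = u"\u200B"
DASH = u"([\u2013\u2014])"  # en-dash or em-dash, captured as group 1


def add_zwsp(line):
    """Insert U+200B after every en- or em-dash (export direction)."""
    return re.sub(DASH, u"\\g<1>" + ZWSP, line)


def strip_zwsp(line):
    """Remove a U+200B directly following a dash (reload direction)."""
    return re.sub(DASH + ZWSP, u"\\g<1>", line)


text = u"pp. 3\u20135 \u2014 see"
exported = add_zwsp(text)
# Reloading restores the original text, so a second export yields the
# same result instead of piling up further zero-width spaces.
assert strip_zwsp(exported) == text
assert add_zwsp(strip_zwsp(exported)) == exported
```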

The second patch produces visually cleaner documents, as the
zero-width space character is invisible on screen, but they work
out of the box only when exporting to 2.1 or later. This is because
2.0 and earlier versions don't define U+200B in the unicodesymbols
file. However, it could be added there manually.

-- 
Enrico
diff --git a/development/FORMAT b/development/FORMAT
index 38c6ec1..55dcf9b 100644
--- a/development/FORMAT
+++ b/development/FORMAT
@@ -7,6 +7,12 @@ changes happened in particular if possible. A good example would be
 
 -----------------------
 
+2017-03-05 Enrico Forestieri <for...@lyx.org>
+       * Format incremented to 535: support for en/em-dash as ligatures.
+         The en- and em-dashes (U+2013 and U+2014) are now exported as
+         the font ligatures -- and --- unless instructed otherwise by
+         a document preference.
+
 2017-02-04 Jürgen Spitzmüller <sp...@lyx.org>
        * Format incremented to 534: Support for chapterbib
           - New buffer param value \multibib child
diff --git a/lib/lyx2lyx/lyx_2_3.py b/lib/lyx2lyx/lyx_2_3.py
index 9fbe12d..d65fb96 100644
--- a/lib/lyx2lyx/lyx_2_3.py
+++ b/lib/lyx2lyx/lyx_2_3.py
@@ -1840,6 +1840,107 @@ def revert_chapterbib(document):
 
     # 7. Chapterbib proper
     add_to_preamble(document, ["\\usepackage{chapterbib}"])
+
+
+def convert_dashligatures(document):
+    " Remove a zero-length space inset after en- and em-dashes. "
+
+    i = 0
+    while i < len(document.body):
+        words = document.body[i].split()
+        # Skip some document parts where dashes are not converted
+        if len(words) > 1 and words[0] == "\\begin_inset" and \
+           words[1] in ["CommandInset", "ERT", "External", "Formula", \
+                        "FormulaMacro", "Graphics", "IPA", "listings"]:
+            j = find_end_of_inset(document.body, i)
+            if j == -1:
+                document.warning("Malformed LyX document: Can't find end of " \
+                                 + words[1] + " inset at line " + str(i))
+                i += 1
+            else:
+                i = j
+            continue
+        if len(words) > 0 and words[0] in ["\\leftindent", \
+                "\\paragraph_spacing", "\\align", "\\labelwidthstring"]:
+            i += 1
+            continue
+
+        start = 0
+        while True:
+            j = document.body[i].find(u"\u2013", start) # en-dash
+            k = document.body[i].find(u"\u2014", start) # em-dash
+            if j == -1 and k == -1:
+                break
+            if j == -1 or (k != -1 and k < j):
+                j = k
+            after = document.body[i][j+1:]
+            if len(after) > 0:
+                start = j+1
+                continue
+            j = i+1
+            words = document.body[j].split()
+            if len(words) > 2 and words[0] == "\\begin_inset" and \
+               words[1] == "space" and words[2] == "\\hspace{}":
+                l = find_end_of_inset(document.body, j)
+                if l == -1:
+                    document.warning("Malformed LyX document: Can't find end" \
+                                     + " of space inset at line " + str(j))
+                else:
+                    space = get_value(document.body, "\\length", j, l)
+                    if space[:1] == "0":
+                        del document.body[j:l+1]
+                        while document.body[j] == "":
+                            del document.body[j]
+            break
+        i += 1
+
+
+def revert_dashligatures(document):
+    " Remove font ligature settings for en- and em-dashes. "
+    i = find_token(document.header, "\\use_dash_ligatures", 0)
+    if i == -1:
+        return
+    use_dash_ligatures = get_bool_value(document.header, "\\use_dash_ligatures" , i)
+    del document.header[i]
+    if not use_dash_ligatures:
+        return
+
+    # Add a zero-length space inset after en- and em-dashes
+    i = 0
+    while i < len(document.body):
+        words = document.body[i].split()
+        # Skip some document parts where dashes are not converted
+        if len(words) > 1 and words[0] == "\\begin_inset" and \
+           words[1] in ["CommandInset", "ERT", "External", "Formula", \
+                        "FormulaMacro", "Graphics", "IPA", "listings"]:
+            j = find_end_of_inset(document.body, i)
+            if j == -1:
+                document.warning("Malformed LyX document: Can't find end of " \
+                                 + words[1] + " inset at line " + str(i))
+                i += 1
+            else:
+                i = j
+            continue
+        if len(words) > 0 and words[0] in ["\\leftindent", \
+                "\\paragraph_spacing", "\\align", "\\labelwidthstring"]:
+            i += 1
+            continue
+
+        while True:
+            j = document.body[i].find(u"\u2013") # en-dash
+            k = document.body[i].find(u"\u2014") # em-dash
+            if j == -1 and k == -1:
+                break
+            if j == -1 or (k != -1 and k < j):
+                j = k
+            after = document.body[i][j+1:]
+            document.body[i] = document.body[i][:j+1]
+            document.body[i+1:i+1] = ["\\begin_inset space \\hspace{}", \
+                                      "\\length 0pt", "\\end_inset", ""]
+            if len(after) > 0:
+                document.body.insert(i+5, after)
+            i += 5 if len(after) > 0 else 4 # don't skip the next line
+        i += 1
     
 
 ##
@@ -1873,10 +1974,12 @@ convert = [
            [531, []],
            [532, [convert_literalparam]],
            [533, []],
-           [534, []]
+           [534, []],
+           [535, [convert_dashligatures]]
           ]
 
 revert =  [
+           [534, [revert_dashligatures]],
            [533, [revert_chapterbib]],
            [532, [revert_multibib]],
            [531, [revert_literalparam]],
diff --git a/src/BufferParams.cpp b/src/BufferParams.cpp
index f20078b..03b1bd5 100644
--- a/src/BufferParams.cpp
+++ b/src/BufferParams.cpp
@@ -415,6 +415,7 @@ BufferParams::BufferParams()
        fonts_default_family = "default";
        useNonTeXFonts = false;
        use_microtype = false;
+       use_dash_ligatures = true;
        fonts_expert_sc = false;
        fonts_old_figures = false;
        fonts_sans_scale[0] = 100;
@@ -812,6 +813,8 @@ string BufferParams::readToken(Lexer & lex, string const & token,
                lex >> fonts_cjk;
        } else if (token == "\\use_microtype") {
                lex >> use_microtype;
+       } else if (token == "\\use_dash_ligatures") {
+               lex >> use_dash_ligatures;
        } else if (token == "\\paragraph_separation") {
                string parsep;
                lex >> parsep;
@@ -1196,6 +1199,7 @@ void BufferParams::writeFile(ostream & os, Buffer const * buf) const
                os << "\\font_cjk " << fonts_cjk << '\n';
        }
        os << "\\use_microtype " << convert<string>(use_microtype) << '\n';
+       os << "\\use_dash_ligatures " << convert<string>(use_dash_ligatures) << '\n';
        os << "\\graphics " << graphics_driver << '\n';
        os << "\\default_output_format " << default_output_format << '\n';
        os << "\\output_sync " << output_sync << '\n';
diff --git a/src/BufferParams.h b/src/BufferParams.h
index 200b6d4..30a157e 100644
--- a/src/BufferParams.h
+++ b/src/BufferParams.h
@@ -280,6 +280,8 @@ public:
        std::string fonts_cjk;
        /// use LaTeX microtype package
        bool use_microtype;
+       /// use font ligatures for en- and em-dashes
+       bool use_dash_ligatures;
        ///
        Spacing & spacing();
        Spacing const & spacing() const;
diff --git a/src/Paragraph.cpp b/src/Paragraph.cpp
index eb5b111..24a3c7c 100644
--- a/src/Paragraph.cpp
+++ b/src/Paragraph.cpp
@@ -1274,6 +1274,21 @@ void Paragraph::Private::latexSpecialChar(otexstream & os,
                // written. (Asger)
                break;
 
+       case 0x2013:
+       case 0x2014:
+               if (bparams.use_dash_ligatures) {
+                       if (c == 0x2013) {
+                               // en-dash
+                               os << "--";
+                               column +=2;
+                       } else {
+                               // em-dash
+                               os << "---";
+                               column +=3;
+                       }
+                       break;
+               }
+               // fall through
        default:
                if (c == '\0')
                        return;
diff --git a/src/frontends/qt4/GuiDocument.cpp b/src/frontends/qt4/GuiDocument.cpp
index 2293ce3..01a5edd 100644
--- a/src/frontends/qt4/GuiDocument.cpp
+++ b/src/frontends/qt4/GuiDocument.cpp
@@ -838,6 +838,8 @@ GuiDocument::GuiDocument(GuiView & lv)
                this, SLOT(change_adaptor()));
        connect(fontModule->microtypeCB, SIGNAL(clicked()),
                this, SLOT(change_adaptor()));
+       connect(fontModule->dashesCB, SIGNAL(clicked()),
+               this, SLOT(change_adaptor()));
        connect(fontModule->scaleSansSB, SIGNAL(valueChanged(int)),
                this, SLOT(change_adaptor()));
        connect(fontModule->scaleTypewriterSB, SIGNAL(valueChanged(int)),
@@ -3046,6 +3048,7 @@ void GuiDocument::applyView()
                fromqstr(fontModule->cjkFontLE->text());
 
        bp_.use_microtype = fontModule->microtypeCB->isChecked();
+       bp_.use_dash_ligatures = !fontModule->dashesCB->isChecked();
 
        bp_.fonts_sans_scale[nontexfonts] = fontModule->scaleSansSB->value();
        bp_.fonts_sans_scale[!nontexfonts] = fontModule->font_sf_scale;
@@ -3550,6 +3553,7 @@ void GuiDocument::paramsToDialog()
                fontModule->cjkFontLE->setText(QString());
        
        fontModule->microtypeCB->setChecked(bp_.use_microtype);
+       fontModule->dashesCB->setChecked(!bp_.use_dash_ligatures);
 
        fontModule->fontScCB->setChecked(bp_.fonts_expert_sc);
        fontModule->fontOsfCB->setChecked(bp_.fonts_old_figures);
diff --git a/src/frontends/qt4/ui/FontUi.ui b/src/frontends/qt4/ui/FontUi.ui
index 416c3fb..e04d5d8 100644
--- a/src/frontends/qt4/ui/FontUi.ui
+++ b/src/frontends/qt4/ui/FontUi.ui
@@ -242,7 +242,27 @@
      </property>
     </widget>
    </item>
+   <item row="10" column="1">
+    <widget class="QCheckBox" name="microtypeCB">
+     <property name="toolTip">
+      <string>Activate extensions such as character protrusion and font expansion via the microtype package</string>
+     </property>
+     <property name="text">
+      <string>Enable micr&amp;o-typographic extensions</string>
+     </property>
+    </widget>
+   </item>
    <item row="11" column="1">
+    <widget class="QCheckBox" name="dashesCB">
+     <property name="toolTip">
+      <string>Use \\textendash and \\textemdash instead of -- and --- for en- and em-dashes</string>
+     </property>
+     <property name="text">
+      <string>Don't use ligatures for en- and &amp;em-dashes</string>
+     </property>
+    </widget>
+   </item>
+   <item row="12" column="1">
     <spacer name="verticalSpacer">
      <property name="orientation">
       <enum>Qt::Vertical</enum>
@@ -255,16 +275,6 @@
      </property>
     </spacer>
    </item>
-   <item row="10" column="1">
-    <widget class="QCheckBox" name="microtypeCB">
-     <property name="toolTip">
-      <string>Activate extensions such as character protrusion and font expansion via the microtype package</string>
-     </property>
-     <property name="text">
-      <string>Enable micr&amp;o-typographic extensions</string>
-     </property>
-    </widget>
-   </item>
   </layout>
  </widget>
  <layoutdefault spacing="6" margin="11"/>
diff --git a/src/version.h b/src/version.h
index 43a9b35..5a03e0d 100644
--- a/src/version.h
+++ b/src/version.h
@@ -32,8 +32,8 @@ extern char const * const lyx_version_info;
 
 // Do not remove the comment below, so we get merge conflict in
 // independent branches. Instead add your own.
-#define LYX_FORMAT_LYX 534 // spitz: chapterbib support
-#define LYX_FORMAT_TEX2LYX 534
+#define LYX_FORMAT_LYX 535 // ef: support for en/em-dash as ligatures
+#define LYX_FORMAT_TEX2LYX 535
 
 #if LYX_FORMAT_TEX2LYX != LYX_FORMAT_LYX
 #ifndef _MSC_VER
diff --git a/development/FORMAT b/development/FORMAT
index 38c6ec1..55dcf9b 100644
--- a/development/FORMAT
+++ b/development/FORMAT
@@ -7,6 +7,12 @@ changes happened in particular if possible. A good example would be
 
 -----------------------
 
+2017-03-05 Enrico Forestieri <for...@lyx.org>
+       * Format incremented to 535: support for en/em-dash as ligatures.
+         The en- and em-dashes (U+2013 and U+2014) are now exported as
+         the font ligatures -- and --- unless instructed otherwise by
+         a document preference.
+
 2017-02-04 Jürgen Spitzmüller <sp...@lyx.org>
        * Format incremented to 534: Support for chapterbib
           - New buffer param value \multibib child
diff --git a/lib/lyx2lyx/lyx_2_3.py b/lib/lyx2lyx/lyx_2_3.py
index 9fbe12d..d4e482a 100644
--- a/lib/lyx2lyx/lyx_2_3.py
+++ b/lib/lyx2lyx/lyx_2_3.py
@@ -1840,6 +1840,93 @@ def revert_chapterbib(document):
 
     # 7. Chapterbib proper
     add_to_preamble(document, ["\\usepackage{chapterbib}"])
+
+
+def convert_dashligatures(document):
+    " Remove a zero-length space (U+200B) after en- and em-dashes. "
+
+    i = 0
+    while i < len(document.body):
+        words = document.body[i].split()
+        # Skip some document parts where dashes are not converted
+        if len(words) > 1 and words[0] == "\\begin_inset" and \
+           words[1] in ["CommandInset", "ERT", "External", "Formula", \
+                        "FormulaMacro", "Graphics", "IPA", "listings"]:
+            j = find_end_of_inset(document.body, i)
+            if j == -1:
+                document.warning("Malformed LyX document: Can't find end of " \
+                                 + words[1] + " inset at line " + str(i))
+                i += 1
+            else:
+                i = j
+            continue
+        if len(words) > 0 and words[0] in ["\\leftindent", \
+                "\\paragraph_spacing", "\\align", "\\labelwidthstring"]:
+            i += 1
+            continue
+
+        start = 0
+        while True:
+            j = document.body[i].find(u"\u2013", start) # en-dash
+            k = document.body[i].find(u"\u2014", start) # em-dash
+            if j == -1 and k == -1:
+                break
+            if j == -1 or (k != -1 and k < j):
+                j = k
+            after = document.body[i][j+1:]
+            if after.find(u"\u200B") == 0:
+                document.body[i] = document.body[i][:j+1] + after[1:]
+            else:
+                if len(after) == 0 and document.body[i+1].find(u"\u200B") == 0:
+                    document.body[i+1] = document.body[i+1][1:]
+                    break
+            start = j+1
+        i += 1
+
+
+def revert_dashligatures(document):
+    " Remove font ligature settings for en- and em-dashes. "
+    i = find_token(document.header, "\\use_dash_ligatures", 0)
+    if i == -1:
+        return
+    use_dash_ligatures = get_bool_value(document.header, "\\use_dash_ligatures" , i)
+    del document.header[i]
+    if not use_dash_ligatures:
+        return
+
+    # Add a zero-length space (U+200B) after en- and em-dashes
+    i = 0
+    while i < len(document.body):
+        words = document.body[i].split()
+        # Skip some document parts where dashes are not converted
+        if len(words) > 1 and words[0] == "\\begin_inset" and \
+           words[1] in ["CommandInset", "ERT", "External", "Formula", \
+                        "FormulaMacro", "Graphics", "IPA", "listings"]:
+            j = find_end_of_inset(document.body, i)
+            if j == -1:
+                document.warning("Malformed LyX document: Can't find end of " \
+                                 + words[1] + " inset at line " + str(i))
+                i += 1
+            else:
+                i = j
+            continue
+        if len(words) > 0 and words[0] in ["\\leftindent", \
+                "\\paragraph_spacing", "\\align", "\\labelwidthstring"]:
+            i += 1
+            continue
+
+        start = 0
+        while True:
+            j = document.body[i].find(u"\u2013", start) # en-dash
+            k = document.body[i].find(u"\u2014", start) # em-dash
+            if j == -1 and k == -1:
+                break
+            if j == -1 or (k != -1 and k < j):
+                j = k
+            after = document.body[i][j+1:]
+            document.body[i] = document.body[i][:j+1] + u"\u200B" + after
+            start = j+1
+        i += 1
     
 
 ##
@@ -1873,10 +1960,12 @@ convert = [
            [531, []],
            [532, [convert_literalparam]],
            [533, []],
-           [534, []]
+           [534, []],
+           [535, [convert_dashligatures]]
           ]
 
 revert =  [
+           [534, [revert_dashligatures]],
            [533, [revert_chapterbib]],
            [532, [revert_multibib]],
            [531, [revert_literalparam]],
diff --git a/src/BufferParams.cpp b/src/BufferParams.cpp
index f20078b..03b1bd5 100644
--- a/src/BufferParams.cpp
+++ b/src/BufferParams.cpp
@@ -415,6 +415,7 @@ BufferParams::BufferParams()
        fonts_default_family = "default";
        useNonTeXFonts = false;
        use_microtype = false;
+       use_dash_ligatures = true;
        fonts_expert_sc = false;
        fonts_old_figures = false;
        fonts_sans_scale[0] = 100;
@@ -812,6 +813,8 @@ string BufferParams::readToken(Lexer & lex, string const & token,
                lex >> fonts_cjk;
        } else if (token == "\\use_microtype") {
                lex >> use_microtype;
+       } else if (token == "\\use_dash_ligatures") {
+               lex >> use_dash_ligatures;
        } else if (token == "\\paragraph_separation") {
                string parsep;
                lex >> parsep;
@@ -1196,6 +1199,7 @@ void BufferParams::writeFile(ostream & os, Buffer const * buf) const
                os << "\\font_cjk " << fonts_cjk << '\n';
        }
        os << "\\use_microtype " << convert<string>(use_microtype) << '\n';
+       os << "\\use_dash_ligatures " << convert<string>(use_dash_ligatures) << '\n';
        os << "\\graphics " << graphics_driver << '\n';
        os << "\\default_output_format " << default_output_format << '\n';
        os << "\\output_sync " << output_sync << '\n';
diff --git a/src/BufferParams.h b/src/BufferParams.h
index 200b6d4..30a157e 100644
--- a/src/BufferParams.h
+++ b/src/BufferParams.h
@@ -280,6 +280,8 @@ public:
        std::string fonts_cjk;
        /// use LaTeX microtype package
        bool use_microtype;
+       /// use font ligatures for en- and em-dashes
+       bool use_dash_ligatures;
        ///
        Spacing & spacing();
        Spacing const & spacing() const;
diff --git a/src/Paragraph.cpp b/src/Paragraph.cpp
index eb5b111..24a3c7c 100644
--- a/src/Paragraph.cpp
+++ b/src/Paragraph.cpp
@@ -1274,6 +1274,21 @@ void Paragraph::Private::latexSpecialChar(otexstream & os,
                // written. (Asger)
                break;
 
+       case 0x2013:
+       case 0x2014:
+               if (bparams.use_dash_ligatures) {
+                       if (c == 0x2013) {
+                               // en-dash
+                               os << "--";
+                               column +=2;
+                       } else {
+                               // em-dash
+                               os << "---";
+                               column +=3;
+                       }
+                       break;
+               }
+               // fall through
        default:
                if (c == '\0')
                        return;
diff --git a/src/frontends/qt4/GuiDocument.cpp b/src/frontends/qt4/GuiDocument.cpp
index 2293ce3..01a5edd 100644
--- a/src/frontends/qt4/GuiDocument.cpp
+++ b/src/frontends/qt4/GuiDocument.cpp
@@ -838,6 +838,8 @@ GuiDocument::GuiDocument(GuiView & lv)
                this, SLOT(change_adaptor()));
        connect(fontModule->microtypeCB, SIGNAL(clicked()),
                this, SLOT(change_adaptor()));
+       connect(fontModule->dashesCB, SIGNAL(clicked()),
+               this, SLOT(change_adaptor()));
        connect(fontModule->scaleSansSB, SIGNAL(valueChanged(int)),
                this, SLOT(change_adaptor()));
        connect(fontModule->scaleTypewriterSB, SIGNAL(valueChanged(int)),
@@ -3046,6 +3048,7 @@ void GuiDocument::applyView()
                fromqstr(fontModule->cjkFontLE->text());
 
        bp_.use_microtype = fontModule->microtypeCB->isChecked();
+       bp_.use_dash_ligatures = !fontModule->dashesCB->isChecked();
 
        bp_.fonts_sans_scale[nontexfonts] = fontModule->scaleSansSB->value();
        bp_.fonts_sans_scale[!nontexfonts] = fontModule->font_sf_scale;
@@ -3550,6 +3553,7 @@ void GuiDocument::paramsToDialog()
                fontModule->cjkFontLE->setText(QString());
        
        fontModule->microtypeCB->setChecked(bp_.use_microtype);
+       fontModule->dashesCB->setChecked(!bp_.use_dash_ligatures);
 
        fontModule->fontScCB->setChecked(bp_.fonts_expert_sc);
        fontModule->fontOsfCB->setChecked(bp_.fonts_old_figures);
diff --git a/src/frontends/qt4/ui/FontUi.ui b/src/frontends/qt4/ui/FontUi.ui
index 416c3fb..e04d5d8 100644
--- a/src/frontends/qt4/ui/FontUi.ui
+++ b/src/frontends/qt4/ui/FontUi.ui
@@ -242,7 +242,27 @@
      </property>
     </widget>
    </item>
+   <item row="10" column="1">
+    <widget class="QCheckBox" name="microtypeCB">
+     <property name="toolTip">
+      <string>Activate extensions such as character protrusion and font expansion via the microtype package</string>
+     </property>
+     <property name="text">
+      <string>Enable micr&amp;o-typographic extensions</string>
+     </property>
+    </widget>
+   </item>
    <item row="11" column="1">
+    <widget class="QCheckBox" name="dashesCB">
+     <property name="toolTip">
+      <string>Use \\textendash and \\textemdash instead of -- and --- for en- and em-dashes</string>
+     </property>
+     <property name="text">
+      <string>Don't use ligatures for en- and &amp;em-dashes</string>
+     </property>
+    </widget>
+   </item>
+   <item row="12" column="1">
     <spacer name="verticalSpacer">
      <property name="orientation">
       <enum>Qt::Vertical</enum>
@@ -255,16 +275,6 @@
      </property>
     </spacer>
    </item>
-   <item row="10" column="1">
-    <widget class="QCheckBox" name="microtypeCB">
-     <property name="toolTip">
-      <string>Activate extensions such as character protrusion and font expansion via the microtype package</string>
-     </property>
-     <property name="text">
-      <string>Enable micr&amp;o-typographic extensions</string>
-     </property>
-    </widget>
-   </item>
   </layout>
  </widget>
  <layoutdefault spacing="6" margin="11"/>
diff --git a/src/version.h b/src/version.h
index 43a9b35..5a03e0d 100644
--- a/src/version.h
+++ b/src/version.h
@@ -32,8 +32,8 @@ extern char const * const lyx_version_info;
 
 // Do not remove the comment below, so we get merge conflict in
 // independent branches. Instead add your own.
-#define LYX_FORMAT_LYX 534 // spitz: chapterbib support
-#define LYX_FORMAT_TEX2LYX 534
+#define LYX_FORMAT_LYX 535 // ef: support for en/em-dash as ligatures
+#define LYX_FORMAT_TEX2LYX 535
 
 #if LYX_FORMAT_TEX2LYX != LYX_FORMAT_LYX
 #ifndef _MSC_VER
