Dear Scott,

On 2017-10-25, Scott Kostyshak wrote:
> On Mon, Oct 23, 2017 at 08:44:07AM +0000, Enrico Forestieri wrote:
>> On Sun, Oct 22, 2017 at 09:36:56PM -0400, Richard Heck wrote:
>> > On 10/22/2017 06:19 PM, Scott Kostyshak wrote:
>> > > On Sat, Oct 14, 2017 at 06:37:06AM +0000, Jürgen Spitzmüller wrote:
>> > >
>> > >> Also, I think we should consider Günter's lyx2lyx patch [1], but I
>> > >> didn't have time to thoroughly review it myself.
>> > > Will anyone have time to take a look at the patch by Wednesday?
>> > 
>> > I could look at the code from a code-triaging point of view, but I have
>> > not followed
>> > the controversy about this, so someone would need to tell me what the
>> > code is supposed
>> > to do.

>> I don't think this is a change that should be performed at a RC1 stage,
>> frankly. It is too risky and of dubious utility.

> Thanks for bringing up this concern. The patch is not trivial. Further,
> the patch only deals with backwards compatibility. From what I recall,
> we place a higher importance on forward conversion, and although we do
> make efforts to provide good backward compatibility, I believe that we
> have at times knowingly not implemented certain functionality in our
> backwards conversion. 

Actually, the patch deals with both, forward conversion and backward
conversion. In tend to summarize both conversions under the tag "backwards
compatibility".

> Indeed, our "Development" manual covers this:

>   While the conversion routine is required to produce a document that
>   is equivalent to the old version, the requirements of the reversion
>   are not that strict. If possible, try to produce a proper reversion,
>   using ERT if needed, but for some features this might be too
>   complicated. In this case, the minimum requirement of the reversion
>   routine is that it produces a valid document which can be read by an
>   older LyX. If absolutely needed, even data loss is allowed for the
>   reversion.

> The current code (without the patch) clearly already satisfies the
> "minimum requirement".

I don't agree:

The current conversion routine does *not* produce a document that is
equivalent to the old version:

  * If you used literal em- and en-dashes in pre-2.2 documents,
    you must manually unselect
    "Document->Settings->Fonts->Output em- and en-dash as ligatures"
    to ensure unchanged behaviour.


OTOH, the backwards conversion hack for 2.2 adding ZWSP

* needs to be reverted on forward conversion,
* prevents proper back-conversion to 2.1 and older,
* makes documents uncompilable with LyX 2.0 or earlier

I replaced the ZWSP with preamble code, but we could also just remove it
(as ERT was voted against and "the requirements of the reversion are not
that strict"). 
This would make the patch simpler and less error prone. 

> All this to say: I don't think the patch is too
> important for 2.3.0 and in my opinion I'm fine if we do not put it in.

Literal dashes may be less common than "ligature dashes" in pre-2.2
documents, however 
* the UserGuide documents Insert>Symbols als method to insert dashes since
  several LyX versions,
* there is evidence in mailing list threads that users were inserting
  literal dashes via system keyboard shortcuts years ago.

The problem here is, that we cannot know for sure how large the number of
affected users is: the changes in 2.2 did not affect them and they will only
speak up once the new damage is done.


I can propose a medium and a minimal patch:

The minimal patch just removes the ZWSP hack,
the medium patch also sets the new setting based on content.

> Since I don't think the patch is critical, and since we are so close to
> RC1, I propose that we should only consider this patch if a developer
> gives a confident +1, and if we have an extensive test suite. Although
> an extensive test suite cannot catch all potential issues, it can
> certainly help us be more confident for a patch like this.

> I cannot confidently review the patch, but if another developer
> carefully reviews the patch and gives a confident +1, and if Günter has
> time to help, I can create a test suite of many different scenarios.
> This would consist of different .lyx files created by current LyX 2.3.x,
> and we would check the export to 2.2.x with and without the patch. We
> could also check the loading of the resulting files in e.g. LyX 2.2.x,
> and the LaTeX output from 2.2.x.

> There are a lot of "if's" above and we have a short amount of time, so
> I'm not sure it's worth it at this point. But I will leave it to others
> to decide. If another developer thinks it is worth it enough to spend
> time carefully reviewing the patch, then I volunteer to spend time
> working on the test suite.

I have a bunch of test documents that I can review and provide.

Thanks,
Günter



diff --git a/lib/lyx2lyx/lyx_2_3.py b/lib/lyx2lyx/lyx_2_3.py
index edc5b1ffa9..247934912a 100644
--- a/lib/lyx2lyx/lyx_2_3.py
+++ b/lib/lyx2lyx/lyx_2_3.py
@@ -1841,55 +1841,50 @@ def revert_chapterbib(document):
 
 
 def convert_dashligatures(document):
-    " Remove a zero-length space (U+200B) after en- and em-dashes. "
-
-    i = find_token(document.header, "\\use_microtype", 0)
-    if i != -1:
-        if document.initial_format > 474 and document.initial_format < 509:
-            # This was created by LyX 2.2
-            document.header[i+1:i+1] = ["\\use_dash_ligatures false"]
-        else:
-            # This was created by LyX 2.1 or earlier
-            document.header[i+1:i+1] = ["\\use_dash_ligatures true"]
-
-    i = 0
-    while i < len(document.body):
-        words = document.body[i].split()
-        # Skip some document parts where dashes are not converted
-        if len(words) > 1 and words[0] == "\\begin_inset" and \
-           words[1] in ["CommandInset", "ERT", "External", "Formula", \
-                        "FormulaMacro", "Graphics", "IPA", "listings"]:
-            j = find_end_of_inset(document.body, i)
-            if j == -1:
-                document.warning("Malformed LyX document: Can't find end of " \
-                                 + words[1] + " inset at line " + str(i))
-                i += 1
-            else:
-                i = j
+    "Set 'use_dash_ligatures' according to content."
+    use_dash_ligatures = None
+    # Look for dashes:
+    # (Documents in LyX-2.1 format have "\twohyphens\n" or "\threehyphens\n"
+    # as interim representation for dash ligatures in 2.2.)
+    has_literal_dashes = False
+    has_ligature_dashes = False
+    j = 0
+    for i, line in enumerate(document.body):
+        # Skip document parts where dashes are not converted
+        if (i < j) or line.startswith("\\labelwidthstring"):
             continue
-        if len(words) > 0 and words[0] in ["\\leftindent", \
-                "\\paragraph_spacing", "\\align", "\\labelwidthstring"]:
-            i += 1
+        words = line.split()
+        if (len(words) > 1 and words[0] == "\\begin_inset" and
+            words[1] in ["CommandInset", "ERT", "External", "Formula",
+                         "FormulaMacro", "Graphics", "IPA", "listings"]):
+            if -1 == find_end_of_inset(document.body, i):
+                document.warning("Malformed LyX document: "
+                    "Can't find end of %s inset at line %d" % (words[1],i))
             continue
-
-        start = 0
-        while True:
-            j = document.body[i].find(u"\u2013", start) # en-dash
-            k = document.body[i].find(u"\u2014", start) # em-dash
-            if j == -1 and k == -1:
-                break
-            if j == -1 or (k != -1 and k < j):
-                j = k
-            after = document.body[i][j+1:]
-            if after.startswith(u"\u200B"):
-                document.body[i] = document.body[i][:j+1] + after[1:]
-            else:
-                if len(after) == 0 and 
document.body[i+1].startswith(u"\u200B"):
-                    document.body[i+1] = document.body[i+1][1:]
-                    break
-            start = j+1
-        i += 1
-
+        # literal dash followed by a word or no-break space:
+        if re.search(u"[\u2013\u2014]([\w\u00A0]|$)", line,
+                     flags=re.UNICODE):
+            has_literal_dashes = True
+        # ligature dash followed by word or no-break space on next line:
+        if re.search(ur"(\\twohyphens|\\threehyphens)", line,
+                        flags=re.UNICODE) and re.match(u"[\w\u00A0]",
+                        document.body[i+1], flags=re.UNICODE):
+            has_ligature_dashes = True
+    if has_literal_dashes and has_ligature_dashes:
+        # TODO: use ERT or insert a warning note in the document?
+        document.warning('This document contained both literal and '
+            '"ligature" dashes.\n Line breaks may have changed. '
+            'See UserGuide chapter 3.9.1 for details.')
+    elif has_literal_dashes:
+        use_dash_ligatures = False
+    elif has_ligature_dashes:
+        use_dash_ligatures = True
+    # insert the setting if there is a preferred value
+    if use_dash_ligatures is not None:
+        i = find_token(document.header, "\\use_microtype", 0)
+        if i != -1:
+            document.header.insert(i+1, "\\use_dash_ligatures %s"
+                                % str(use_dash_ligatures).lower())
 
 def revert_dashligatures(document):
     " Remove font ligature settings for en- and em-dashes. "
@@ -1902,42 +1897,34 @@ def revert_dashligatures(document):
     i = find_token(document.header, "\\use_non_tex_fonts", 0)
     if i != -1:
         use_non_tex_fonts = get_bool_value(document.header, 
"\\use_non_tex_fonts", i)
-    if not use_dash_ligatures or use_non_tex_fonts:
+    if (not use_dash_ligatures
+        or use_non_tex_fonts
+        or document.backend != "latex"):
         return
 
-    # Add a zero-length space (U+200B) after en- and em-dashes
-    i = 0
-    while i < len(document.body):
-        words = document.body[i].split()
+    # Use \twodashes or \threedashes for dashes if the setting is "true"
+    j = 0
+    new_body = []
+    for i, line in enumerate(document.body):
         # Skip some document parts where dashes are not converted
+        if (i < j) or line.startswith("\\labelwidthstring"):
+            new_body.append(line)
+            continue
+        words = line.split()
         if len(words) > 1 and words[0] == "\\begin_inset" and \
-           words[1] in ["CommandInset", "ERT", "External", "Formula", \
+           words[1] in ["CommandInset", "ERT", "External", "Formula",
                         "FormulaMacro", "Graphics", "IPA", "listings"]:
             j = find_end_of_inset(document.body, i)
             if j == -1:
-                document.warning("Malformed LyX document: Can't find end of " \
+                document.warning("Malformed LyX document: Can't find end of "
                                  + words[1] + " inset at line " + str(i))
-                i += 1
-            else:
-                i = j
-            continue
-        if len(words) > 0 and words[0] in ["\\leftindent", \
-                "\\paragraph_spacing", "\\align", "\\labelwidthstring"]:
-            i += 1
+            new_body.append(line)
             continue
-
-        start = 0
-        while True:
-            j = document.body[i].find(u"\u2013", start) # en-dash
-            k = document.body[i].find(u"\u2014", start) # em-dash
-            if j == -1 and k == -1:
-                break
-            if j == -1 or (k != -1 and k < j):
-                j = k
-            after = document.body[i][j+1:]
-            document.body[i] = document.body[i][:j+1] + u"\u200B" + after
-            start = j+1
-        i += 1
+        line = line.replace(u'\u2013', '\\twohyphens\n')
+        line = line.replace(u'\u2014', '\\threehyphens\n')
+        lines = line.split('\n')
+        new_body.extend(line.split('\n'))
+    document.body = new_body
 
 
 def revert_noto(document):

Reply via email to