Dear LyX developers,

the attached patch cleans up lyx2lyx issues after the recent fixes to dash
handling in LyX. Committing requires a +1 from at least one other developer.

* Backwards compatibility for both documents containing literal dashes and
  documents containing ligature dashes.

  Currently, "\use_dash_ligatures" is set based on the original file version.
  If you used literal em- and en-dashes in pre-2.2 documents, you must
  manually unselect "Output em- and en-dash as ligatures" to ensure unchanged
  behaviour.

  With the patch, lyx2lyx scans the content for literal and ligature dashes
  and chooses the setting that keeps line breaks unchanged.
  Pre-LyX 2.2 documents with both literal AND ligature dashes trigger a
  warning and use the default value for "\use_dash_ligatures". We could
  also consider ERT in these (rare) cases.

* Round-trip 2.3 -> <older format> -> 2.3 keeps the "\use_dash_ligatures"
  value.

  Currently, the original value of the setting is lost:
  2.3 -> 2.2 -> 2.3 forces "\use_dash_ligatures false" and
  2.3 -> 2.1 (and older) -> 2.3 forces "\use_dash_ligatures true".

* Backwards compatibility for 2.2 via preamble code.

  Currently, the 2.2 workaround uses zero width space (ZWSP) characters.
  Re-defining \textemdash and \textendash ensures unchanged output, also
  regarding hyphenation of words adjacent to the dashes.
  The preamble code is removed when converting from 2.2 (both directions).

* Conversion 2.3 -> 2.1 (or older) produces ligature dashes if
  "\use_dash_ligatures true".

  Currently, the 2.2 workaround with literal dash + ZWSP is also used for
  export to 2.1 and older, with suboptimal results and problems with the
  ZWSP character in 2.0 and earlier.

The patch allows removal of all dash-related caveats in the 2.3 RELEASE NOTES.

Please try it out and give a +1 or improvement suggestions.

Günter

----- End forwarded message -----
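The content scan described in the first item can be illustrated with a small stand-alone sketch. It is a simplification and not the patch itself: `guess_dash_ligatures` is a hypothetical name, the document body is assumed to be a plain list of strings, and the skipping of insets such as ERT or Formula is left out.

```python
import re

# Literal en/em dash followed by a word character, a no-break space,
# or the end of the line (the cases where a line break may differ).
LITERAL_DASH = re.compile("[\u2013\u2014]([\\w\u00A0]|$)")
# Interim 2.2 representation of ligature dashes coming from older documents.
LIGATURE_DASH = re.compile(r"\\(twohyphens|threehyphens)")
# Word character or no-break space at the start of the following line.
WORD_START = re.compile("[\\w\u00A0]")


def guess_dash_ligatures(body):
    """Hypothetical helper: return True, False, or None.

    None means "no preference": either no relevant dashes were found
    or both kinds are present, so the default setting would apply.
    """
    has_literal = False
    has_ligature = False
    for i, line in enumerate(body):
        if LITERAL_DASH.search(line):
            has_literal = True
        # Assumption of this sketch: guard against running past the last line.
        if (LIGATURE_DASH.search(line) and i + 1 < len(body)
                and WORD_START.match(body[i + 1])):
            has_ligature = True
    if has_literal and has_ligature:
        return None  # mixed document: a warning would be issued here
    if has_literal:
        return False
    if has_ligature:
        return True
    return None
```

A caller would insert "\use_dash_ligatures true" or "\use_dash_ligatures false" into the header only when the result is not None.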
>From 0691a3537cbadc0336edd9f47b14e8047a39cad2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?G=C3=BCnter=20Milde?= <mi...@lyx.org>
Date: Sat, 30 Sep 2017 23:26:02 +0200
Subject: [PATCH] Fix lyx2lyx conversion of dashes.

---
 lib/RELEASE-NOTES      |  32 ++---------
 lib/lyx2lyx/lyx_2_2.py |   6 ++
 lib/lyx2lyx/lyx_2_3.py | 153 ++++++++++++++++++++++++-------------------------
 3 files changed, 87 insertions(+), 104 deletions(-)

diff --git a/lib/RELEASE-NOTES b/lib/RELEASE-NOTES
index 440b93e71a..007ae575ee 100644
--- a/lib/RELEASE-NOTES
+++ b/lib/RELEASE-NOTES
@@ -13,13 +13,12 @@
   be safely dissolved, as it will be automatically inserted at export time
   if needed, as usual.
 
-* The new setting
-  "Document->Settings->Fonts->Output em- and en-dash as ligatures" forces
-  output of en- and em-dashes as -- and --- when exporting to LaTeX.
-  It is is "true" by default but "false" when opening documents edited
-  with LyX 2.2.
-  See chapter 3.9.1.1 "Dashes and line breaks" of the User Guide and
-  "Caveats when upgrading from earlier versions to 2.3.x" below.
+* The new setting "Output em- and en-dash as ligatures" under
+  "Document->Settings->Fonts" forces output of en and em dashes as -- and
+  --- when exporting to LaTeX. The default is "true". When opening old
+  documents, the setting is "false" if literal dashes were used and
+  different line breaks might occur. See chapter 3.9.1.1 "Dashes and line
+  breaks" of the User Guide for details.
 
 * The following UI translations were dropped, because the lack of translation
   maintenance: Russian, Danish, Greek, Serbian, Galician, Catalan, Romanian,
@@ -209,25 +208,6 @@
   the external_templates file, you will have to move the modifications to
   the respective *.xtemplate file manually.
 
-* If you used literal em- and en-dashes in pre-2.2 documents,
-  you must manually unselect
-  "Document->Settings->Fonts->Output em- and en-dash as ligatures"
-  to ensure unchanged behaviour.
-
-* ZWSP characters (u200b) following literal em- and en-dashes are deleted by
-  lyx2lyx when converting to 2.3 format. If you used them as optional line
-  breaks after dashes, convert them to space insets before opening your
-  document with LyX 2.3 or the optional line breaks will be lost!
-
-* If using TeX fonts and en- and em-dashes are output as font ligatures,
-  when exporting documents containing en- and em-dashes to the format of
-  LyX 2.0 or earlier, the following line has to be manually added to the
-  unicodesymbols file of that LyX version:<br>
-  0x200b "\\hspace{0pt}" "" "" "" "" # ZERO WIDTH SPACE<br>
-  This avoids "uncodable character" issues if the document is actually
-  loaded by that LyX version. LyX 2.1 and later versions already have the
-  necessary definition in their unicodesymbols file.
-
 * If trying to compile documents using R scripts and sweave/knitr, LyX 2.3.x
   would not allow for re-running the R scripts, unless the user:
   1) explicitly disables the "Forbid use of needauth converters"
diff --git a/lib/lyx2lyx/lyx_2_2.py b/lib/lyx2lyx/lyx_2_2.py
index 996c22684e..2f4ef3ac2a 100644
--- a/lib/lyx2lyx/lyx_2_2.py
+++ b/lib/lyx2lyx/lyx_2_2.py
@@ -659,6 +659,12 @@ def convert_dashes(document):
 def revert_dashes(document):
     "convert \\twohyphens and \\threehyphens to -- and ---"
 
+    # eventually remove preamble code from 2.3->2.2 conversion:
+    for i, line in enumerate(document.preamble):
+        if i > 1 and line == r'\renewcommand{\textemdash}{---}':
+            if (document.preamble[i-1] == r'\renewcommand{\textendash}{--}'
+                and document.preamble[i-2] == '% Added by lyx2lyx'):
+                del document.preamble[i-2:i+1]
     i = 0
     while i < len(document.body):
         words = document.body[i].split()
diff --git a/lib/lyx2lyx/lyx_2_3.py b/lib/lyx2lyx/lyx_2_3.py
index edc5b1ffa9..735a34f54a 100644
--- a/lib/lyx2lyx/lyx_2_3.py
+++ b/lib/lyx2lyx/lyx_2_3.py
@@ -1841,58 +1841,63 @@ def revert_chapterbib(document):
 
 
 def convert_dashligatures(document):
-    " Remove a zero-length space (U+200B) after en- and em-dashes. "
-
-    i = find_token(document.header, "\\use_microtype", 0)
-    if i != -1:
-        if document.initial_format > 474 and document.initial_format < 509:
-            # This was created by LyX 2.2
-            document.header[i+1:i+1] = ["\\use_dash_ligatures false"]
-        else:
-            # This was created by LyX 2.1 or earlier
-            document.header[i+1:i+1] = ["\\use_dash_ligatures true"]
-
-    i = 0
-    while i < len(document.body):
-        words = document.body[i].split()
-        # Skip some document parts where dashes are not converted
-        if len(words) > 1 and words[0] == "\\begin_inset" and \
-           words[1] in ["CommandInset", "ERT", "External", "Formula", \
-                        "FormulaMacro", "Graphics", "IPA", "listings"]:
-            j = find_end_of_inset(document.body, i)
-            if j == -1:
-                document.warning("Malformed LyX document: Can't find end of " \
-                                 + words[1] + " inset at line " + str(i))
-                i += 1
-            else:
-                i = j
-            continue
-        if len(words) > 0 and words[0] in ["\\leftindent", \
-                "\\paragraph_spacing", "\\align", "\\labelwidthstring"]:
-            i += 1
-            continue
-
-        start = 0
-        while True:
-            j = document.body[i].find(u"\u2013", start) # en-dash
-            k = document.body[i].find(u"\u2014", start) # em-dash
-            if j == -1 and k == -1:
-                break
-            if j == -1 or (k != -1 and k < j):
-                j = k
-            after = document.body[i][j+1:]
-            if after.startswith(u"\u200B"):
-                document.body[i] = document.body[i][:j+1] + after[1:]
-            else:
-                if len(after) == 0 and document.body[i+1].startswith(u"\u200B"):
-                    document.body[i+1] = document.body[i+1][1:]
-                    break
-            start = j+1
-        i += 1
-
+    "Set 'use_dash_ligatures' according to content."
+    use_dash_ligatures = None
+    # eventually remove preamble code from 2.3->2.2 conversion:
+    for i, line in enumerate(document.preamble):
+        if i > 1 and line == r'\renewcommand{\textemdash}{---}':
+            if (document.preamble[i-1] == r'\renewcommand{\textendash}{--}'
+                and document.preamble[i-2] == '% Added by lyx2lyx'):
+                del document.preamble[i-2:i+1]
+                use_dash_ligatures = True
+    if use_dash_ligatures is None:
+        # Look for dashes:
+        # (Documents by LyX 2.1 or older have "\twohyphens\n" or "\threehyphens\n"
+        # as interim representation for dash ligatures in 2.2.)
+        has_literal_dashes = False
+        has_ligature_dashes = False
+        j = 0
+        for i, line in enumerate(document.body):
+            # Skip some document parts where dashes are not converted
+            if (i < j) or line.startswith("\\labelwidthstring"):
+                continue
+            words = line.split()
+            if len(words) > 1 and words[0] == "\\begin_inset" and \
+               words[1] in ["CommandInset", "ERT", "External", "Formula",
+                            "FormulaMacro", "Graphics", "IPA", "listings"]:
+                j = find_end_of_inset(document.body, i)
+                if j == -1:
+                    document.warning("Malformed LyX document: "
+                        "Can't find end of %s inset at line %d" % (words[1],i))
+                continue
+            # literal dash followed by a word or no-break space:
+            if re.search(u"[\u2013\u2014]([\w\u00A0]|$)", line,
+                         flags=re.UNICODE):
+                has_literal_dashes = True
+            # ligature dash followed by word or no-break space on next line:
+            if re.search(ur"(\\twohyphens|\\threehyphens)", line,
+                         flags=re.UNICODE) and re.match(u"[\w\u00A0]",
+                            document.body[i+1], flags=re.UNICODE):
+                has_ligature_dashes = True
+        if has_literal_dashes and has_ligature_dashes:
+            # TODO: insert a warning note in the document?
+            document.warning('This document contained both literal and '
+                '"ligature" dashes.\n Line breaks may have changed. '
+                'See UserGuide chapter 3.9.1 for details.')
+        elif has_literal_dashes:
+            use_dash_ligatures = False
+        elif has_ligature_dashes:
+            use_dash_ligatures = True
+    # insert the setting if there is a preferred value
+    if use_dash_ligatures is not None:
+        i = find_token(document.header, "\\use_microtype", 0)
+        if i != -1:
+            document.header.insert(i+1, "\\use_dash_ligatures %s"
+                                   % str(use_dash_ligatures).lower())
 
 def revert_dashligatures(document):
-    " Remove font ligature settings for en- and em-dashes. "
+    """Remove font ligature settings for en- and em-dashes.
+    Revert conversion of \twodashes or \threedashes to literal dashes."""
     i = find_token(document.header, "\\use_dash_ligatures", 0)
     if i == -1:
         return
@@ -1902,42 +1907,34 @@ def revert_dashligatures(document):
     i = find_token(document.header, "\\use_non_tex_fonts", 0)
     if i != -1:
         use_non_tex_fonts = get_bool_value(document.header, "\\use_non_tex_fonts", i)
-    if not use_dash_ligatures or use_non_tex_fonts:
+    if not use_dash_ligatures or use_non_tex_fonts or document.backend != "latex":
         return
 
-    # Add a zero-length space (U+200B) after en- and em-dashes
-    i = 0
-    while i < len(document.body):
-        words = document.body[i].split()
+    j = 0
+    new_body = []
+    for i, line in enumerate(document.body):
         # Skip some document parts where dashes are not converted
+        if (i < j) or line.startswith("\\labelwidthstring"):
+            new_body.append(line)
+            continue
+        words = line.split()
         if len(words) > 1 and words[0] == "\\begin_inset" and \
-           words[1] in ["CommandInset", "ERT", "External", "Formula", \
+           words[1] in ["CommandInset", "ERT", "External", "Formula",
                         "FormulaMacro", "Graphics", "IPA", "listings"]:
             j = find_end_of_inset(document.body, i)
             if j == -1:
-                document.warning("Malformed LyX document: Can't find end of " \
+                document.warning("Malformed LyX document: Can't find end of "
                                  + words[1] + " inset at line " + str(i))
-                i += 1
-            else:
-                i = j
-            continue
-        if len(words) > 0 and words[0] in ["\\leftindent", \
-                "\\paragraph_spacing", "\\align", "\\labelwidthstring"]:
-            i += 1
+            new_body.append(line)
             continue
-
-        start = 0
-        while True:
-            j = document.body[i].find(u"\u2013", start) # en-dash
-            k = document.body[i].find(u"\u2014", start) # em-dash
-            if j == -1 and k == -1:
-                break
-            if j == -1 or (k != -1 and k < j):
-                j = k
-            after = document.body[i][j+1:]
-            document.body[i] = document.body[i][:j+1] + u"\u200B" + after
-            start = j+1
-        i += 1
+        line = line.replace(u'\u2013', '\\twohyphens\n')
+        line = line.replace(u'\u2014', '\\threehyphens\n')
+        lines = line.split('\n')
+        new_body.extend(line.split('\n'))
+    document.body = new_body
+    # redefine the dash LICRs to use ligature dashes:
+    add_to_preamble(document, [r'\renewcommand{\textendash}{--}',
+                               r'\renewcommand{\textemdash}{---}'])
 
 
 def revert_noto(document):
@@ -2228,7 +2225,7 @@ def revert_mathnumberingname(document):
         else:
             l = find_token(document.header, "\\use_default_options", 0)
             document.header.insert(l, "\\options reqno")
-    # add the math_number_before tag    
+    # add the math_number_before tag
     regexp = re.compile(r'(\\math_numbering_side default)')
     i = find_re(document.header, regexp, 0)
     if i != -1:
-- 
2.11.0
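
For readers who want to try the round-trip idea outside lyx2lyx, here is a minimal stand-alone sketch of the two steps involved: turning literal dashes into the interim macros on lines of their own, and recognising the preamble block on the way back. The names `dashes_to_macros` and `strip_preamble_block` are hypothetical, the body and preamble are assumed to be plain lists of strings, inset skipping is omitted, and the block is assumed to be preceded by a "% Added by lyx2lyx" comment line, as the removal code in the patch expects.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"

# The three preamble lines the removal code in the patch looks for.
PREAMBLE_BLOCK = [
    "% Added by lyx2lyx",
    r"\renewcommand{\textendash}{--}",
    r"\renewcommand{\textemdash}{---}",
]


def dashes_to_macros(body):
    """Replace literal dashes by interim macros, each ending its line."""
    new_body = []
    for line in body:
        line = line.replace(EN_DASH, "\\twohyphens\n")
        line = line.replace(EM_DASH, "\\threehyphens\n")
        # A dash in the middle of a line splits it into several lines.
        new_body.extend(line.split("\n"))
    return new_body


def strip_preamble_block(preamble):
    """Return (new_preamble, found); found is True if the block was removed."""
    n = len(PREAMBLE_BLOCK)
    for i in range(len(preamble) - n + 1):
        if preamble[i:i + n] == PREAMBLE_BLOCK:
            return preamble[:i] + preamble[i + n:], True
    return list(preamble), False
```

Finding the block when converting back to 2.3 is what allows the setting to be restored as "\use_dash_ligatures true" without scanning the body.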