Dear Enrico, dear LyX developers,

the patch for em- and en-dashes [72a488d7] tries to restore (as far as
possible/sensible) the pre-2.2 behaviour regarding dashes.
It also keeps the behaviour of 2.2 documents :-).

Nevertheless, there are some shortcomings and problems:

1. Pre-2.2 documents using literal dashes were not affected by the change in
   2.2, but now they require user interaction (unchecking
   use-ligature-dashes) to work as before.

2. Back-conversion destroys the information on whether dashes shall be
   exported as literal characters or ligatures.

3. Line breaking differs for documents with non-TeX fonts when compiled with
   LuaTeX.


Fixes
=====

lyx2lyx converts dash ligatures to the interim representations \twohyphens
and \threehyphens and back:

        2.1  <->  2.2
        --   <->  \twohyphens
        ---  <->  \threehyphens

Literal EM DASH and EN DASH characters are kept as-is.

-> The information about the dash type is only lost if a document is *opened*
in 2.2. :-)
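
The round trip above could be sketched roughly as follows. This is only an
illustration on plain strings, not the actual lyx2lyx code: the function names
to_2_2 and to_2_1 are made up, and the real file format details are ignored.

```python
# Illustrative sketch only; not the actual lyx2lyx code.
# Longest ligature first, so "---" is not consumed as "--" plus "-".
LIGATURES = [("---", "\\threehyphens"), ("--", "\\twohyphens")]

def to_2_2(text):
    """2.1 -> 2.2: replace dash ligatures by the interim representations."""
    for ligature, interim in LIGATURES:
        text = text.replace(ligature, interim)
    return text

def to_2_1(text):
    """2.2 -> 2.1: restore the ligatures from the interim representations."""
    for ligature, interim in LIGATURES:
        text = text.replace(interim, ligature)
    return text

sample = "pages 2--12 --- see also 1987\u20131990"
converted = to_2_2(sample)
assert "\u2013" in converted          # literal EN DASH is kept as-is
assert to_2_1(converted) == sample    # round trip restores the ligatures
```

Because literal dashes never pass through the mapping, the distinction between
ligature and literal dashes survives the conversion in both directions.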

This allows solving problems 1 and 2 with the following steps:

a) back-convert "ligature dashes" to \twohyphens or \threehyphens, respectively

   +1 solves problem 2
      (keeps info also in previous versions (unless opened in 2.2))
   +1 works also in LyX versions not supporting ZWSP.

b) move the ligature-dash -> literal-dash+ZWSP conversion from
   lyx2lyx/lyx_2_3.py to Text.cpp

   +1 backport to 2.2 fixes unwanted overfull lines in 2.2

c) When converting older documents to 2.3 with lyx2lyx, set
   \use-ligature-dashes TRUE if the document contains \threehyphens or
   \twohyphens instead of depending on the document's fileformat version.

   +1 respect pre-2.2 documents with literal dashes (solves problem 1)
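
A minimal sketch of the check in step c), assuming the document body is
available as a list of lines. The helper name needs_ligature_dashes is
hypothetical and does not exist in lyx2lyx:

```python
# Hypothetical helper, for illustration only.
def needs_ligature_dashes(body_lines):
    """Return True if the body contains an interim ligature-dash macro."""
    markers = ("\\twohyphens", "\\threehyphens")
    return any(marker in line for line in body_lines for marker in markers)

# A pre-2.2 document with only literal dashes keeps its behaviour:
assert not needs_ligature_dashes(["years 1987\u20131990"])
# A document with converted ligature dashes gets the setting enabled:
assert needs_ligature_dashes(["pages 2", "\\twohyphens", "12"])
```

The decision then depends on the content alone, not on the file format
version the document was last saved in.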

Old documents that contain a *mix* of literal and ligature dashes will
still show changed behaviour (regardless of the value of
"use_ligature_dashes"). This may be tolerable (assuming that only a
small fraction of existing LyX documents mix the dash representations and
only a small fraction of them will experience changed line breaks).


Alternatives
============

Support for parallel use of ligature and literal dashes can be realized with
a "special character" for ligature dashes instead of the buffer setting.

-1 em- and en-dashes are common printable characters (except for the line
   break details). Keeping two alternative representations may be overkill.
   (OTOH, we have a similar case with quotes.)



Different line breaking behaviour with LuaTeX (non-TeX fonts) compared to
the other exports (problem 3) can be solved in one of the following ways:

a) use literal dashes exclusively, set \XeTeXdashbreakstate=0

   +1 simple
   -1 no line break after dashes (can be fixed with "allowbreak" (see below))

b) export as ligature also with non-TeX fonts, except for teletype.

c) Preamble code that makes the literal dashes active and binds them to
   ligatures for LuaTeX


Last but not least:

The expansion of the literal dashes in lib/unicodesymbols was changed from
ligatures to macros 10 years ago in
https://www.lyx.org/trac/changeset/18802/lyxsvn with the explanation:

    "unicodesymbols: use commands for the dashes for consistency reasons and
    to avoid potential problems with some LaTeX-packages"


Before going back to ligatures, we should explore the reasons/side-effects:

@Uwe: Can you give an example of "potential problems with some LaTeX-packages"?

"Consistency" is clear: "inputenc" uses \textendash and \textemdash for
encodings supporting the literal dashes.

The LaTeX team introduced \text* commands replacing the font ligatures
as long as 23 years ago and gave the reasons in "LaTeX2ε for authors"
(usrguide.pdf):

    \textemdash \textendash \textexclamdown \textquestiondown \textquotedblleft
    \textquotedblright \textquoteleft \textquoteright

    New feature 1994/12/01

    These commands produce characters which would otherwise be accessed via
    ligatures
    ...
    The reason for making these characters directly accessible is so that they
    will work in encodings which do not have these characters.


My preference:

The "allowbreak" special character (ticket #10585) allows an easily
configurable line break option after the em-dash. The combination literal
em-dash + allowbreak is a good default for most use cases, cf.
https://www.lyx.org/trac/raw-attachment/ticket/10543/emdash-line-breaks.pdf
and https://www.lyx.org/trac/raw-attachment/ticket/10543/dash-problems.lyx

   +1 correct feedback
   +1 optional line break after the dash
   +1 allows hyphenation of preceding/following word
   +1 configurable
   +1 simple local override by deleting ZWSP

Suggestion: add an "allowbreak" when converting en-dash + hyphen to em-dash
on input.

For the en-dash, I suggest leaving it as-is so that range specifications
(pages 2--12, years 1987--1990, ...) don't wrap.
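
The two input suggestions above could look roughly like this. The sketch makes
two assumptions: that hyphen + hyphen becomes an en-dash on input, and that
"allowbreak" is represented by a ZWSP (U+200B). The function names are made
up; the real input handling lives in the C++ sources.

```python
# Illustrative sketch only; names and representation are assumptions.
EN_DASH = "\u2013"
EM_DASH = "\u2014"
ZWSP = "\u200b"      # assumed stand-in for the "allowbreak" special character

def type_char(text, ch):
    """Append ch to text, combining hyphens into dashes as they are typed."""
    if ch == "-":
        if text.endswith("-"):
            # hyphen + hyphen -> en-dash, left as-is (no break opportunity)
            return text[:-1] + EN_DASH
        if text.endswith(EN_DASH):
            # en-dash + hyphen -> em-dash followed by an optional break
            return text[:-1] + EM_DASH + ZWSP
    return text + ch

def type_string(keys):
    """Feed a sequence of key presses through type_char."""
    text = ""
    for ch in keys:
        text = type_char(text, ch)
    return text

assert type_string("2--12") == "2" + EN_DASH + "12"
assert type_string("a---b") == "a" + EM_DASH + ZWSP + "b"
```

Deleting the ZWSP after an individual em-dash then gives the simple local
override mentioned in the list above.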

Problem #10490 (sorting indexes) can also be solved without reverting to
ligature input.

Günter
