Dear LyX developers,

Some LICR (LaTeX internal character representation) macros must be
terminated with {} so that the macro name is separated from the
following text.

  Example: "\\OE", e.g. used in \OE{}uvre

However, "lib/unicodesymbols" also contains replacement code that must
not be terminated with {}.

  Examples: "ff"  (ff ligature)
            "\\-" (discretionary hyphen)
            "\\mkern3mu\\mathchar'26\\mkern-12mu d" (dj character in math)

To distinguish cases that require/don't require termination, the
"notermination" flag (with values "text", "math", "both", "none") was
introduced (set by default if a command ends with "}").

I propose to replace this flag with a detection algorithm to make the
"unicodesymbols" file easier to read/edit:

TeX's macro name parsing can be described as:

  If the first character after the backslash is not a letter, that
  single character is the macro name (e.g. "\\-" or "\\~").

  Otherwise the macro name is the sequence of letters (a-z, A-Z) up to,
  but not including, the first non-letter (space, digit, punctuation, ...).
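
To make the forward rule concrete, here is a minimal Python sketch. It is
not taken from the TeX or LyX sources; the name `macro_name` is made up for
illustration, and it assumes default category codes, i.e. only a-z and A-Z
count as letters.

```python
import string

# Assumption: default catcodes, so only ASCII letters are "letters".
LETTERS = set(string.ascii_letters)

def macro_name(text, start=0):
    """Return the macro name that begins at the backslash text[start]."""
    if text[start] != "\\":
        raise ValueError("expected a backslash at position %d" % start)
    pos = start + 1
    if pos >= len(text):
        return ""              # lone backslash at the end of the input
    if text[pos] not in LETTERS:
        return text[pos]       # control symbol: exactly one non-letter
    end = pos
    while end < len(text) and text[end] in LETTERS:
        end += 1
    return text[pos:end]       # control word: maximal run of letters
```

With this sketch, `macro_name(r"\OE{}uvre")` gives "OE", while
`macro_name(r"\OEuvre")` gives "OEuvre", which is exactly the situation the
terminating {} is meant to prevent.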
  
We could reverse this parsing to determine whether a command in
"unicodesymbols" requires termination:

  Test the characters in the command string in reverse order:

  * a backslash ends the loop: the command needs termination only if
    at least one letter was seen before reaching it

  * any other non-letter character (everything except a-zA-Z) ends the
    loop: no termination is required

  * if the string is exhausted without finding a backslash, no
    termination is required (as with "ff")
  
  The regular expression would be "\\[a-zA-Z]+$" (literal backslash,
  followed by one or more characters in the range a-zA-Z, anchored at the
  end of the string).  
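
For comparison, a small sketch of the regular expression variant in Python.
The function names `requires_termination_re` and `terminated` are
hypothetical and only serve the illustration. One small deviation: the
pattern uses \Z instead of $, because in Python $ also matches just before
a trailing newline.

```python
import re

# Literal backslash followed by one or more ASCII letters at the very
# end of the string (\Z, so a trailing newline is not ignored).
_CONTROL_WORD_AT_END = re.compile(r"\\[a-zA-Z]+\Z")

def requires_termination_re(command):
    """Return True if `command` ends in a control word (backslash + letters)."""
    return _CONTROL_WORD_AT_END.search(command) is not None

def terminated(command):
    """Hypothetical helper: append '{}' only where it is needed."""
    if requires_termination_re(command):
        return command + "{}"
    return command
```

Applied to the examples from above, `terminated("\\OE") + "uvre"` yields
"\OE{}uvre", while "ff", "\\-" and the dj replacement are returned
unchanged.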
  
A possible Python implementation would be

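import string

def requires_termination(command):
    """Check whether `command` must be terminated with '{}'."""
    seen_letter = False
    # Scan the command string backwards from its last character.
    for c in reversed(command):
        if c == "\\":
            # Termination is needed only if letters follow the backslash.
            return seen_letter
        if c not in string.ascii_letters:
            # Any other non-letter (TeX letters are a-zA-Z) ends the scan.
            return False
        seen_letter = True
    return False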

>>> requires_termination(r'\foo')
True
>>> requires_termination(r'was ist \foo')
True
>>> requires_termination(r'\foo bar')
False
>>> requires_termination(r'\foo{a}')
False
>>> requires_termination(r'\~')
False
>>> requires_termination(r'ffl')
False


Would it be difficult to implement something along these lines in C++?


Günter
