> On 26 Sep 2021, at 06:49, Werner LEMBERG <w...@gnu.org> wrote:
>
>>>> The idea here is different, it is for identifiers, and in the
>>>> input syntax only, does not change the internal semantics at all.
>>>> It is good not having to type backslash when a command is used.
>>>
>>> Really? I highly doubt that. In particular, what about lyrics
>>> mode?
>>
>> The idea would be to change the file lexer.ll by adding U and
>> UCOMMAND:
>>
>> A         [a-zA-Z\200-\377]
>> U         [\200-\377]
>> AA        {A}|_
>> N         [0-9]
>> ANY_CHAR  (.|\n)
>> SYMBOL    {A}([-_]{A}|{A})*
>> COMMAND   \\{SYMBOL}
>> UCOMMAND  {U}{SYMBOL}
>>
>> Then in select places, that is context switches, add {UCOMMAND}:
>>
>> {COMMAND} {
>>     return scan_escaped_word (YYText_utf8 () + 1);
>> }
>> {UCOMMAND} {
>>     return scan_escaped_word (YYText_utf8 ());
>> }
>
> You might provide a MR, maybe it gets accepted. I still doubt that it
> would be a good idea.

There is a conflict in some contexts between {SYMBOL} and {COMMAND}, so
this may not work. With the definitions above, {U} is a subset of {A},
so any text that matches {UCOMMAND} also matches {SYMBOL}. To get a
regular COMMAND syntax, commands should start with something that SYMBOL
does not.

Otherwise you might replace the function YYText_utf8 with proper UTF-8
patterns, a variation of:

/* UTF-8 character with valid Unicode code point. */
utf8char [\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF]([\x80-\xBF]{2})|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF]([\x80-\xBF]{2})|[\xF1-\xF3]([\x80-\xBF]{3})|\xF4[\x80-\x8F]([\x80-\xBF]{2})
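
To make the byte ranges in the utf8char pattern easier to check, here is a rough sketch of the same test written as a plain C++ function, one branch per alternative of the pattern. The names utf8_char_length and in_range are made up for this illustration; nothing like them is assumed to exist in lexer.ll or elsewhere in LilyPond.

```cpp
#include <cstddef>
#include <string>

// True if byte b lies in the inclusive range [lo, hi].
static bool in_range (unsigned char b, unsigned char lo, unsigned char hi)
{
  return lo <= b && b <= hi;
}

// Hypothetical helper: length in bytes (1 to 4) of the valid UTF-8
// character at the start of s, or 0 if s is empty or does not start
// with one.  Mirrors the alternatives of the utf8char pattern.
std::size_t utf8_char_length (const std::string &s)
{
  const std::size_t n = s.size ();
  if (n == 0)
    return 0;
  auto at = [&s] (std::size_t i) { return static_cast<unsigned char> (s[i]); };
  const unsigned char b0 = at (0);

  // [\x09\x0A\x0D\x20-\x7E]: tab, LF, CR and printable ASCII.
  if (b0 == 0x09 || b0 == 0x0A || b0 == 0x0D || in_range (b0, 0x20, 0x7E))
    return 1;

  // [\xC2-\xDF][\x80-\xBF]: lead bytes C0 and C1 would be overlong.
  if (in_range (b0, 0xC2, 0xDF))
    return (n >= 2 && in_range (at (1), 0x80, 0xBF)) ? 2 : 0;

  // Three-byte forms: E0 and ED restrict the second byte.
  if (in_range (b0, 0xE0, 0xEF))
    {
      if (n < 3)
        return 0;
      unsigned char lo = 0x80, hi = 0xBF;
      if (b0 == 0xE0)
        lo = 0xA0; // exclude overlong encodings
      if (b0 == 0xED)
        hi = 0x9F; // exclude surrogates U+D800..U+DFFF
      return (in_range (at (1), lo, hi)
              && in_range (at (2), 0x80, 0xBF)) ? 3 : 0;
    }

  // Four-byte forms: F0 and F4 restrict the second byte.
  if (in_range (b0, 0xF0, 0xF4))
    {
      if (n < 4)
        return 0;
      unsigned char lo = 0x80, hi = 0xBF;
      if (b0 == 0xF0)
        lo = 0x90; // exclude overlong encodings
      if (b0 == 0xF4)
        hi = 0x8F; // exclude code points above U+10FFFF
      return (in_range (at (1), lo, hi)
              && in_range (at (2), 0x80, 0xBF)
              && in_range (at (3), 0x80, 0xBF)) ? 4 : 0;
    }

  return 0; // stray continuation byte or invalid lead byte
}
```

For example, utf8_char_length ("\xE2\x82\xAC") gives 3 for the euro sign, while the surrogate encoding "\xED\xA0\x80" and the overlong "\xC0\x80" both give 0. In a flex file the pattern does this work itself; the function is only meant as a readable cross-check of the ranges.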