On 29/07/2022 08:43, Ihor Radchenko wrote:
Max Nikulin writes:
The good point in your patch is that \- still works as a shy hyphen
(which, by the way, may be used in some cases instead of a zero width
space: *intra*\-word). On the other hand, I have managed to find a case
where your approach is not ideal:
*\--scratch\--*
<p>
<b>­-scratch</b></p>
Well. I think that it is impossible to use the same escape construct to
both force emphasis and escape it.
Let's articulate the problem as follows: some characters ("*", "/",
etc.), besides being used literally, are overloaded with 2 additional
roles, namely starting an emphasis group and terminating one. So, in
addition to the lightweight markup heuristics, it is necessary to
provide a way to disambiguate which of the 3 roles is associated with a
particular character. "Activate" and "deactivate" characters or entities
for emphasis markers are alternative, and perhaps less clear, terms that
have been used before.
The advantage of zero width space is that "[:space:]" is part of the
PREMATCH and POSTMATCH (outer) regexps in
`org-emphasis-regexp-components' and "[:space:]" is forbidden at the
inner borders of an emphasized span of text. The latter mostly makes
sense; however, I am unsure if a bold space has the same width as a
regular one, and a space in a fixed width font is certainly distinct.
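
To make the role of the outer and inner classes concrete, here is a rough
Python sketch. It is not the real Org regexp: the character sets are
simplified stand-ins, the names (SPACE, PRE, POST, BORDER, bold_spans) are
made up for illustration, and U+200B is listed explicitly on the assumption
that it should count as space-like here (Python's \s does not match it).

```python
import re

ZWSP = "\u200b"  # U+200B zero width space

# Assumption: treat U+200B as space-like; it is added by hand because
# Python's \s does not include it.
SPACE = r"\s" + ZWSP

# Simplified stand-ins for the outer (PREMATCH/POSTMATCH) character sets.
PRE_CHARS = "-('\"{"
POST_CHARS = "-.,:!?;'\")}["

# Outer context: start of text or one allowed character before the marker,
# end of text or one allowed character after it.
PRE = "(?:^|(?<=[" + SPACE + re.escape(PRE_CHARS) + "]))"
POST = "(?=$|[" + SPACE + re.escape(POST_CHARS) + "])"

# Inner borders: anything that is not space-like.
BORDER = "[^" + SPACE + "]"

BOLD = re.compile(PRE + r"\*(" + BORDER + "(?:.*?" + BORDER + r")?)\*" + POST)


def bold_spans(text):
    """Return the contents of every *bold* span found in TEXT."""
    return BOLD.findall(text)


print(bold_spans("a *bold* word"))                        # ['bold']
print(bold_spans("intra*word*text"))                      # []
print(bold_spans("intra" + ZWSP + "*word*" + ZWSP + "text"))  # ['word']
print(bold_spans("* not bold *"))                         # []
```

The third call shows why zero width space works as an "activator": it
satisfies the outer classes without adding visible space. The last call
shows the inner border restriction.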
The problem with the "\--" entity is that it is not handled properly at
the start of an emphasis region. It neither disables emphasis nor is
parsed as a complete entity; instead it becomes a combination of the
"\-" shy hyphen and a literal "-".
I am unsure if it can be solved consistently. Possible ways:
- In addition to the space-like (with respect to the current regexp)
entity, add another one that acts as a part of a word but, like "\--",
is stripped from the output. Likely it should be accompanied by more
changes in the parser and regexps.
- Provide some new explicit syntax for a literal character, the start of
an emphasis group, and the end of an emphasis group.
Concerning the zero width space workaround, I may be wrong, but Nicolas
might consider using U+200B zero width space as the escape character for
itself: a single one is filtered out during export, while a double zero
width space becomes a single character. (I do not like this kind of
"white space" programming language.) Another question is whether U+2060
word joiner (or some other character) should be added either as an
alternative to zero width space or to allow = verbatim = fixed width text
surrounded by fixed width spaces.
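
The self-escaping rule for U+200B mentioned above (a single one is dropped,
a doubled one yields one literal character) could look roughly like this as
an export filter. This is only a sketch in Python of the idea, not anything
that exists in Org; the function name unescape_zwsp is hypothetical, and
the treatment of runs longer than two (pairs are consumed left to right) is
my assumption.

```python
import re

ZWSP = "\u200b"  # U+200B zero width space


def unescape_zwsp(text):
    """Drop single zero width spaces, collapse doubled ones into one.

    Assumption: a run of n characters is read as n // 2 escaped pairs,
    with a leftover single one filtered out.
    """
    return re.sub(
        ZWSP + "+",
        lambda m: ZWSP * (len(m.group()) // 2),
        text,
    )


# A single U+200B used only to activate emphasis disappears from output.
print(unescape_zwsp("intra" + ZWSP + "*word*") == "intra*word*")      # True
# A doubled U+200B survives as one literal character.
print(unescape_zwsp("a" + ZWSP * 2 + "b") == "a" + ZWSP + "b")        # True
```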