Ihor Radchenko 2025/12/25 22:55 0800 writes,
> Yue Yi writes:
>
> > I still have some minor doubts regarding the regexp part in
> > org-set-emph-re, as I mentioned in my previous email. After all,
> > standard regex character classes [...] should not contain \\| to
> > represent alternation (logical OR). But the value of `org-emph-re' is:
> >
> > "\\(^\\|[[:space:]]\\|\\c{\\|\\c[\\|\\c-\\|\\c,\\|\\c|\\)\\(\\([*/_+]\\)\\([^[:space:]]\\|[^[:space:]].*?\\(?:
> > .*?\\)\\{0,1\\}[^[:space:]]\\)\\3\\)\\([[[:space:]]\\|\\c}\\|\\c]\\|\\c-\\|\\c,\\|\\c|\\|$]\\|$\\)"
> >
> > Specifically, I mean the final part of this regexp:
> >
> > [[[:space:]]\\|\\c}\\|\\c]\\|\\c-\\|\\c,\\|\\c|\\|$]
>
> I am not sure what is the problem here. Could you please elaborate?

Sure.

With the patch applied, the value of `org-element-emphasis-post-re' is:

"[[:space:]]\\|\\c}\\|\\c]\\|\\c-\\|\\c,\\|\\c|\\|$"

As you can see, this explicitly includes the line-end anchor "$".

However, in `org-set-emph-re', the template string used in concat ends
with "\\([%s]\\|$\\)". This means the post parameter is injected into a
bracket expression [...].

Since the new value of post (org-element-emphasis-post-re, which is used
when post is nil) already uses \\| for alternation and includes "$", we
end up with "$" appearing twice: once inside the brackets, where it no
longer acts as an end-of-line anchor, and once outside.
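To make the substitution concrete, here is a small Python sketch that only mimics the string-formatting step. Emacs `format` treats `%s` the same way here. The sketch does not interpret the regexps. The raw strings hold the regexps as Emacs sees them, after Lisp string unescaping. `plain_template` is a hypothetical unbracketed variant for comparison and is not part of the patch.

```python
# Regexps written as Python raw strings so every backslash appears exactly
# as in the Emacs regexp (i.e. after Lisp string unescaping).
post = r"[[:space:]]\|\c}\|\c]\|\c-\|\c,\|\c|\|$"  # org-element-emphasis-post-re

# Tail of the template in org-set-emph-re: post is wrapped in [...].
bracketed_template = r"\([%s]\|$\)"
tail = bracketed_template % post

# Hypothetical alternative: splice post in as a plain group and rely on
# the "$" that post already contains.
plain_template = r"\(%s\)"
plain_tail = plain_template % post

print(tail)
print(plain_tail)
print(tail.count("$"), plain_tail.count("$"))
```

The first result is exactly the end of the `org-emph-re' value quoted above, and it contains two "$". The unbracketed variant keeps only the one that post already provides.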

My question is: is the "[%s]\\|$" structure in the template still
needed, or is it now redundant?

> > Apart from that, your code works great. I look forward to using this in
> > Emacs 31 to get rid of the annoying ZWS. Though we'll still need them
> > for English text (like a*b*c), that's a separate discussion.
>
> We are far from there. I am mostly toying around whether this syntax is
> going to break Org or not.
>
> There are still problems with the proposed approach. In particular,
> using Po Unicode character category might be problematic.
> "!?.," are all Po, but we currently just allow them as right boundary.
> This makes sense since !* is probably intentional - in English, ! is
> end of sentence and should be followed by a space. So, it is unlikely to
> be expected as a left boundary of emphasis.
> 、 。 ! , . ? are also Po, but I am not sure whether one may expect
> to write 您好。*我*叫Ihor。

Yes, this is exactly what we expect.

Unlike English, CJK languages do not use spaces to separate sentences or
phrases. The punctuation marks themselves act as the delimiters.

If we were forced to add a space (e.g., 您好。 *我*...), it would create
an unnaturally wide gap in the text, because full-width CJK punctuation
already carries its own spacing. Such a gap is considered a typographic
error in Chinese. Therefore, CJK punctuation should be allowed as a left
boundary.
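The trade-off you raised can be illustrated with a minimal Python sketch of a purely Po-based left-boundary rule. The helpers `allowed_left_boundary` and `opening_marker_ok` are hypothetical. They only check the Unicode general category through `unicodedata`. They are not how Org or the patch matches emphasis, since both use Emacs regexps and `\c` character categories.

```python
import unicodedata
from typing import Optional


def allowed_left_boundary(prev: Optional[str]) -> bool:
    """Hypothetical rule: an emphasis marker may open at the start of a
    line, after whitespace, or after any character of category Po."""
    if prev is None:  # marker at the beginning of the line
        return True
    return prev.isspace() or unicodedata.category(prev) == "Po"


def opening_marker_ok(text: str, index: int) -> bool:
    """Apply the rule to the character just before text[index]."""
    prev = text[index - 1] if index > 0 else None
    return allowed_left_boundary(prev)


samples = [
    "您好。*我*叫Ihor。",  # after 。 (Po): what we want in Chinese
    "你好、*我*",          # after 、 (Po): also wanted
    "您好*我*叫Ihor。",    # after 好 (Lo): rejected
    "Hi!*there*",          # after ! (Po): accepted too, though unlikely to be meant in English
]
for s in samples:
    print(s, opening_marker_ok(s, s.index("*")))
```

Under such a rule the Chinese cases behave as expected, but the ASCII "!" is accepted as well. That is the English case you pointed out.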

I'd be happy to provide test cases or run some tests if needed.
