Ihor Radchenko 2025/12/26 18:07 0800 writes:
> > The Po category (Punctuation, other) is a vast collection that goes far
> > beyond the daily characters used in Chinese or English. It includes many
> > symbols from specialized scripts or historical contexts where the
> > spacing convention is effectively "undefined" for a general-purpose
> > markup parser.
> >
> > I believe trying to define a universal spacing rule for every character
> > in the Po table might be over-engineering. Maybe the primary goal should
> > be to ensure that common CJK delimiters (like 。, ，, ！) are treated as
> > valid boundaries for emphasis.
>
> Common CJK delimiters are actually covered by (category ?|).
> e.g. M-: (category-set-mnemonics (char-category-set ?。)) RET
> ?| should also cover all other languages that do not use spaces (if it
> does not - it is a bug in Emacs)

I see. That's certainly a clever approach. I believe it is working well now.

> >> Maybe "Terminal Punctuation" property.
> >
> > Terminal Punctuation is indeed more promising. If we use it as a
> > baseline and then cherry-pick a specific subset—or exclude a few
> > problematic ones—to act as valid boundaries, the workload should be
> > quite manageable.
>
> See the attached updated patch. I modified the left boundary regexp to
> exclude Po characters with Terminal_Punctuation Unicode property
> (see https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt)
> CJK should still be fine, I think.
>
> That said, I tried
> 冰淇凌*。 (Hello *world* foo.
> And with the new patch
> "*。 (Hello *world*" is bold.
> (perfectly reasonable given the rules, but looking strange in my eyes)

I gave it a test, and the results are as expected, although it does look a
little strange.

Everything looks good to me now, at least. Thanks :)
