Ihor Radchenko <[email protected]> 2025/12/26 18:07 0800 writes:

> > The Po category (Punctuation, other) is a vast collection that goes far
> > beyond the daily characters used in Chinese or English. It includes many
> > symbols from specialized scripts or historical contexts where the
> > spacing convention is effectively "undefined" for a general-purpose
> > markup parser.
> >
> > I believe trying to define a universal spacing rule for every character
> > in the Po table might be over-engineering. Maybe the primary goal should
> > be to ensure that common CJK delimiters (like 。, ,, !) are treated as
> > valid boundaries for emphasis.
>
> Common CJK delimiters are actually covered by (category ?|).
> e.g. M-: (category-set-mnemonics (char-category-set ?。)) RET
> ?| should also cover all other languages that do not use spaces (if it
> does not - it is a bug in Emacs)

I see. That is certainly a clever approach. I believe it is working well now.

> >> Maybe "Terminal Punctuation" property.
> >
> > Terminal Punctuation is indeed more promising. If we use it as a
> > baseline and then cherry-pick a specific subset—or exclude a few
> > problematic ones—to act as valid boundaries, the workload should be
> > quite manageable.
>
> See the attached updated patch. I modified the left boundary regexp to
> exclude Po characters with Terminal_Punctuation Unicode property
> (see https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt)
> CJK should still be fine, I think.
>
> That said, I tried
> 冰淇凌*。 (Hello *world* foo.
> And with the new patch
> "*。 (Hello *world*" is bold.
> (perfectly reasonable given the rules, but looking strange in my eyes)

I tested it, and the results are as expected, although that case does look
a little strange. Everything looks good to me now, at least. Thanks :)
