Re: Bug: ODT export of Chinese text inserts spaces for line breaks

Maxim Nikulin Tue, 29 Jun 2021 10:02:59 -0700

On 29/06/2021 10:47, James Harkins wrote:

So, it would make sense to add a rule to the exporter: if one of the
characters before or after a source-text line break is a Chinese,
Japanese or Korean character, do not add a space.


On 29/06/2021 11:43, tumashu wrote:

You can try the below config :-)
     (let ((regexp "[[:multibyte:]]")
           (string text))
       (setq string
             (replace-regexp-in-string
              (format "\\(%s\\) *\n *\\(%s\\)" regexp regexp)
              "\\1\\2" string))

Notice that [[:multibyte:]] matches almost any non-ASCII script, e.g. Cyrillic:


(let ((sample "abc абв def"))
  (and (string-match "[[:multibyte:]]+" sample)
       (match-string 0 sample)))
"абв"
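
The consequence for the suggested config can be reproduced outside Emacs.
Below is a rough sketch in JavaScript, not the code from the thread: the
helper name joinMultibyteLines is hypothetical, and "any non-ASCII
character" is only an assumed approximation of [[:multibyte:]].

```javascript
// Assumption: approximate Emacs's [[:multibyte:]] as "any non-ASCII character".
const NON_ASCII = "[^\\x00-\\x7F]";

// Hypothetical helper mirroring the quoted replace-regexp-in-string call:
// drop the line break (and surrounding spaces) between two non-ASCII characters.
function joinMultibyteLines(text) {
  const re = new RegExp(`(${NON_ASCII}) *\\n *(${NON_ASCII})`, "gu");
  return text.replace(re, "$1$2");
}

console.log(joinMultibyteLines("中文\n文本")); // "中文文本", as wanted for Chinese
console.log(joinMultibyteLines("абв\nгде"));   // "абвгде", two Cyrillic words glued together
console.log(joinMultibyteLines("abc\ndef"));   // unchanged, the line break is kept
```

With such a broad character class, the space is lost between words of any
non-ASCII script that does separate words with spaces.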

It seems `org-fill-paragraph' (M-q) is smart enough to avoid a space
before or after a CJK character, so it is possible to determine the
correct way to splice lines, even though e.g. the "Script" Unicode
property is not exposed to elisp:
https://www.gnu.org/software/emacs/manual/html_node/elisp/Character-Properties.html
(Anyway, maintaining an explicit list of scripts is not a
straightforward approach.)


P.S.

JavaScript in browsers allows filtering characters that belong to a
particular script:


"abc абв def".match(/\p{Script=Cyrillic}+/u)
Array [ "абв" ]

I have not found such a feature in the regular expressions available in Emacs.
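
For illustration only, the rule proposed at the top of the thread (no
space when the character before or after the line break is CJK) might be
sketched with the same Script property. The helper name unwrapParagraph
and the choice of scripts (Han, Hiragana, Katakana, Hangul) are my
assumptions, not something the exporter does.

```javascript
// Assumption: these four scripts stand in for "Chinese, Japanese or Korean".
const CJK = "[\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}\\p{Script=Hangul}]";
const endsWithCjk = new RegExp(`${CJK}$`, "u");
const startsWithCjk = new RegExp(`^${CJK}`, "u");

// Hypothetical helper: unwrap one paragraph, turning each line break into a
// space unless the character on either side of the break is CJK.
function unwrapParagraph(text) {
  const lines = text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "");
  return lines.reduce((acc, line) => {
    if (acc === "") return line;
    const glue = endsWithCjk.test(acc) || startsWithCjk.test(line) ? "" : " ";
    return acc + glue + line;
  }, "");
}

console.log(unwrapParagraph("中文\n文本")); // "中文文本"
console.log(unwrapParagraph("абв\nгде"));   // "абв где"
console.log(unwrapParagraph("abc\n中文"));  // "abc中文"
```

Whether Hangul belongs in the list is debatable, since Korean separates
words with spaces; that is one more reason an explicit list of scripts is
awkward to maintain.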
