On 8/16/21 12:42 PM, Ivan Krylov wrote:
On Mon, 16 Aug 2021 09:05:54 +0000
David Norris <da...@precisionmethods.guru> wrote:

Unicode U+00d7 (times), U+00b1 (plus-minus) and U+03bc (mu) have
equivalents in Latin-1 encoding, and I have used these without
difficulty in strings, neither U+2206 (INCREMENT) nor U+0394 (Greek
Delta) does

But not in some other locale encodings on Windows (e.g. CP-1251), nor
in some single-byte locale encodings on *nix-like systems (e.g.
ru_RU.KOI8-R), which are admittedly much rarer nowadays than on
Windows. Unless I'm mistaken, the "\u2206t" in your example needs to
become a symbol, and symbols are always translated into the locale
encoding [1] [2].
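
To make the encoding point concrete, here is a minimal sketch that asks which of the code points mentioned above can be encoded at all in a few single-byte encodings. It uses Python only as a neutral way to probe the codecs and is not R's own translation machinery. The codec names "latin-1", "cp1251" and "koi8_r" are Python's names, assumed here to stand in for the locale encodings discussed, and the helper representable() is hypothetical.

```python
# Code points mentioned in the thread; the dictionary keys are informal labels.
CHARS = {
    "times": "\u00d7",
    "plus-minus": "\u00b1",
    "increment": "\u2206",
    "Greek Delta": "\u0394",
}

# Python codec names assumed to correspond to the locale encodings above.
CODECS = ["latin-1", "cp1251", "koi8_r"]


def representable(ch, codec):
    """Return True if the character can be encoded in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Print one line per character showing where it can be represented.
for label, ch in CHARS.items():
    support = {codec: representable(ch, codec) for codec in CODECS}
    print(f"U+{ord(ch):04X} ({label}): {support}")
```

A character that is representable in Latin-1 is not necessarily representable in CP-1251 or KOI8-R, so code that works in one locale may still fail in another.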

I would expect this warning to be a problem for CRAN, but I'm just
another package developer, so I could be wrong.

Yes, this is a problem. Only ASCII characters should be used in symbol names (in R packages), as they can be represented in every (supported) locale.
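
As a rough illustration of the ASCII-only rule, the following sketch flags any non-ASCII characters in a proposed symbol name. It is written in Python purely for illustration. The helper non_ascii_chars() and the example name "delta_t" are hypothetical, and this is not a check that R or CRAN runs.

```python
def non_ascii_chars(name):
    """Return the non-ASCII characters in a name as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in name if ord(ch) > 127]


# "delta_t" is a hypothetical ASCII replacement for the "\u2206t" name.
for symbol in ["delta_t", "\u2206t"]:
    offending = non_ascii_chars(symbol)
    # Show the name with escapes so the output itself stays ASCII.
    shown = symbol.encode("ascii", "backslashreplace").decode("ascii")
    print(shown, offending if offending else "ASCII only")
```

A name that produces an empty list contains only ASCII characters and so can be represented in any locale encoding.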

On Windows, some characters that are not directly representable would be replaced by similar-looking ones ("best fit") during translation to the native encoding, but that can produce surprising results and should not be relied on, definitely not in packages.

Best
Tomas

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
