> On 7 May 2017, at 08:36 , Oliver Keyes <ironho...@gmail.com> wrote:
> 
> Hey all,
> 
> I've run into a weird quirk on Mac platforms, which you can read fully
> at https://github.com/Ironholds/urltools/issues/70
> 
> The long and the short of it is that one specific codepoint - \u04cf -
> does not print in a UTF-8-y way by default, except when run through
> cat(). Compare, for example:
> 
> encodeString("\u04cf")
> 
> and:
> 
> encodeString("\u044D")
> 
> Kevin Ushey was kind enough to bring his expertise, and found that it
> may be a locale-specific problem as well as a Mac-specific problem,
> because 'sourcetools' shows that there's no locale information for the
> character. But this only appears in R - Python has it display
> perfectly - so I'm kind of at a loss. Does anyone know what's going
> on?
Python being less careful than R? 

Basically, characters get escaped on printing unless they are known to be
printable, and "Cyrillic Small Letter Palochka" (U+04CF) is, it seems, not
recorded as printable in the common UTF-8 locales. From what I can find online,
it is used in Chechen, and even then only as a postfix to certain letters.
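
A minimal sketch for checking this on your own machine, using only base R.
The variable names are mine, and I am deliberately not showing expected
output for encodeString(), since that depends on the platform and locale:

```r
# The two characters from the original post: U+04CF (palochka) and U+044D.
chars <- c(palochka = "\u04cf", e = "\u044d")

# Code points formatted as U+XXXX, to confirm which characters we hold.
codes <- vapply(chars, function(ch) sprintf("U+%04X", utf8ToInt(ch)),
                character(1))
print(codes)

# encodeString() escapes characters that are not reported as printable;
# the result may differ between platforms and locales.
escaped <- encodeString(chars)
print(escaped)

# cat() writes the strings without applying that escaping step.
cat(chars, sep = "\n")

# The locale category that governs character classification.
print(Sys.getlocale("LC_CTYPE"))
```

If the first element of `escaped` comes back as an escape sequence while the
second comes back as the letter itself, that would be consistent with the
explanation above.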

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com

