> On 7 May 2017, at 08:36, Oliver Keyes <ironho...@gmail.com> wrote:
>
> Hey all,
>
> I've ran into a weird quirk on Mac platforms, which you can read fully
> at https://github.com/Ironholds/urltools/issues/70
>
> The long and the short of it is that one specific codepoint - \u04cf -
> does not print in a UTF-8-y way by default, except when run through
> cat(). Compare, for example:
>
> encodeString("\u04cf")
>
> and:
>
> encodeString("\u044D")
>
> Kevin Ushey was kind enough to bring his expertise, and found that it
> may be a locale-specific problem as well as a Mac-specific problem,
> because 'sourcetools' shows that there's no locale information for the
> character. But this only appears in R - Python has it display
> perfectly - so I'm kind of at a loss. Does anyone know what's going
> on?
Python being less careful than R? Basically, characters get escaped if they are not known to be printable, and "Cyrillic Small Letter Palochka" (U+04CF) is, it seems, not recorded as printable in the common UTF-8 locales. From what I can google, it is used in Chechen, and even then only as a postfix to certain characters.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com
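
For comparison, here is a small sketch of how the Python side can be inspected. It assumes only the standard library. As far as its documentation goes, Python decides printability from its own bundled Unicode database rather than from the system locale, which would be consistent with Python displaying the character while R escapes it. The helper name `describe` is made up for this illustration.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, general category, printable?) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<unnamed>"),  # fallback if the database has no name
        unicodedata.category(ch),           # e.g. "Ll" = lowercase letter
        ch.isprintable(),                   # the same notion of printable that repr() relies on
    )


# The two code points compared in the original report.
for ch in ("\u04cf", "\u044d"):
    print(describe(ch))
```

If both characters are reported as printable lowercase letters here, the difference seen in R would have to come from the tables R consults on that platform, not from the characters themselves.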