On 26/02/2015 3:09 PM, maill...@tlink.de wrote:
When I send some outlandish characters through enc2native (or format) in
R 3.1.2 on Ubuntu trusty it works quite well:
> "®ØΔЊת"
[1] "®ØΔЊת"
> enc2native("®ØΔЊת")
[1] "®ØΔЊת"
> Encoding(enc2native("®ØΔЊת"))
[1] "UTF-8"
In Windows the result is different:
> "®ØΔЊת"
[1] "®ØΔЊת"
> enc2native("®ØΔЊת")
[1] "®Ø<U+0394><U+040A><U+05EA>"
> Encoding(enc2native("®ØΔЊת"))
[1] "latin1"
And this is wrong. The native character set of a unicode application
under Windows is *Unicode*. enc2native should do the same under Windows
as it does on Ubuntu. Also the "unknown" encoding should be changed to
mean the same as "UTF-8" exactly as it is on Linux.
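To reproduce the comparison on either platform, a minimal sketch in base R follows. The string is written with \u escapes so that it does not depend on the encoding of the source file, and the variable names are only illustrative. What enc2native returns depends on the platform and locale, so no output is shown for it.

```r
# The five characters from the example, written as Unicode escapes.
x <- "\u00ae\u00d8\u0394\u040a\u05ea"

# String literals containing non-ASCII \u escapes are marked as UTF-8.
enc_in <- Encoding(x)

# enc2native() converts to the encoding of the current locale, so the
# result differs between a UTF-8 locale and a single-byte code page.
y <- enc2native(x)
enc_out <- Encoding(y)

# enc2utf8() keeps all five code points regardless of the locale.
z <- enc2utf8(x)
code_points <- utf8ToInt(z)

print(enc_in)
print(enc_out)
print(sprintf("U+%04X", code_points))
```

Comparing enc_out on the two systems should show the difference described above without relying on how the console renders the characters.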
What is a "unicode application", and what makes you think R is one? R
is being told by Windows that your native encoding is latin1. Perhaps
Windows 8 supports UTF-8 as a native encoding (I've never used it), but
previous versions of Windows didn't.
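As a rough sketch of how to see what R has been told about the native encoding, the following uses only base R. The exact locale string and the extra list elements vary by platform, so treat the comments as expectations rather than guaranteed output.

```r
# l10n_info() reports what R has detected about the current locale.
info <- l10n_info()

is_utf8   <- info[["UTF-8"]]    # TRUE when the locale is a UTF-8 one
is_latin1 <- info[["Latin-1"]]  # TRUE when the locale is a Latin-1 one

# On Windows the list is documented to carry the code page in use as well
# (element "codepage"); it is absent on other platforms.
ctype <- Sys.getlocale("LC_CTYPE")

print(info)
print(ctype)
```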
Duncan Murdoch
A unicode application is a program that uses the unicode API of Windows
- the functions with the ending W. For such an application the system
code page (native encoding) is completely irrelevant. The system code
page is just a compatibility feature that enables Windows NT/Vista/7/8
to run applications that were developed for Windows 95 which didn't have
unicode support. But that line of operating systems has been dead for 10
years now. R obviously is a unicode application because it can print - or
read from the clipboard - characters like "ΔЊת" that are not in my system
code page, which is not possible through the legacy API.
Neither the unicode API nor the legacy API accepts UTF-8. The legacy API
needs strings encoded according to the active code page and the unicode
API wants UTF-16. If you have UTF-8 you need to convert it either to
the active code page, which will lose all characters that are not
covered by it, or to UTF-16 and then use the unicode functions. But
this is not the problem: the Windows interface functions of R are
working quite nicely with unicode already.
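The two conversion routes can be sketched from within R using iconv. This is only an illustration, not what R does internally: latin1 stands in for the active code page as an assumption, and the variable names are made up for the example.

```r
# The five characters from the example, written as Unicode escapes.
x <- "\u00ae\u00d8\u0394\u040a\u05ea"

# Route 1: a single-byte code page (latin1 used here as a stand-in).
# The first two characters exist in latin1 and map to one byte each.
head_latin1 <- iconv("\u00ae\u00d8", from = "UTF-8", to = "latin1",
                     toRaw = TRUE)[[1]]

# The full string cannot be represented in latin1. How the failure
# shows up (typically NA) depends on the iconv implementation.
full_latin1 <- iconv(x, from = "UTF-8", to = "latin1")

# Route 2: UTF-16, the form the W functions expect. Every character
# here is in the Basic Multilingual Plane, so each takes two bytes.
utf16 <- iconv(x, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]

# Converting back shows that nothing was lost on this route.
back <- iconv(list(utf16), from = "UTF-16LE", to = "UTF-8")

print(head_latin1)
print(full_latin1)
print(utf16)
print(back == x)
```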
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel