[R] Reply: Re: Multiple language output - Correct in RGui, wrong in .txt after sink()

mark . redshaw Thu, 20 May 2010 08:02:37 -0700

Dear Brian,
Many thanks for your reply; it helped a great deal. But first, an apology 
for not providing the "at a minimum" information: my (newbie) error, and 
sorry for making you guess.
The information is:
> sessionInfo()
R version 2.10.0 (2009-10-26) 
i386-pc-mingw32

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base 
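
The native encoding implied by these locale settings can also be queried from within R. A minimal sketch using the base functions Sys.getlocale() and l10n_info(); as far as I know the "codepage" element is only reported on Windows, so the sketch checks for it before printing:

```r
# Sketch: which character encoding does the current locale use?
ctype <- Sys.getlocale("LC_CTYPE")   # e.g. "German_Germany.1252" on the system above
info  <- l10n_info()                 # list with logical elements "MBCS", "UTF-8", "Latin-1"

cat("LC_CTYPE:     ", ctype, "\n", sep = "")
cat("UTF-8 locale: ", info[["UTF-8"]], "\n", sep = "")
if (!is.null(info[["codepage"]])) {  # element present on Windows only
  cat("Code page:    ", info[["codepage"]], "\n", sep = "")
}
```

A code page of 1252 (or a CTYPE ending in ".1252") means sink() output is limited to the characters CP1252 can represent.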

I am "stuck" with the system I have: there is no way for me to change it 
and still get access to the data I need. This email is sent from Lotus 
Notes; I have no idea how it handles encoding, but I think it is 
responsible for the problems with the Korean text.
For reference, the fonts I have used are MS Mincho and Arial Unicode MS, 
which both seem to have all the characters I need in the languages I am 
using.

I have tested your suggestions, with positive results.

Using the connection you suggested, I found that on my system the 
solution was:
con <- file("output.txt", open = "a", encoding = "UTF-8")
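
Whether such a connection really writes UTF-8 can be checked by looking at the bytes it produces. A minimal sketch, assuming the native locale can represent "ü" (true of CP1252 and of any UTF-8 locale); it writes to a temporary file rather than output.txt and uses open = "w" so that reruns start from an empty file:

```r
# Sketch: confirm that a connection opened with encoding = "UTF-8" emits UTF-8 bytes.
path <- tempfile(fileext = ".txt")             # temporary stand-in for "output.txt"
con  <- file(path, open = "w", encoding = "UTF-8")
cat("gr\u00fcn\n", file = con)                 # "grün", with the umlaut written as a \u escape
close(con)

bytes <- readBin(path, what = "raw", n = file.size(path))
print(bytes)   # in UTF-8, "ü" is the two-byte sequence c3 bc
```

On Windows, where text-mode files use CRLF line endings, expect 0d 0a at the end instead of a single 0a.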

I also started looking at the locale, following your hint, and found that 
by changing it together with the connection I could get the output in the 
form I wanted in all languages, including Korean.
I am, as you say, just thankful for the "miracle" of this working at all.
It may not be optimal, and ideas from you or others would be welcome, but 
it works!
Many thanks,
Mark

""
RM_EN <- c("Alfalfa hay", "Alfalfa meal", "Alfalfa silage")
RM_DE <- c("Luzerneheu", "Lurzernegrünmehl", "Luzernesilage")
RM_RU <- c("Люцерновое сено", "Люцерновая травяная мука", "Люцерновый сенаж")
RM_CN <- c("苜蓿干草", "苜蓿草粉", "苜蓿青贮")
RM_JP <- c("ｱﾙﾌｧﾙﾌｧ乾草", "ｱﾙﾌｧﾙﾌｧ ﾐｰﾙ", "ｱﾙﾌｧﾙﾌｧ ｻｲﾚｰｼﾞ")
RM_KR <- c("알팔파 건초", "알팔파 박", "알팔파 사일리지")

RMLANG <- data.frame(RM_EN, RM_DE, RM_RU, RM_CN, RM_JP, RM_KR)
nrm <- NROW(RMLANG)

# Append to output.txt; the connection converts what it is given into UTF-8
con <- file("output.txt", open = "a", encoding = "UTF-8")
for (i in seq_len(nrm)) {
  # format() pads each label to 12 characters so it is separated from the text
  cat(format("English", width = 12), as.character(RMLANG$RM_EN[i]), "\n", file = con, sep = "")
  cat(format("German", width = 12), as.character(RMLANG$RM_DE[i]), "\n", file = con, sep = "")
  # On this Windows setup, switching to a matching locale before each CJK line
  # was what made those lines come out correctly
  Sys.setlocale("LC_ALL", "Chinese_CHN")
  cat(format("Chinese", width = 12), as.character(RMLANG$RM_CN[i]), "\n", file = con, sep = "")
  Sys.setlocale("LC_ALL", "Japanese")
  cat(format("Japanese", width = 12), as.character(RMLANG$RM_JP[i]), "\n", file = con, sep = "")
  Sys.setlocale("LC_ALL", "Korean")
  cat(format("Korean", width = 12), as.character(RMLANG$RM_KR[i]), "\n", file = con, sep = "")
  # Restore the German locale before the next row
  Sys.setlocale("LC_ALL", "German")
}
close(con)
""

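On systems whose native locale is UTF-8 (Linux and macOS, and Windows from R 4.2.0 on), the Sys.setlocale() switching above should not be needed. The following is a sketch under that assumption, not tested on the R 2.10 Windows setup above. It writes three of the languages with writeLines() to a temporary file, and gives the strings as \u escapes so they do not depend on how the script file itself is encoded:

```r
# Sketch: write several scripts to one UTF-8 file without changing the locale.
# Assumes a UTF-8 native locale.
RM_EN <- c("Alfalfa hay", "Alfalfa meal", "Alfalfa silage")
RM_CN <- c("\u{82DC}\u{84FF}\u{5E72}\u{8349}",                          # 苜蓿干草
           "\u{82DC}\u{84FF}\u{8349}\u{7C89}",                          # 苜蓿草粉
           "\u{82DC}\u{84FF}\u{9752}\u{8D2E}")                          # 苜蓿青贮
RM_KR <- c("\u{C54C}\u{D314}\u{D30C} \u{AC74}\u{CD08}",                 # 알팔파 건초
           "\u{C54C}\u{D314}\u{D30C} \u{BC15}",                         # 알팔파 박
           "\u{C54C}\u{D314}\u{D30C} \u{C0AC}\u{C77C}\u{B9AC}\u{C9C0}") # 알팔파 사일리지

labels <- format(c("English", "Chinese", "Korean"), width = 12)  # left-justified, padded to 12

path <- tempfile(fileext = ".txt")
con  <- file(path, open = "w", encoding = "UTF-8")
for (i in seq_along(RM_EN)) {
  rows <- paste0(labels, c(RM_EN[i], RM_CN[i], RM_KR[i]))
  writeLines(c(rows, ""), con)        # one line per language, then a blank line
}
close(con)

out <- readLines(path, encoding = "UTF-8")
```

Building each line with paste0() and writing it with writeLines() replaces the six separate cat() calls per row.
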
Dr. Mark Redshaw 
Animal Nutrition Services 
Evonik Degussa GmbH, HN-M-AN, Rodenbacher Chaussee 4, 63457 Hanau, Germany 

Tel: +49 61 81 59 6788 
www.aminoacidsandmore.com 

Prof Brian Ripley <rip...@stats.ox.ac.uk> 
19.05.2010 23:35

To: mark.reds...@evonik.com
Cc: r-help@r-project.org
Subject: Re: [R] Multiple language output - Correct in RGui, wrong in .txt after sink()

You haven't given us the 'at a minimum' information asked for in the 
posting guide (but we can guess you are using Windows), nor do we 
know the intended encoding of this email (I see no encoding in the 
header as it reached me, but it seems sensible viewed as UTF-8). And 
the absence of basic information does make it *really* hard to help 
here -- this reply is my third guess at what might be happening.

We also do not know the font you are using in RGui, but I am 
not aware of any Windows font which correctly covers both Russian and CJK.
However, it is not just a question of knowing the font name: different 
versions of Windows, including different language-specific versions, 
have different fonts with the same name.

RGui (since about R 2.7.0) works in UCS-2 encoding.  Sink files work 
in the locale's encoding (another of the pieces of information you did 
not tell us, but on Windows it is 8-bit or specific to one of 
Simplified Chinese, Traditional Chinese, Japanese or Korean -- I'd 
guess from your address it was CP1252, but it *is* part of the 'at a 
minimum').  So whereas R can store non-native strings in UTF-8 
(provided you get them in as such), it can only output them if told 
how to: the designer of RGui told it, but you, in using 
sink('output.txt'), did not.
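
The gap between what R can store and what a CP1252 sink file can hold can be seen directly with iconv(). A short sketch, assuming the platform's iconv recognises the name "CP1252"; it also shows that the <U+XXXX> escapes written to the sink file are simply the characters' Unicode code points in hexadecimal:

```r
# Sketch: CP1252 (the poster's native encoding) covers German but not Korean.
de <- "gr\u00fcn"                   # "grün"
kr <- "\u{C54C}\u{D314}\u{D30C}"    # Korean "alfalfa" from the example data

iconv(de, from = "UTF-8", to = "CP1252", toRaw = TRUE)  # ü becomes the single byte fc
iconv(kr, from = "UTF-8", to = "CP1252")                # NA: no CP1252 byte for Hangul

# The <U+XXXX> escapes seen in the sink file are the Unicode code points in hex:
escapes <- sprintf("<U+%04X>", utf8ToInt(kr))
print(escapes)
```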

cat+sink is an inefficient way to write to a file: try using the file= 
argument on an opened connection.  And you can set the encoding on 
that connection.  I really don't know what you meant by 'the 
characters as I expect': in a file they have to be in *some* encoding 
and you are not looking at the bits but at a representation in some 
unspecified file viewer.  One possibility is that you meant UCS-2 
(what Windows tends incorrectly to call 'Unicode' files), in which 
case you can use something like

con <- file("foo", open = "w", encoding = "UCS-2LE")  # open once so successive cat() calls append
cat(..., file=con)
...
close(con)
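
The reason for opening the connection before a series of cat() calls can be seen in a small sketch: given an unopened connection, cat() opens it in "wt" mode for the duration of the call and closes it again, so every call starts the file afresh. Temporary files stand in for "foo":

```r
# Sketch: unopened vs opened connections with repeated cat() calls.
p1 <- tempfile()
p2 <- tempfile()

con1 <- file(p1)                  # not opened: each cat() opens it ("wt"), writes, closes
cat("first\n",  file = con1)
cat("second\n", file = con1)      # reopening in "wt" mode truncates the file
close(con1)                       # destroys the connection object

con2 <- file(p2, open = "w")      # opened once: successive writes accumulate
cat("first\n",  file = con2)
cat("second\n", file = con2)
close(con2)

lines1 <- readLines(p1)
lines2 <- readLines(p2)
```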

You can use a connection with sink() too.
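
A minimal sketch of sink() with such a connection, using the UCS-2LE encoding from the snippet above and a temporary file in place of "foo". It assumes the platform's iconv supports "UCS-2LE" and that the native locale can represent Hangul (for example, a UTF-8 locale):

```r
# Sketch: redirect console output into a UCS-2LE file via sink() on an encoded connection.
path <- tempfile(fileext = ".txt")
con  <- file(path, open = "w", encoding = "UCS-2LE")
sink(con)                                   # output from cat()/print() now goes to con
cat(format("Korean", width = 12), "\u{C54C}\u{D314}\u{D30C}\n", sep = "")
sink()                                      # back to the console
close(con)

# Every character occupies two bytes, so the file contains zero bytes ...
bytes <- readBin(path, what = "raw", n = file.size(path))
# ... and must be read back with the same declared encoding.
rcon <- file(path, encoding = "UCS-2LE")
back <- readLines(rcon)
close(rcon)
```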

Think of it more as a miracle (and much unappreciated hard work and 
inspired design) that any of this works on Windows, and if you want it 
to work transparently, change to an OS with UTF-8 locales (these days, 
just about anything else).

On Wed, 19 May 2010, mark.reds...@evonik.com wrote:

> I have the following problem with outputting multilingual data to a file.
> I get (except for Korean) what I expect as a result in RGui, but when I
> use sink() to output to a text file, I lose the characters in the foreign
> languages.
> I post a small example below. Since I am not sure how well my email system
> and the list cope with all the different characters, I have additionally
> created a PDF version of this example.
> The first part of the example behaves as I expect for all languages except
> Korean. I believe the problem with Korean may be due to the font;
> it would be great if someone could confirm this.
> In the second part, with output to the txt file, I get <U+FF71>-style
> Unicode escapes as output, not the expected characters. My main question
> is: how can I output the characters as I expect?
>
>> RM_EN <- c("Alfalfa hay","Alfalfa meal","Alfalfa silage")
>> RM_DE <- c("Luzerneheu","Lurzernegrünmehl","Luzernesilage")
>> RM_RU <- c("Люцерновое сено","Люцерновая травяная мука","Люцерновый сенаж")
>> RM_CN <- c("苜蓿干草","苜蓿草粉","苜蓿青贮")
>> RM_JP <- c("ｱﾙﾌｧﾙﾌｧ乾草","ｱﾙﾌｧﾙﾌｧ ﾐｰﾙ","ｱﾙﾌｧﾙﾌｧ ｻｲﾚｰｼﾞ")
>> RM_KR <- c("알팔파 건초","알팔파 박","알팔파 사일리지")
>>
>> RMLANG <- data.frame(RM_EN,RM_DE,RM_RU,RM_CN,RM_JP,RM_KR)
>> nrm <- NROW(RMLANG)
>>
>> for(i in 1:nrm)
> + {
> + cat(format("English",    width = 12, justify = c("left")),
> +     as.character(RMLANG$RM_EN[i]),"\n",sep="")
> + cat(format("Deutsch",    width = 12, justify = c("left")),
> +     as.character(RMLANG$RM_DE[i]),"\n",sep="")
> + cat(format("Russian",    width = 12, justify = c("left")),
> +     as.character(RMLANG$RM_RU[i]),"\n",sep="")
> + cat(format("Japanese",   width = 12, justify = c("left")),
> +     as.character(RMLANG$RM_JP[i]),"\n",sep="")
> + cat(format("Chinese",    width = 12, justify = c("left")),
> +     as.character(RMLANG$RM_CN[i]),"\n",sep="")
> + cat(format("Korean",     width = 12, justify = c("left")),
> +     as.character(RMLANG$RM_KR[i]),"\n","\n","\n",sep="")
> + }
> English     Alfalfa hay
> Deutsch     Luzerneheu
> Russian     Люцерновое сено
> Japanese    ｱﾙﾌｧﾙﾌｧ乾草
> Chinese     苜蓿干草
> Korean      알팔파 건초
>
> English     Alfalfa meal
> Deutsch     Lurzernegrünmehl
> Russian     Люцерновая травяная мука
> Japanese    ｱﾙﾌｧﾙﾌｧ ﾐｰﾙ
> Chinese     苜蓿草粉
> Korean      알팔파 박
>
> English     Alfalfa silage
> Deutsch     Luzernesilage
> Russian     Люцерновый сенаж
> Japanese    ｱﾙﾌｧﾙﾌｧ ｻｲﾚｰｼﾞ
> Chinese     苜蓿青贮
> Korean      알팔파 사일리지
>
>> for(i in 1:nrm)
> + {
> + sink("output.txt")
> + cat(format("English",    width = 12, justify = c("left")),
> +     as.character(RMLANG$RM_EN[i]),"\n",sep="")
> + cat(format("Deutsch",    width = 12, justify = c("left")),
> +     as.character(RMLANG$RM_DE[i]),"\n",sep="")
> + cat(format("Japanese",   width = 12, justify = c("left")),
> +     as.character(RMLANG$RM_JP[i]),"\n",sep="")
> + cat(format("Chinese",    width = 12, justify = c("left")),
> +     as.character(RMLANG$RM_CN[i]),"\n",sep="")
> + cat(format("Korean",     width = 12, justify = c("left")),
> +     as.character(RMLANG$RM_KR[i]),"\n","\n","\n",sep="")
> + sink()
> + }
>>
> Output.txt contains:
> ""
> English     Alfalfa hay
> Deutsch     Luzerneheu
> Japanese <U+FF71><U+FF99><U+FF8C><U+FF67><U+FF99><U+FF8C><U+FF67><U+4E7
> Chinese     <U+82DC><U+84FF><U+5E72><U+8349>
> Korean      <U+C54C><U+D314><U+D30C> <U+AC74><U+CD08>
>
> English     Alfalfa meal
> Deutsch     Lurzernegrünmehl
> Japanese    <U+FF71><U+FF99><U+FF8C><U+FF67><U+FF99><U+FF8C><U+FF67> <U+FF
> Chinese     <U+82DC><U+84FF><U+8349><U+7C89>
> Korean      <U+C54C><U+D314><U+D30C> <U+BC15>
>
> English     Alfalfa silage
> Deutsch     Luzernesilage
> Japanese    <U+FF71><U+FF99><U+FF8C><U+FF67><U+FF99><U+FF8C><U+FF67> <U+FF
> Chinese     <U+82DC><U+84FF><U+9752><U+8D2E>
> Korean      <U+C54C><U+D314><U+D30C> <U+C0AC><U+C77C><U+B9AC><U+C9C0>
> ""
>
>
>
> many thanks
> Mark Redshaw

-- 
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
