Oops, this was supposed to be a private reply ;) - sorry about the noise. The essence in English: JGR uses UTF-8 encoding for all strings, but the system locale reports CP1252, which breaks automatic conversion (R has no way of knowing that everything is actually UTF-8). Explicit conversion via iconv works as expected (see the example below).
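For anyone who wants to reproduce the check, here is a minimal sketch of the explicit-conversion route in R. It assumes the input string really is UTF-8 (the \u00fc escape guarantees that here). The temporary file, the "wb" mode and the useBytes = TRUE argument are additions for illustration, chosen so that the bytes on disk should not depend on the session locale; they are not part of the original snippet.

```r
# Sketch: explicit UTF-8 -> latin1 conversion, checked at the byte level.
abt <- "n\u00fcr"                        # "nür"; the \u escape yields a UTF-8 string
utf8_bytes <- charToRaw(abt)             # 6e c3 bc 72: the umlaut takes two bytes

abt_latin1 <- iconv(abt, "UTF-8", "latin1")
latin1_bytes <- charToRaw(abt_latin1)    # 6e fc 72: the umlaut takes one byte

# Write the converted string without letting the connection re-encode it.
# useBytes = TRUE and binary mode are assumptions added for this sketch.
path <- tempfile(fileext = ".html")      # hypothetical output file
con <- file(path, "wb")
writeLines(abt_latin1, con, useBytes = TRUE)
close(con)

# Read the raw bytes back: the latin1 text plus the trailing newline (0a).
file_bytes <- readBin(path, "raw", n = file.size(path))
print(utf8_bytes)
print(latin1_bytes)
print(file_bytes)
```

If the file holds 6e fc 72 followed by the newline, the conversion happened before the write and did not rely on what the locale claims.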
Cheers,
Simon

On Dec 29, 2007, at 11:11 AM, Simon Urbanek wrote:

> Hello Matthias,
>
> On Dec 27, 2007, at 3:52 PM, Matthias Wendel wrote:
>
>> Hi, Simon,
>> I followed your advice by adding/changing the lines
>> abt = iconv(abt,"utf-8","latin1")
>> zz = file( paste("Itemtabelle/Itemtabelle", abt, ".html"), "wt",
>> encoding = "latin1")
>> but this yielded the same results.
>
> I finally have a Windows machine for testing, and on it the file
> name is created correctly ...
>
> Still, the locale apparently isn't right: JGR always uses UTF-8,
> but the system reports CP1252. That is why the automatic conversion
> (file(...,encoding..)) doesn't seem to work. What always works,
> however, is explicit conversion:
>
> a=file("foo","wt")
> writeLines(iconv(..., "utf-8","latin1"),a)
> close(a)
>
> (FWIW: since UTF-8 is the recommended encoding for web pages anyway,
> you don't really need it ... ;))
>
> charToRaw is always a good test, because UTF-8 usually needs 2 bytes
> for an umlaut and latin1 only one.
>
> Best regards,
> Simon
>
>
>> -----Original Message-----
>> From: Simon Urbanek [mailto:[EMAIL PROTECTED]
>> Sent: Thursday, 27 December 2007 21:40
>> To: Matthias Wendel
>> Cc: r-devel@r-project.org
>> Subject: Re: [Rd] encoding question again
>>
>> Matthias,
>>
>> you get exactly what you specified - namely UTF-8. If you want your
>> html file to be latin1, then you have to say so:
>>
>> zz = file( paste("Itemtabelle/Itemtabelle", abt, ".html"), "wt",
>> encoding = "latin1")
>>
>> In addition, you're assuming that `abt' is in the correct encoding
>> to be understood by your OS. If it's not, you'd better convert it
>> to one.
>> From your results it seems as if `abt' is also UTF-8 encoded. Since
>> you didn't tell us where you got that from, you should either fix
>> the source or use something like iconv(abt,"utf-8","latin1"):
>>
>> (in UTF-8 locale)
>> > abt="nür"
>> > cat(abt,"\n")
>> nür
>> > charToRaw(abt)
>> [1] 6e c3 bc 72
>> > charToRaw(iconv(abt,"utf-8","latin1"))
>> [1] 6e fc 72
>>
>> Cheers,
>> Simon
>>
>>
>> On Dec 27, 2007, at 3:11 PM, Matthias Wendel wrote:
>>
>>> Hi, R Devils,
>>> I'm running the current R version in JGR (version 1.5-8).
>>> Sys.getlocale(category = "LC_ALL") yields
>>> [1] "LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252"
>>>
>>> I want to write some HTML code enhanced by statistical results and
>>> labels encoded in Latin-1, which I pass to a function. One of the
>>> labels is used to generate the filename. Although the labels are
>>> handled correctly in JGR, they are somehow converted when they are
>>> written to the file. The filename is not constructed as wanted
>>> either. The function definition is correctly sourced into R. The
>>> function is defined like this:
>>>
>>> Itemtabelle.head <- function (abt ){
>>> # nür zöm TÄST
>>> zz = file( paste("Itemtabelle/Itemtabelle", abt, ".html"), "wt",
>>> encoding = "UTF-8")
>>> cat(as.character("<html
>>> xmlns:o=\"urn:schemas-microsoft-com:office:office\"
>>> xmlns:x=\"urn:schemas-microsoft-com:office:excel\"
>>> xmlns=\"http://www.w3.org/TR/REC-html40\"> \n"),
>>> as.character(" <head> \n "),
>>> .
>>> .
>>> .
>>> as.character(" <td colspan=5 class=xl28 width=727 style=\'width:545pt\'>Gesundheitsindikatoren: "), abt, as.character("</td> \n"),
>>> as.character(" </tr> "), file = zz)
>>> close(zz)
>>> unlink(zz)
>>> }
>>> Setting abt to " Ärzte Innere, Gynäkologie" and calling the function
>>> with this argument yields a filename "Itemtabelle Ärzte Innere,
>>> Gynäkologie .html", and in the file a line
>>> <td colspan=5 class=xl28 width=727 style='width:545pt'>Gesundheitsindikatoren: Ärzte Innere, Gynäkologie </td>
>>> is generated.
>>> I tried to solve this by using iconv, without success.
>>> The problem remains the same in rgui and rterm - in rterm the
>>> resulting filename is "Itemtabelle Žrzte Innere,
>>> Gyn„kologie .html".
>>>
>>> Cheers,
>>> Matthias
>>>
>>>
>>> ______________________________________________
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel