Hi, I'm interested in the suggestion to use writeLines(..., useBytes=TRUE), but how do I use this function when exporting data from R? Could you please provide a simple example?
The following suggestion worked very well:

> funny.g <- "\u1E21"
> rawstuff <- charToRaw(funny.g)
> writeBin(rawstuff, "funny.g.txt")

But charToRaw() only accepts a single character string, and writeBin() cannot be used to export data frames. Is there any solution along these lines when I have a data frame with Unicode characters?

Best
Sverre

On Fri, Jul 15, 2011 at 2:38 PM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
> On 15/07/2011 1:42 PM, Sverre Stausland wrote:
>>
>> >>> > funny.g <- "\u1E21"
>> >>> > funny.g
>> >>
>> >> [1] "ḡ"
>> >>
>> >>> > data.frame (funny.g) -> funny.g
>> >>> > funny.g$funny.g
>> >>
>> >> [1] ḡ
>> >> Levels: <U+1E21>
>> >
>> > I think the problem is in the data.frame code, not in writing.
>> > Data.frames try to display things in a readable way, and since
>> > you're on Windows where UTF-8 is not really supported, the code
>> > helpfully changes that character to the "<U+1E21>" string for
>> > display.
>>
>> I thought the data.frame function didn't alter the Unicode coding,
>> since funny.g$funny.g above still displays the right Unicode character
>> (although it does list the levels as <U+1E21>).
>>
>> > You should be able to write the Unicode character to file if you use
>> > lower level methods such as cat(), on a connection opened using the
>> > file() function with the encoding set explicitly.
>>
>> I'm sorry, but I don't understand what it means "to use cat() on a
>> connection opened using the file() function". Could you please clarify
>> that?
>>
>
> I just checked on how R does it. We use UTF-8 encodings in the help pages,
> regardless of what kind of system you're running on.
>
> It converts the strings to UTF-8 internally first (your funny.g is already
> encoded that way; see Encoding(funny.g)) then uses
>
> writeLines( ..., useBytes=TRUE)
>
> to write it. The useBytes argument says not to try to make the file
> readable on the local system, just write out the bytes.
>
> Another way to do it is to get your strings in the UTF-8 encoding, convert
> them to raw vectors, and use writeBin() to write those out. For example,
>
> funny.g <- "\u1E21"
> rawstuff <- charToRaw(funny.g)
> writeBin(rawstuff, "funny.g.txt")
>
> All of this appears hard, because you're thinking of UTF-8 as text, but on
> Windows, R thinks of it as a binary encoding. Modern Windows systems can
> handle UTF-8, but not all programs on them can.
>
> Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
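
A minimal sketch of how the two suggestions in this thread might be applied to a whole data frame rather than a single string. The idea is to build the output lines yourself, make sure they are UTF-8, and then write the bytes without any re-encoding. The data frame `df` and the helpers `utf8_lines()`, `write_utf8_lines()` and `write_utf8_raw()` are hypothetical names invented for this illustration; they are not part of base R and were not proposed by anyone in the thread. The sketch does no quoting or escaping, so it assumes no field contains a tab or a newline.

```r
# Hypothetical example data (not from the thread): one column holds U+1E21.
df <- data.frame(id = 1:2,
                 letter = c("\u1E21", "g"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: turn a data frame into tab-separated UTF-8 lines,
# header first. Assumes no field contains the separator or a newline.
utf8_lines <- function(x, sep = "\t") {
  cols <- lapply(unname(as.list(x)),
                 function(col) enc2utf8(as.character(col)))
  header <- paste(names(x), collapse = sep)
  rows <- do.call(paste, c(cols, sep = sep))
  enc2utf8(c(header, rows))
}

# Variant 1: writeLines(..., useBytes = TRUE) on a connection.
# Opening the file in binary mode ("wb") keeps "\n" line endings everywhere.
write_utf8_lines <- function(x, path) {
  con <- file(path, open = "wb")
  on.exit(close(con))
  writeLines(utf8_lines(x), con, useBytes = TRUE)
}

# Variant 2: collapse the lines into one string so that charToRaw()
# accepts it, then write the raw bytes with writeBin().
write_utf8_raw <- function(x, path) {
  text <- paste0(paste(utf8_lines(x), collapse = "\n"), "\n")
  writeBin(charToRaw(text), path)
}

# Usage: both variants should produce the same bytes on disk.
path1 <- tempfile(fileext = ".txt")
path2 <- tempfile(fileext = ".txt")
write_utf8_lines(df, path1)
write_utf8_raw(df, path2)

# Read the file back, telling R that the bytes are UTF-8.
readLines(path1, encoding = "UTF-8")
```

Variant 2 gets around the single-string restriction of charToRaw() by pasting all lines together first. For anything beyond simple tab-separated output (quoting, embedded separators, missing values), the line-building step would need more care than this sketch takes.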