Hello, I use:
R version 2.9.2 (2009-08-24), Copyright (C) 2009 The R Foundation for Statistical Computing, ISBN 3-900051-07-0, on Ubuntu 9.10. I usually run R from ESS (version 5.4 on the current Ubuntu) inside Emacs 22.2.1, but I also tried the following from the console and got the same results.

I have a data file containing lots of European characters (French, German, Italian and so on). I can read it into R, but I can't display the characters correctly. I searched the archives and, following Professor Ripley's advice, I read my data this way:

> con <- file("/home/gerald/Vins/ListeVin091123.csv", open = "r", encoding = "UTF-8")
> isOpen(con)
[1] TRUE
> ttt <- read.table(file = con, header = TRUE, sep = ";", quote = "\"'",
+                   dec = ",", # row.names, col.names,
+                   na.strings = "", colClasses = NA, nrows = -1,
+                   skip = 0, check.names = TRUE,
+                   strip.white = FALSE, blank.lines.skip = TRUE,
+                   comment.char = "#",
+                   allowEscapes = FALSE, flush = FALSE,
+                   stringsAsFactors = FALSE)
> close(con)

It seems that R does recognize the locale, since it reports errors in French. Here is a simple example:

> ttt.g <- "gérald"
Erreur : caractères multioctets incorrects dans l'analyse de code (parser) à la ligne 1

(In English: "Error: invalid multibyte characters in parser at line 1".)

Printing the column names of my data set gives:

> names(ttt)
 [1] "ID" "Domaine" "Nom" "MillÃÆÃ.sime" "Pays"
 [6] "RÃÆÃ.gion" "Appellation" "Vignoble" "Couleur" "Alcool"
[11] "Classement" "Cuve" "mois" "Bio" "CÃÆÃ.page..1"
[16] "X." "CÃÆÃ.page..2" "X..1" "CÃÆÃ.page..3" "X..2"
[21] "CÃÆÃ.page..4" "X..3" "CÃÆÃ.page..5" "X..4" "Prix"
[26] "QuantitÃÆÃ." "Internet"

sessionInfo() yields the following:

> sessionInfo()
R version 2.9.2 (2009-08-24)
i486-pc-linux-gnu

locale:
LC_CTYPE=fr_CA.UTF-8;LC_NUMERIC=C;LC_TIME=fr_CA.UTF-8;LC_COLLATE=fr_CA.UTF-8;LC_MONETARY=C;
LC_MESSAGES=fr_CA.UTF-8;LC_PAPER=fr_CA.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;
LC_MEASUREMENT=fr_CA.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Revobase_0.2-1

I tried changing Emacs' coding systems, with no luck. Any idea how to handle this? My ultimate goal is to clean up and sort this data set and then export it in a LaTeX-compatible format.

By the way, if I open the file with OpenOffice Calc, it asks me to confirm that the encoding is Unicode (UTF-8). I do, change the default delimiter to ";" and press Enter, and all the accented characters display correctly.

Thanks for any insights,

Gérald Jean

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
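A minimal, self-contained version of the reading step, in the spirit of the posting guide, might look like the sketch below. It is an illustration under stated assumptions, not a tested fix for R 2.9.2 under ESS: the temporary file, the variable names `hdr`, `row1` and `tmp`, and the four sample columns are hypothetical stand-ins for the real CSV file. The accented letters are written as `\u00e9` escapes so the R parser only ever sees ASCII. Instead of the `encoding =` argument of `file()`, which re-encodes the input to the native encoding, it uses `read.table()`'s own `encoding = "UTF-8"` argument (as documented in current R versions), which marks the strings read as UTF-8 without re-encoding them; `check.names = FALSE` keeps the header exactly as read rather than passing it through `make.names()`.

```r
## Minimal, self-contained sketch using HYPOTHETICAL sample data
## (not the original ListeVin091123.csv). Accented letters are written
## with \u escapes, so the R parser only sees ASCII in this script.
hdr  <- "ID;Mill\u00e9sime;R\u00e9gion;Quantit\u00e9"
row1 <- "1;2005;Bourgogne;12"

## Write the two lines to a temporary file as raw UTF-8 bytes
## (useBytes = TRUE skips any re-encoding on output).
tmp <- tempfile(fileext = ".csv")
writeLines(c(hdr, row1), tmp, useBytes = TRUE)

## Read it back. encoding = "UTF-8" marks the strings as UTF-8 instead
## of re-encoding them; check.names = FALSE keeps the header verbatim.
ttt <- read.table(tmp, header = TRUE, sep = ";", dec = ",",
                  encoding = "UTF-8", check.names = FALSE,
                  stringsAsFactors = FALSE)

print(names(ttt))
print(Encoding(names(ttt)))  # declared encoding of each column name
```

If the names print correctly here but not with the real file, that would point at the file or the Emacs/ESS side rather than at `read.table()` itself; this comparison is a suggestion, not a diagnosis.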