R-developers,

I'm looking for some 'best practices', or perhaps an upstream solution (I have a sense of déjà vu about this, so apologies if it has already been asked). Problems occur when a file is encoded as latin1 but the user has a UTF-8 locale (or, I guess, more generally when the encoding of the input does not match R's locale). Here are two examples from the Bioconductor help list:
https://stat.ethz.ch/pipermail/bioconductor/2007-August/018947.html
(the relevant command is library(GEOquery); gse <- getGEO('GSE94'))

https://stat.ethz.ch/pipermail/bioconductor/2007-July/018204.html

I think solutions are:

* Specify the encoding in readLines.
* Convert the input using iconv.
* Tell the user to set their locale to match the input file (!)

Unfortunately, these (1 & 2, anyway) place an extra burden on the package author, who has to become educated about locales and about the encoding conventions of the files they read, and has to know how R deals with encodings.

Are there other / better solutions? Any chance for some (additional) 'smarts' when reading files?

Martin
-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org
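
For concreteness, here is a minimal sketch of the first two options listed above. It assumes the input is known in advance to be latin1; the temporary file and its one-line content are made up for illustration and are not taken from the GEO files in the linked threads.

```r
# Hypothetical sample file: the single line "caf\xe9" encoded as latin1
# (0xE9 is e-acute in latin1 and is not valid UTF-8 on its own).
path <- tempfile(fileext = ".txt")
writeBin(c(charToRaw("caf"), as.raw(0xE9), charToRaw("\n")), path)

# Option 1: declare the encoding when reading.
# The 'encoding' argument of readLines() only marks the strings as latin1;
# it does not re-encode the bytes.
x1 <- readLines(path, encoding = "latin1")
Encoding(x1)

# Option 2: read the bytes as they are, then convert explicitly with iconv().
raw_lines <- readLines(path)
x3 <- iconv(raw_lines, from = "latin1", to = "UTF-8")
Encoding(x3)

unlink(path)
```

Both variants still require the package author to know the encoding of the file, which is the burden described above. A third variant, opening the file with file(path, encoding = "latin1") and passing that connection to readLines, re-encodes to the native encoding while reading, so its result depends on the user's locale.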