I have very little knowledge about file encodings and would like to learn more.
I've read the following pages to learn more: http://stat.ethz.ch/R-manual/R-devel/library/base/html/Encoding.html https://stackoverflow.com/questions/4806823/how-to-detect-the-right-encoding-for-read-csv https://developer.r-project.org/Encodings_and_R.html The last one, in particular, has been very helpful. I would be interested in any further references that you suggest. I attach a file that reproduces the issue I would like to learn more about. I do not know if the file encoding will be correctly preserved through email, so I also provide the file (temporarily) on Dropbox here: https://www.dropbox.com/s/3lbgebk7b5uaia7/encoding_export_issue.R?dl=0 The file gives an error when using "source()" with the argument echo = TRUE: > source("encoding_export_issue.R", echo = TRUE) Error in nchar(dep, "c") : invalid multibyte string, element 1 In addition: Warning message: In grepl("^[[:blank:]]*$", dep[1L]) : input string 1 is invalid in this locale The problem comes from the "รก" character in the .R file. The file appears to be encoded as "iso-8859-1": $ file --mime-encoding encoding_export_issue.R encoding_export_issue.R: iso-8859-1 Note that for me: > getOption("encoding") [1] "native.enc" so "native.enc" is used for the "encoding" argument of source(). The following two calls succeed: > source("encoding_export_issue.R", echo = TRUE, encoding = "unknown") > source("encoding_export_issue.R", echo = TRUE, encoding = "iso-8859-1") Is this file a valid "iso-8859-1" encoded file? Why does source() fail in the case of encoding set to "native.enc"? Is it because of the settings to UTF-8 in my locale (see info on my system at the bottom of this email). I'm guessing it would be a bad idea to put options(encoding = "unknown") in my .Rprofile, because it is difficult to always correctly guess the encoding of files? Is there a reason why setting it to "unknown" would lead to more problems than leaving it set to "native.enc"? I've reproduced the above behavior on R-devel (r74677) and 3.4.3. Below is my session info and locale info for my system with the 3.4.3 version: > sessionInfo() R version 3.4.3 (2017-11-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 16.04.3 LTS Matrix products: default BLAS: /usr/lib/libblas/libblas.so.3.6.0 LAPACK: /usr/lib/lapack/liblapack.so.3.6.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_3.4.3 > Sys.getlocale() [1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C" Thanks for your time, Scott P.S. Note that I had posted this question to r-devel, which was the incorrect choice. For archival purposes, I reference the thread here: https://www.mail-archive.com/search?l=mid&q=20180501185750.445oub53vcdnyyyx%40steph -- Scott Kostyshak Assistant Professor of Economics University of Florida https://people.clas.ufl.edu/skostyshak/
# Ch?vez quantile_type <- 4
______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.