Hello,

As Jan pointed out, the problem is the encoding in which the function's script file is saved. If I set this encoding to "UTF-8" in source(), everything is fine.
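A minimal sketch of that fix, assuming a UTF-8 session locale such as fr_CA.UTF-8 or en_US.UTF-8. A one-line script is written to a temporary file as UTF-8 bytes and read back with source(..., encoding = "UTF-8"). The temporary file and the is_big_city() function are hypothetical and exist only for this illustration.

```r
# Hypothetical script file, created in a temporary directory for this sketch.
script <- tempfile(fileext = ".R")

# One line of R code with accented literals. "\u00e9" is "é", so `code`
# is a UTF-8 string whatever the session locale is.
code <- "is_big_city <- function(x) x %in% c(\"Montr\u00e9al\", \"Qu\u00e9bec\")"

# Write the UTF-8 bytes to disk unchanged (useBytes = TRUE skips re-encoding).
writeLines(enc2utf8(code), script, useBytes = TRUE)

# Declare the file's encoding when sourcing it, so the accented literals are
# read as UTF-8 rather than assumed to be in the locale's own encoding.
source(script, encoding = "UTF-8")

# Expected in a UTF-8 locale: TRUE TRUE FALSE
is_big_city(c("Montr\u00e9al", "Qu\u00e9bec", "L\u00e9vis"))
```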
If I set all the locale variables (LANG and the LC_* variables) to "fr_CA.UTF8" in either my .bash_profile or my .Renviron file, that should do the job, and up to a point it does. I can source functions that contain multibyte characters, save them in my personal library, and they run as expected. BUT with these settings, R throws the following error at startup:

Erreur : caractères multioctets incorrects dans l'analyse de code (parser) à la ligne 28

which translates to something like:

Error: incorrect multibyte characters in the code analysis (parser) at line 28

Furthermore, I can't install any package: install.packages() returns the same error and stops execution. I know the workaround is not to specify a UTF-8 locale in my profiles and to pass the argument encoding = "UTF-8" explicitly to source(). But to me, this is somewhat of an inconsistency!

Thanks to Jan for his insights,

Gérald

Gerald Jean, M.Sc. in statistics
Senior Statistical Advisor
Corporate Actuarial, Modelling and Research
Property and Casualty Insurance
Mouvement Desjardins
Lévis (head office)
418 835-4900, ext. 7639
1 877 835-4900, ext. 7639
Fax: 418 835-6657

Make a good impression: print only when necessary!

This email is confidential, may be protected by professional privilege, and is intended exclusively for the addressee. Any other person is strictly prohibited from disseminating, distributing or reproducing this message. If you have received it in error, please destroy it immediately and notify the sender. Thank you.

From: Jan van der Laan <rh...@eoos.dds.nl>
To: r-help@r-project.org
Cc: gerald.j...@dgag.ca
Date: 2013/11/27 02:26
Subject: Re: [R] Coding systems.

Could it be that your R script is saved in a different encoding than the one used by R (which will probably be UTF-8 since you're working on Linux)?

--
Jan

gerald.j...@dgag.ca wrote:
> Hello,
>
> I am using R 2.15.2 on a 64-bit Linux box. I run R through Emacs' ESS.
>
> R runs in a French (Canadian French) locale, and lately I got surprising
> results from functions that make factor variables from character
> variables. Many of the variables in the input data.frames are character
> variables containing Latin accents, for example the "é" in "Montréal". I
> wasted several days playing with coding systems, trying to understand why
> some code gives the expected result when run one command at a time from
> the command line, but not when cut and pasted into a function.
>
> For example, the following code:
>
> ==============================================================================
> ttt.rmr <- sima.31122012$rmrnom
> ttt.rmr.2 <- ifelse(ttt.rmr %in% c("Edmonton", "Edmundston",
>                                    "Charlottetown", "Calgary", "Winnipeg",
>                                    "Victoria", "Vancouver", "Toronto",
>                                    "St. John's", "Saskatoon", "Regina",
>                                    "Québec", "Ottawa - Gatineau (Ontario",
>                                    "Ottawa - Gatineau (partie",
>                                    "Montréal",
>                                    "Halifax", "Fredericton"),
>                     "Grandes villes",
>                     ifelse(ttt.rmr == "", "Manquant", "Autres"))
> unique(ttt.rmr.2)
> ttt.rmr.2 <- factor(ttt.rmr.2,
>                     levels = c("Grandes villes", "Autres", "Manquant"),
>                     labels = c("Grandes villes", "Autres", "Manquant"))
> ==============================================================================
>
> puts "Montréal" and "Québec" in the "Grandes villes" (large cities) level
> of the factor variable, while running the same code inside a function puts
> them in "Autres" (other). The variable "rmr.Merged" in the data.frame
> "test2.sima.31122012.DataPrep" is the output of the function, which, of
> course, does a lot of other stuff.
>
> ==============================================================================
> ttt.w <- which(ttt.rmr.2 != test2.sima.31122012.DataPrep$rmr.Merged)
> frequence(test2.sima.31122012.DataPrep$rmrnom[ttt.w])
>          Frequency  Percent Cum.Freq Cum.Percent
> Montréal   1301254 79.57173  1301254    79.57173
> Québec      334068 20.42827  1635322   100.00000
> ==============================================================================
>
> All other city names, which have no accents, were correctly classified,
> but not "Montréal" and "Québec". Together they represent over 1.5M
> records, which is not negligible!
>
> Following is my ".Renviron" file, where I set up environment variables for
> R.
>
> R_PROFILE_USER="/home/jeg002/MyRwork/StartUp/profile.R"
> # export R_PROFILE_USER
> R_HISTFILE="/home/jeg002/MyRwork/.Rhistory"
> ## Default editor
> EDITOR=${EDITOR-${VISUAL-'/usr/local/bin/emacsclient'}}
> ## Default pager
> PAGER=${PAGER-'/usr/local/bin/emacsclient'}
>
> ## Setting locale, hoping it will be OK "all" the time!!!
> LANG=fr_CA
> LANGUAGE=fr_CA
> LC_ADDRESS=fr_CA
> LC_COLLATE=fr_CA
> LC_TYPE=fr_CA
> LC_IDENTIFICATION=fr_CA
> LC_MEASUREMENT=fr_CA
> LC_MESSAGES=fr_CA
> LC_NAME=fr_CA
> LC_PAPER=en_US
> LC_NUMERIC=en_US
> LC_TELEPHONE=fr_CA
> LC_MONETARY=fr_CA
> LC_TIME=fr_CA
> R_PAPERSIZE='letter'
> ==============================================================================
>
> and:
>
> > Sys.getlocale()
> [1] "LC_CTYPE=fr_CA;LC_NUMERIC=C;LC_TIME=fr_CA;LC_COLLATE=fr_CA;LC_MONETARY=fr_CA;LC_MESSAGES=fr_CA;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=fr_CA;LC_IDENTIFICATION=C"
>
> > Sys.getenv(c("LANGUAGE", "LANG"))
> LANGUAGE    LANG
>  "fr_CA" "fr_CA"
>
> I must be missing something! Maybe someone can make sense of this.
> Thanks for your support,
>
> Gérald Jean
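Jan's diagnosis can be illustrated with a small base-R sketch. The variable names are made up, and it assumes only that iconv() supports Latin-1, which is standard. On glibc, a plain fr_CA locale uses ISO-8859-1 (Latin-1), while a UTF-8 file stores "é" as two bytes. The sketch shows that %in% matches the two spellings of "Montréal" as long as each string carries a declared encoding, but compares raw bytes when R does not know the encoding. That is one plausible way for "Montréal" and "Québec" to end up in "Autres" when the literals in a script and the values in the data are encoded differently.

```r
# "Montréal" written with a Unicode escape: R marks it as "UTF-8".
city_utf8 <- "Montr\u00e9al"

# The same word converted to Latin-1 (ISO-8859-1): marked as "latin1".
city_latin1 <- iconv(city_utf8, from = "UTF-8", to = "latin1")

Encoding(c(city_utf8, city_latin1))              # "UTF-8" "latin1"
nchar(c(city_utf8, city_latin1), type = "bytes") # 9 8: "é" takes 2 bytes vs 1

# With declared encodings, R translates both to UTF-8 before matching.
city_latin1 %in% city_utf8                       # TRUE

# Drop the declarations to mimic strings whose encoding R does not know:
# they are then matched byte for byte, and the bytes differ.
unmarked_utf8 <- city_utf8
unmarked_latin1 <- city_latin1
Encoding(unmarked_utf8) <- "unknown"
Encoding(unmarked_latin1) <- "unknown"
unmarked_latin1 %in% unmarked_utf8               # FALSE

# One possible repair: declare the bytes' true encoding, then convert to UTF-8.
fixed <- unmarked_latin1
Encoding(fixed) <- "latin1"
fixed <- enc2utf8(fixed)
fixed %in% city_utf8                             # TRUE
```

Normalizing both sides to UTF-8 (for instance with enc2utf8() after reading the data) keeps the comparison independent of how each string was produced.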
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.