Dear list,

I have to read a not-so-small bunch of not-so-small Excel files, which seem to have traversed the Windows 3.1, Windows 95 and Windows NT versions of the thing (with maybe a Mac or two thrown in for good measure...). The problem is that 1) I need to read strings, and 2) those strings may come in various encodings. In the same sheet of the same file, some cells may be latin1, some UTF-8 and some CP437 (!).
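To illustrate with a minimal, made-up example (the strings here are invented, not from my actual files): the useful property is that iconv() returns NA when a string is not valid in the target encoding, so cells that are not already UTF-8 can be detected and converted selectively:

```r
# One element holds raw latin1 bytes, the other is plain ASCII (valid UTF-8)
v <- c("d\xe9j\xe0 vu", "plain ASCII")

# A round-trip through UTF-8 yields NA exactly where the bytes are not
# valid UTF-8 -- those are the cells that still need converting
bad <- is.na(iconv(v, "UTF-8", "UTF-8"))

# Convert only the offending cells, assuming they are latin1
v[bad] <- iconv(v[bad], "latin1", "UTF-8")
```

After this, every element of v survives a UTF-8 round-trip.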
read.xls() allows me to read those things into sets of data frames. My problem is to convert the encodings to UTF-8 without clobbering those that already are (or look like) UTF-8. I came to the following solution:

    foo <- function(d, from = "latin1", to = "UTF-8") {
        # Semi-smart conversion of a data frame between charsets.
        # Needed to ease use of those [EMAIL PROTECTED] Excel files
        # that have survived the Win3.1 --> Win95 --> NT transition,
        # usually in poor shape...
        conv1 <- function(v, from, to) {
            condconv <- function(v, from, to) {
                # Entries that do not survive a round-trip to 'to' are
                # assumed to be in 'from' and converted; the rest are
                # left alone.
                cnv <- is.na(iconv(v, to, to))
                v[cnv] <- iconv(v[cnv], from, to)
                return(v)
            }
            if (is.factor(v)) {
                # Convert the levels, not the (integer) codes
                levels(v) <- condconv(levels(v), from, to)
                return(v)
            } else if (is.character(v)) {
                return(condconv(v, from, to))
            } else {
                return(v)
            }
        }
        for (i in names(d)) d[, i] <- conv1(d[, i], from, to)
        return(d)
    }

Any advice for enhancement is welcome...

Sincerely yours,

Emmanuel Charpentier

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.