Hi all,

I would like to propose the attached function ASCIIfy() for addition to the 'tools' package.
Non-ASCII characters in character vectors can be problematic for R packages, but sometimes they cannot be avoided. To make packages portable and free of 'R CMD check' warnings, my solution has been to convert problematic characters in functions and datasets to escaped ASCII, so that plot(1, main="São Paulo") becomes plot(1, main="S\u00e3o Paulo").
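The escape works because it is resolved by the R parser: the source text stays 7-bit ASCII, while the string value still contains the accented character. A minimal base-R check of that claim (the names `src` and `env` are only illustrative):

```r
# One line of package source code, stored as text. The single-quoted R
# string turns "\\" into one backslash, so src holds the escape literally.
src <- 'main <- "S\\u00e3o Paulo"'

# Every byte of the source text is 7-bit ASCII.
all(charToRaw(src) <= as.raw(0x7f))    # TRUE

# Parsing and evaluating the source resolves \u00e3 into the real character.
env <- new.env()
eval(parse(text = src), envir = env)
nchar(env$main)                        # 9 characters, as in "São Paulo"
utf8ToInt(substr(env$main, 2, 2))      # 227, i.e. U+00E3
```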
The showNonASCII() function in package:tools helps identify R source files where characters should be converted to ASCII one way or another, but I could not find a function that actually performs the conversion.
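For a character vector already in memory, the kind of check that showNonASCII() reports can be approximated in base R by looking for bytes above 0x7F. The helper below is hypothetical (it is not part of 'tools' and not how showNonASCII() is implemented); it is a sketch that treats NA as containing no non-ASCII characters.

```r
# Hypothetical helper: TRUE for elements that contain at least one byte
# outside the 7-bit ASCII range (0x00-0x7F).
has_non_ascii <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(FALSE)          # charToRaw() is not meant for NA
    any(charToRaw(s) > as.raw(0x7f))
  }, logical(1), USE.NAMES = FALSE)
}

x <- c("Reykjavik", "S\u00e3o Paulo", NA)
has_non_ascii(x)                         # FALSE TRUE FALSE
```

Elements flagged this way are the candidates for conversion.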
I have written the function ASCIIfy() to convert character vectors to ASCII. I imagine other R package developers may be looking for a similar tool, and package:tools seems the first place they would look, since that is where the R Core Team provides a variety of tools for handling non-ASCII characters in package development.
I hope the R Core Team will adopt ASCIIfy() into the 'tools' package, to make life easier for package developers outside the English-speaking world. Of course, I have no problem with the function being renamed or rewritten in any way.
See the attached examples, all in flat ASCII prepared using the function itself! The main objective, though, is to ASCIIfy functions and datasets, not help pages.
Arni
ASCIIfy <- function(x, bytes=2, fallback="?")
{
  bytes <- match.arg(as.character(bytes), c("1","2"))
  convert <- function(char)  # convert one character to ASCII, e.g. "z", "\xfe", or "\u00fe"
  {
    raw <- charToRaw(char)
    if(length(raw)==1 && raw<=127)          # 7-bit, keep as is
      ascii <- char
    else if(length(raw)==1 && bytes=="1")   # 8-bit to \x00
      ascii <- paste0("\\x", raw)
    else if(length(raw)==1 && bytes=="2")   # 8-bit to \u0000
      ascii <- paste0("\\u", chartr(" ","0",formatC(as.character(raw),width=4)))
    else if(length(raw)==2 && bytes=="1")   # 2-byte UTF-8 to \x00, if possible
    {
      if(utf8ToInt(char) <= 255)
        ascii <- paste0("\\x", format.hexmode(utf8ToInt(char)))
      else
      {
        ascii <- fallback
        warning(char, " could not be converted to 1 byte")
      }
    }
    else if(length(raw)==2 && bytes=="2")   # 2-byte UTF-8 to \u0000
      ascii <- paste0("\\u", format.hexmode(utf8ToInt(char),width=4))
    else                                    # 3 or more bytes: use fallback
    {
      ascii <- fallback
      warning(char, " could not be converted to ", bytes, " byte")
    }
    ascii
  }
  if(length(x) != 1)  # vectors (including length 0): convert element by element
    return(vapply(x, ASCIIfy, character(1), bytes=bytes, fallback=fallback,
                  USE.NAMES=FALSE))
  if(is.na(x) || !nzchar(x))  # NA and "" need no conversion
    return(x)
  input <- strsplit(x, "")[[1]]                    # "c" "a" "f" "<\'e>"
  output <- vapply(input, convert, character(1),
                   USE.NAMES=FALSE)                # "c" "a" "f" "\\u00e9"
  paste(output, collapse="")                       # "caf\\u00e9"
}
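For comparison, here is a more compact, vectorized sketch of the same idea. It is a hypothetical alternative (the name ascii_escape() is made up), not a replacement for ASCIIfy(): it always emits \u escapes and never \x, and it also covers characters that take three or four bytes in UTF-8 (using \U for code points above U+FFFF), which ASCIIfy() sends to the fallback. It assumes the input strings are valid UTF-8 after enc2utf8().

```r
# Hypothetical compact variant of the idea behind ASCIIfy().
ascii_escape <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    cp <- utf8ToInt(enc2utf8(s))              # Unicode code points
    out <- ifelse(cp <= 127,                  # 7-bit: keep the character
                  intToUtf8(cp, multiple = TRUE),
                  ifelse(cp <= 0xFFFF,        # Basic Multilingual Plane
                         sprintf("\\u%04x", cp),
                         sprintf("\\U%08x", cp)))
    paste(out, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

escaped <- ascii_escape(c("S\u00e3o Paulo", "Reykjav\u00edk"))
cat(escaped, sep = "\n")   # S\u00e3o Paulo, then Reykjav\u00edk
```

Parsing an escaped string, e.g. parse(text = paste0('"', escaped[1], '"'))[[1]], gives back the original "São Paulo".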
\name{ASCIIfy}
\alias{ASCIIfy}
\title{Convert Characters to ASCII}
\description{
  Convert a character vector to ASCII, replacing non-ASCII characters with
  single-byte (\samp{\x00}) or two-byte (\samp{\u0000}) codes.
}
\usage{
ASCIIfy(x, bytes = 2, fallback = "?")
}
\arguments{
  \item{x}{a character vector, possibly containing non-ASCII characters.}
  \item{bytes}{either \code{1} or \code{2}, for single-byte
    (\samp{\x00}) or two-byte (\samp{\u0000}) codes.}
  \item{fallback}{an output character to use when input characters
    cannot be converted.}
}
\value{
  A character vector like \code{x}, except that non-ASCII characters have
  been replaced with \samp{\x00} or \samp{\u0000} codes.
}
\author{Arni Magnusson.}
\note{
  To render single backslashes, use these or similar techniques:
  \verb{
write(ASCIIfy(x), "file.txt")
cat(paste(ASCIIfy(x), collapse="\n"), "\n", sep="")}

  The resulting strings are plain ASCII and can be used in R functions and
  datasets to improve package portability.
}
\seealso{
  \code{\link[tools]{showNonASCII}} identifies non-ASCII characters in a
  character vector.
}
\examples{
cities <- c("S\u00e3o Paulo", "Reykjav\u00edk")
print(cities)
ASCIIfy(cities, 1)
ASCIIfy(cities, 2)

athens <- "\u0391\u03b8\u03ae\u03bd\u03b1"
print(athens)
ASCIIfy(athens)
}
\keyword{utilities}
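The note about single backslashes comes down to display versus content: print() shows each backslash doubled, while cat() and writeLines() write the single backslash that belongs in R source. The round trip below is a minimal base-R sketch. It writes one such line to a temporary file (standing in for a file in a package's R/ directory), confirms the file is pure ASCII, and sources it back. The escaped string is typed in by hand here; it is what ASCIIfy("S\u00e3o Paulo", 2) returns in a UTF-8 session.

```r
escaped <- "S\\u00e3o Paulo"    # 14 characters, one of them a backslash
print(escaped)                   # [1] "S\\u00e3o Paulo"
cat(escaped, "\n", sep = "")     # S\u00e3o Paulo

# Write one line of R source to a temporary file.
f <- tempfile(fileext = ".R")
writeLines(paste0('city <- "', escaped, '"'), f)

# The file on disk is plain 7-bit ASCII.
bytes <- readBin(f, "raw", file.size(f))
all(bytes <= as.raw(0x7f))       # TRUE

# Sourcing it restores the accented string.
env <- new.env()
sys.source(f, envir = env)
identical(env$city, "S\u00e3o Paulo")   # TRUE
```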
______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel