On Mon, Dec 15, 2014 at 12:33 AM, Spencer Graves <spencer.gra...@prodsyse.com> wrote:
> Hello, All:
>
> What do people do to strip accents from Latin characters, returning
> vanilla ASCII?

I find the stringi package works well for this sort of thing, e.g.,

library(stringi)

# printable Latin-1 characters to test against
x <- c("!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",",
       "-", ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":",
       ";", "<", "=", ">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H",
       "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V",
       "W", "X", "Y", "Z", "[", "\\", "]", "^", "_", "`", "a", "b", "c", "d",
       "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r",
       "s", "t", "u", "v", "w", "x", "y", "z", "{", "|", "}", "~", "-", " ",
       "¡", "¢", "£", "¤", "¥", "¦", "§", "¨", "©", "ª", "«", "¬", "", "®",
       "¯", "°", "±", "²", "³", "´", "µ", "¶", "·", "¸", "¹", "º", "»", "¼",
       "½", "¾", "¿", "À", "Á", "Â", "Ã", "Ä", "Å", "Æ", "Ç", "È", "É", "Ê",
       "Ë", "Ì", "Í", "Î", "Ï", "Ð", "Ñ", "Ò", "Ó", "Ô", "Õ", "Ö", "×", "Ø",
       "Ù", "Ú", "Û", "Ü", "Ý", "Þ", "ß", "à", "á", "â", "ã", "ä", "å", "æ",
       "ç", "è", "é", "ê", "ë", "ì", "í", "î", "ï", "ð", "ñ", "ò", "ó", "ô",
       "õ", "ö", "÷", "ø", "ù", "ú", "û", "ü", "ý", "þ", "ÿ")

# show each character next to its ASCII transliteration
cbind(x, stri_trans_general(x, "Latin-ASCII"))

Best,
Ista

> For example, I want to convert "Raúl" to "Raul". Milan (below)
> suggested 'iconv(x, "", "ASCII//TRANSLIT")'. This worked under Windows but
> failed on Linux and Mac. It's part of the "subNonStandardCharacters"
> function in the Ecfun package. The development version on R-Forge uses this
> and returns "Raul" under Windows and NA under Mac OS X (and something
> different from "Raul", presumably NA, under Linux).
>
> Thanks,
> Spencer
>
>> On Nov 30, 2014, at 2:32 AM, Spencer Graves
>> <spencer.gra...@structuremonitoring.com> wrote:
>>
>> Wonderful. Thanks very much. Spencer
>>
>> On 11/30/2014 2:25 AM, Milan Bouchet-Valat wrote:
>>> On Sunday, November 30, 2014 at 02:14 -0800, Spencer Graves wrote:
>>>> Hello:
>>>>
>>>> How can one convert Latin characters with accents to the corresponding
>>>> characters without?
>>>> For example, I want to convert "ú" to "u", similar
>>>> to how tolower('U') returns "u".
>>>>
>>>> This can be done using chartr{base}, e.g., chartr('ú', 'u',
>>>> 'Raúl') returns "Raul". However, I wondered if a simpler version of
>>>> this is available.
>>> This appears to work:
>>> > iconv("ù", "", "ASCII//TRANSLIT")
>>> [1] "u"
>>>
>>> Regards
>>>
>>>> Thanks,
>>>> Spencer
>>>>
>>>> p.s. findFn('convert to ascii') found 117 help pages in 70 packages.
>>>> A brief review identified two to "Convert to ASCII": ASCIIfy {gtools}
>>>> and stri_enc_toascii {stringi}. Neither of these did what I expected.
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
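
For reference, here is a minimal sketch of the stringi approach from this thread applied to the original "Raúl" example. It assumes the stringi package is installed; the wrapper name strip_accents is hypothetical and only there for illustration. The accented characters are written as \u escapes so the result does not depend on the encoding of the script file.

```r
# Minimal sketch, assuming the stringi package is installed.
library(stringi)

# strip_accents is a hypothetical helper name, not part of stringi or Ecfun.
strip_accents <- function(x) {
  # "Latin-ASCII" is the ICU transliterator ID used earlier in this thread
  stri_trans_general(x, "Latin-ASCII")
}

# "Ra\u00fal" is "Raúl" written with a Unicode escape
strip_accents("Ra\u00fal")
# [1] "Raul"

# the function is vectorised over its input
strip_accents(c("\u00c0", "\u00e7", "\u00f1"))
```

Because the transliteration is done by ICU inside stringi rather than by the system iconv, it should not depend on the platform in the way the iconv(x, "", "ASCII//TRANSLIT") call described above does.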