Hi,

Could this be an oversight?

R version 2.6.2 Patched (2008-03-13 r44783)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
...

> x <- "abä"
> Encoding(x)
[1] "latin1"
> Encoding(gsub("ä","", x))
[1] "unknown"
> Encoding(gsub("ä","", x, perl = TRUE))
[1] "latin1"

The code in src/main/pcre.c (see also do_tolower and do_strsplit in
src/main/character.c) suggests the attached patch. With the patch applied:

> x <- "abä"
> Encoding(gsub("ä","", x))
[1] "latin1"

Happy Easter
Christian

-- 
Christian Buchta
-> Institute for Tourism and Leisure Studies
-> Vienna University of Economics and Business Administration
-> Vienna -> Austria -> Europe.
Visit us on http://www.wu-wien.ac.at/itf/.

Attachment: patch_44783

Index: src/main/character.c
===================================================================
--- src/main/character.c	(revision 44783)
+++ src/main/character.c	(working copy)
@@ -1281,7 +1281,7 @@
                     strcat(u, t);
                 } while(global && (st = fgrep_one_bytes(spat, s, useBytes)) >= 0);
                 strcat(u, s);
-                SET_STRING_ELT(ans, i, mkChar(cbuf));
+                SET_STRING_ELT(ans, i, markKnown(cbuf, STRING_ELT(vec, i)));
                 Free(cbuf);
             }
         } else {
@@ -1337,7 +1337,7 @@
                 for (j = offset ; s[j] ; j++)
                     *u++ = s[j];
                 *u = '\0';
-                SET_STRING_ELT(ans, i, mkChar(cbuf));
+                SET_STRING_ELT(ans, i, markKnown(cbuf, STRING_ELT(vec, i)));
                 Free(cbuf);
             }
         }

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel