I have been reading binary files, and parsing the output, for some time now. I have tried to develop a technique that is as robust as possible to all the strange things that appear in text fields, not to mention different global/regional encodings. I have no control over the data generated by users, so I would like to be as flexible and accommodating as possible. The following code is straightforward, but will fail with embedded nuls in R <= 2.10:
    fid = file(filename, "rb")
    readChar(fid, n=10)
    close(fid)

Previous suggestions from the R-help list led me to consider

    fid = file(filename, "rb")
    rawToChar(readBin(fid, "raw", 10))
    close(fid)

or even

    fid = file(filename, "rb")
    iconv(rawToChar(readBin(fid, "raw", 10)), to="UTF-8")
    close(fid)

to ensure that my output is "well behaved". With the new error handling in rawToChar() in R 2.11, embedded nuls are no longer allowed except at the end of the string. I run across these all the time in my user data.

How can I recover as much of the text as possible when reading from a binary file with embedded nuls in R >= 2.11, while keeping the code backwards compatible with R < 2.11?

Thanks,
Brandon
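
P.S. To make the question concrete, here is a minimal sketch of the kind of workaround I have in mind: read the bytes as raw, drop (or replace) the nul bytes, and only then call rawToChar(). The helper names strip_nuls() and read_text() are made up for illustration, and I am assuming, without having checked every release, that raw comparison and on.exit() behave the same way before and after R 2.11.

```r
# Hypothetical helper (illustration only): remove nul bytes from a raw
# vector, or overwrite them with a single replacement byte if one is given.
strip_nuls <- function(bytes, replacement = NULL) {
  is_nul <- bytes == as.raw(0)
  if (is.null(replacement)) {
    bytes[!is_nul]                 # drop the nuls entirely
  } else {
    bytes[is_nul] <- replacement   # e.g. as.raw(0x20) for a space
    bytes
  }
}

# Hypothetical helper (illustration only): read up to n bytes from a
# binary file and return them as one string with the nuls handled.
read_text <- function(filename, n, replacement = NULL) {
  fid <- file(filename, "rb")
  on.exit(close(fid))              # close the connection even on error
  bytes <- readBin(fid, what = "raw", n = n)
  rawToChar(strip_nuls(bytes, replacement))
}
```

The result could then be passed through iconv(..., to="UTF-8") as before. Dropping the nuls loses the field boundaries they may have marked, which is why the sketch also allows replacing them with a space.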