[R] embedded nuls in 2.10 versus 2.11

Brandon Whitcher Tue, 02 Mar 2010 00:36:00 -0800

I have been reading binary files, and parsing the output, for some
time now.  I have tried to develop a technique that is as robust as
possible to all the strange things that appear in text fields, not to
mention different global/regional encodings.  I have no control over
the data generated by users, so I would like to be as flexible and
accommodating as possible.  The following code is straightforward, but
will fail with embedded nuls in R <= 2.10:


# file() creates the connection; open() only accepts an existing connection
fid = file(filename, "rb")
readChar(fid, nchars=10)
close(fid)

Previous suggestions from the R-help list led me to consider

fid = file(filename, "rb")
# read 10 raw bytes, then convert them to a character string
rawToChar(readBin(fid, "raw", 10))
close(fid)

or even

fid = file(filename, "rb")
# as above, then re-encode the result as UTF-8
iconv(rawToChar(readBin(fid, "raw", 10)), to="UTF-8")
close(fid)

to ensure that my output is "well behaved".  With the new error
handling in rawToChar() in R >= 2.11, embedded nuls are no longer
allowed except at the end of the string.  I run across these all the
time in my user data.  How can I recover as much of the text as
possible when reading in from a binary file with embedded nuls in R >=
2.11 and keep the code backwards compatible with R < 2.11?
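
One possible workaround, offered only as a sketch: filter the nul bytes
out of the raw vector before calling rawToChar(), since comparing and
subsetting raw vectors should behave the same on both sides of 2.11.
This assumes it is acceptable to discard the nuls rather than preserve
their positions.  The helper name read_text_field and the temporary
file are hypothetical, for illustration only.

```r
# Hypothetical helper: read n bytes and drop embedded nuls before conversion.
read_text_field <- function(filename, n) {
  fid <- file(filename, "rb")
  on.exit(close(fid))                 # close the connection even on error
  bytes <- readBin(fid, "raw", n)
  bytes <- bytes[bytes != as.raw(0)]  # keep only the non-nul bytes
  rawToChar(bytes)
}

# Usage: a 4-byte field holding "a", nul, "b", nul
tmp <- tempfile()
writeBin(as.raw(c(0x61, 0x00, 0x62, 0x00)), tmp)
read_text_field(tmp, 4)   # "ab"
unlink(tmp)
```

If the nuls are really field terminators, truncating at the first nul
instead of dropping all of them may be closer to what is wanted.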

thanks...

Brandon

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
