On 04/03/2010 11:40 PM, David Winsemius wrote:
On Mar 4, 2010, at 10:58 PM, Duncan Murdoch wrote:
On 04/03/2010 10:32 PM, David Winsemius wrote:
On Mar 4, 2010, at 9:47 PM, jonas garcia wrote:
When I opened the file with a hex editor, the problematic
character turned out to be “1a”.
I am attaching a sample DAT file with 3 lines (the second line is
the one with the undesirable character).
The furthest I could get was through readBin:
tmp <- readBin("new.dat", what = "raw", n = 100000000)
  [1] 30 32 3a 33 35 3a 33 32 2c 20 34 34 30 33 2c 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36 2c 31
 [33] 35 35 2e 39 2c 30 30 2e 37 36 2c 31 31 35 36 0d 0a 30 32 3a 33 35 3a 33 35 2c 20 34 34 33 32 2c
 [65] 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36 2c 31 35 35 2e 38 2c 1a 30 2e 38 31 2c 31 31 35 37
 [97] 0d 0a 30 32 3a 33 35 3a 33 39 2c 20 34 34 36 37 2c 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36
[129] 2c 31 35 35 2e 38 2c 30 30 2e 38 31 2c 31 31 35 38
tmp[87]
[1] 1a
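For reference, here is a minimal sketch of how the offending position could be located without reading the dump by eye. The byte vector is rebuilt from the hex dump above so the example runs on its own; the names `bytes` and `bad_pos` are illustrative and do not appear in the original post.

```r
# Rebuild the three records from the hex dump: CRLF line ends, a stray
# 0x1a byte in the second record, no line end after the last record.
bytes <- c(
  charToRaw("02:35:32, 4403, 37.114,-20.836,155.9,00.76,1156\r\n"),
  charToRaw("02:35:35, 4432, 37.114,-20.836,155.8,"),
  as.raw(0x1a),
  charToRaw("0.81,1157\r\n"),
  charToRaw("02:35:39, 4467, 37.114,-20.836,155.8,00.81,1158")
)

# Index of every Ctrl-Z byte in the vector.
bad_pos <- which(bytes == as.raw(0x1a))
print(bad_pos)
```

On a real file, the same `which()` call would be applied to the vector returned by readBin.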
I got a different "interpretation" of that character when I let R
look at it, and I cannot figure out why \032 should be causing
problems.
Hex 1a and octal 032 both correspond to Ctrl-Z, which is the MS-DOS
EOF marker. I forget whether R's text reading routines pay
attention to that, or whether it's the C runtime, but it makes sense
that it would cause problems on Windows.
Duncan Murdoch
Thanks. I was interpreting \032 as decimal, so couldn't figure out why
it should equal 0x1A. You've explained the basis (or base) of my
confusion.
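The base mix-up can be checked directly in R. This is only a small sketch; the variable names are illustrative. It uses `strtoi` to parse the same digits in different bases and `charToRaw` to show what byte the string escape "\032" actually produces.

```r
# Hex 1a and octal 032 are the same number; reading 032 as decimal is not.
hex_val <- strtoi("1a",  base = 16L)
oct_val <- strtoi("032", base = 8L)
dec_val <- strtoi("032", base = 10L)   # the misreading: a different value

# In an R string literal, a backslash followed by digits is an octal escape,
# so "\032" is the single byte 0x1a (Ctrl-Z).
ctrl_z    <- charToRaw("\032")
same_byte <- identical(ctrl_z, as.raw(0x1a))

print(c(hex = hex_val, octal = oct_val, decimal = dec_val))
print(same_byte)
```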
By the way, here's one way to remove the bad char. Read it using
readBin as above, then
tmp <- tmp[tmp != 0x1a]
to remove the bad chars, or
tmp[tmp == 0x1a] <- charToRaw(" ")
to replace them with spaces. Then write the tmp vector out to a file
with writeBin.
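Put together, the round trip might look like the sketch below. It is self-contained, so it first writes a small sample file modeled on the dump in this thread; the temporary file names, the variable names, and the trailing line end on the last record are assumptions made for the example. With a real file, `infile` would simply be "new.dat".

```r
# Sample input modeled on the thread's data, with a stray Ctrl-Z (0x1a).
infile  <- tempfile(fileext = ".dat")   # stand-in for "new.dat"
outfile <- tempfile(fileext = ".dat")   # hypothetical cleaned copy
writeBin(c(
  charToRaw("02:35:32, 4403, 37.114,-20.836,155.9,00.76,1156\r\n"),
  charToRaw("02:35:35, 4432, 37.114,-20.836,155.8,"),
  as.raw(0x1a),
  charToRaw("0.81,1157\r\n"),
  charToRaw("02:35:39, 4467, 37.114,-20.836,155.8,00.81,1158\r\n")
), infile)

# Read the whole file as raw bytes; file.size() gives the exact byte count.
tmp <- readBin(infile, what = "raw", n = file.size(infile))

# Drop every Ctrl-Z byte. To replace with spaces instead, use:
#   tmp[tmp == as.raw(0x1a)] <- charToRaw(" ")
clean <- tmp[tmp != as.raw(0x1a)]

# Write the cleaned bytes out, then read the result as ordinary text.
writeBin(clean, outfile)
dat <- read.csv(outfile, header = FALSE)
print(dim(dat))
```

Dropping the byte changes the field from "00.81" style to "0.81", which parses to the same number; replacing it with a space would keep the byte offsets of everything after it unchanged.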
Duncan Murdoch
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help