On 04/03/2010 11:40 PM, David Winsemius wrote:
On Mar 4, 2010, at 10:58 PM, Duncan Murdoch wrote:

On 04/03/2010 10:32 PM, David Winsemius wrote:
On Mar 4, 2010, at 9:47 PM, jonas garcia wrote:
When I opened the file with a hex editor, the problematic character turned out to be “1a”. I am attaching a sample DAT file with 3 lines (the second line is the one with the undesirable character).

The furthest I could get was through readBin:

tmp<- readBin("new.dat", what = "raw", n=100000000)
  [1] 30 32 3a 33 35 3a 33 32 2c 20 34 34 30 33 2c 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36 2c 31
 [33] 35 35 2e 39 2c 30 30 2e 37 36 2c 31 31 35 36 0d 0a 30 32 3a 33 35 3a 33 35 2c 20 34 34 33 32 2c
 [65] 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36 2c 31 35 35 2e 38 2c 1a 30 2e 38 31 2c 31 31 35 37
 [97] 0d 0a 30 32 3a 33 35 3a 33 39 2c 20 34 34 36 37 2c 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36
[129] 2c 31 35 35 2e 38 2c 30 30 2e 38 31 2c 31 31 35 38


tmp[87]
[1] 1a
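Rather than checking a single index by hand, every offending byte can be located in one step. The following is only a sketch: `line2` is a hypothetical stand-in for the second record of the sample file, rebuilt from the hex dump above, and is not read from the actual attachment.

```r
# Hypothetical stand-in for the second record in the sample file,
# rebuilt from the hex dump (Ctrl-Z sits just before "0.81").
line2 <- c(charToRaw("02:35:35, 4432, 37.114,-20.836,155.8,"),
           as.raw(0x1a),
           charToRaw("0.81,1157"))

# Positions of every Ctrl-Z byte in the raw vector
bad <- which(line2 == as.raw(0x1a))
bad
```

With the real file, the same `which()` call on the vector returned by readBin should report the positions of all such bytes, not just the first.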
I got a different "interpretation" of that character when I let R look at it, and I cannot figure out why \032 should be causing problems.
Hex 1a and octal 032 both correspond to Ctrl-Z (decimal 26), which is the MS-DOS EOF marker. I forget whether it is R's text-reading routines or the C runtime that pays attention to it, but it makes sense that it would cause problems on Windows.

Duncan Murdoch

Thanks. I was interpreting \032 as decimal, so couldn't figure out why it should equal 0x1A. You've explained the basis (or base) of my confusion.
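The base mix-up is easy to check in R itself. This small sketch (the variable names are illustrative only) shows that hex 1a, octal 032 and decimal 26 all name the same byte:

```r
# Three spellings of the same byte value
ctrl_z <- as.raw(0x1a)                          # hex literal

same_as_decimal <- identical(ctrl_z, as.raw(26))            # decimal 26
same_as_octal   <- identical(ctrl_z, charToRaw("\032"))     # octal escape in a string

# Converting the digit strings explicitly, base by base
from_octal <- strtoi("032", base = 8L)
from_hex   <- strtoi("1a",  base = 16L)

c(same_as_decimal, same_as_octal, from_octal == from_hex)
```

R prints non-printable characters in strings using octal escapes, which is presumably why the byte shows up as \032 rather than as a hex value.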

By the way, here's one way to remove the bad char. Read it using readBin as above, then

tmp <- tmp[tmp != 0x1a]

to remove the bad chars, or

tmp[tmp == 0x1a] <- charToRaw(" ")

to replace them with spaces. Then write the tmp vector out to a file with writeBin.

Duncan Murdoch
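The steps described above can be put together roughly as follows. This is a sketch under stated assumptions: the input file is a temporary stand-in built from the first two records of the hex dump (not the real new.dat), the names `path_in`, `path_out` and `clean` are made up for illustration, and the final read.csv call assumes the cleaned file is plain comma-separated text with no header row.

```r
# Build a small stand-in input file (hypothetical; mimics the sample data)
path_in  <- tempfile(fileext = ".dat")
path_out <- tempfile(fileext = ".dat")
bytes <- c(charToRaw("02:35:32, 4403, 37.114,-20.836,155.9,00.76,1156\r\n"),
           charToRaw("02:35:35, 4432, 37.114,-20.836,155.8,"),
           as.raw(0x1a),                       # the stray Ctrl-Z
           charToRaw("0.81,1157\r\n"))
writeBin(bytes, path_in)

# Read the whole file as raw bytes; n is taken from the file size
tmp <- readBin(path_in, what = "raw", n = file.info(path_in)$size)

# Drop every Ctrl-Z byte, then write the cleaned bytes back out
clean <- tmp[tmp != as.raw(0x1a)]
writeBin(clean, path_out)

# The cleaned file can now be read as ordinary text
dat <- read.csv(path_out, header = FALSE)
```

Writing to a separate output file keeps the original untouched, which seems prudent when editing data at the byte level.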

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
