This should work for you:

input <- file('/recv/new.dat', 'rb')
output <- file('/recv/newV2.dat', 'wb')
repeat {
    # read the next chunk of raw bytes
    x <- readBin(input, what='raw', n=10000)
    if (length(x) == 0) break  # end of file reached
    # replace every 0x1a byte with a blank (0x20)
    x[which(x == as.raw(0x1a))] <- charToRaw(' ')
    writeBin(x, output)
}
close(input)
close(output)

On Thu, Mar 4, 2010 at 9:47 PM, jonas garcia <garcia.jona...@googlemail.com> wrote:
> When I opened the file with a hex editor, the problematic character turned
> out to be “1a”.
>
> I am attaching a sample DAT file with 3 lines (the second line is the one
> with the undesirable character).
>
> The furthest I could get was through readBin:
>
> > tmp <- readBin("new.dat", what = "raw", n=100000000)
>   [1] 30 32 3a 33 35 3a 33 32 2c 20 34 34 30 33 2c 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36 2c 31
>  [33] 35 35 2e 39 2c 30 30 2e 37 36 2c 31 31 35 36 0d 0a 30 32 3a 33 35 3a 33 35 2c 20 34 34 33 32 2c
>  [65] 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36 2c 31 35 35 2e 38 2c 1a 30 2e 38 31 2c 31 31 35 37
>  [97] 0d 0a 30 32 3a 33 35 3a 33 39 2c 20 34 34 36 37 2c 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36
> [129] 2c 31 35 35 2e 38 2c 30 30 2e 38 31 2c 31 31 35 38
>
> > tmp[87]
> [1] 1a
>
> The idea now is, as Jim suggested, to replace “1a” by (for example) “20” in
> the raw format and write the file back with
>
> writeBin(tmp, "new2.dat")
>
> Can I use gsub? How can I perform this operation without messing around with
> the raw format?
>
> Thanks
>
> J
>
> On Thu, Mar 4, 2010 at 8:35 PM, jim holtman <jholt...@gmail.com> wrote:
>> Have you considered reading the file in as binary/raw, finding the
>> offending character and replacing it with a blank (or whatever), and
>> then writing the file back out? You can then probably process it
>> using read.table.
>>
>> On Thu, Mar 4, 2010 at 12:50 PM, jonas garcia
>> <garcia.jona...@googlemail.com> wrote:
>> > Thank you so much for your reply.
>> >
>> > I can identify the characters very easily in a couple of files.
>> > The reason I am worried is that I have thousands of files to read in.
>> > The files were produced by very old MS-DOS software that records
>> > oceanographic data and geographic position during a survey.
>> >
>> > My main goal is to read all these files into R for further analysis.
>> > Most of the files are cleared of these EOL markers but some are not. I
>> > only noticed the problem by chance when I was looking at and comparing
>> > one of them. I wonder if I can solve this problem using R, without
>> > having to resort to a separate text editor.
>> >
>> > Help on this would be much appreciated.
>> >
>> > Thanks again
>> >
>> > J
>> >
>> > On 3/4/10, David Winsemius <dwinsem...@comcast.net> wrote:
>> >> On Mar 3, 2010, at 2:22 PM, jonas garcia wrote:
>> >>
>> >>> Dear R users,
>> >>>
>> >>> I am trying to read a huge file in R. For some reason, only a part of
>> >>> the file is read. When I investigated further, I found that in one of
>> >>> my non-numeric columns there is one odd character responsible for
>> >>> this, which I reproduce below:
>> >>> In case you cannot see it, it looks like a right arrow, but it is not
>> >>> the one you get from Microsoft Word in the menu "insert symbol".
>> >>>
>> >>> I think my dat file is broken and that funny character is an EOL
>> >>> marker that makes R not read the rest of the file. I am sure the
>> >>> character is there by chance, but I fear that it might be present in
>> >>> some other big files I have to work with as well. So, is there any
>> >>> clever way to remove this inconvenient character in R, avoiding
>> >>> having to edit the file in Notepad and remove it manually?
>> >>>
>> >>> Code I am using:
>> >>>
>> >>> read.csv("new3.dat", header=F)
>> >>>
>> >>> Warning message:
>> >>> In read.table(file = file, header = header, sep = sep, quote = quote, :
>> >>>   incomplete final line found by readTableHeader on 'new3.dat'
>> >>
>> >> I think you should identify the offending line by using the
>> >> count.fields function and fix it with an editor.
>> >>
>> >> --
>> >> David
>> >>
>> >>> I am working with R 2.10.1 in Windows XP.
>> >>>
>> >>> Thanks in advance
>> >>>
>> >>> Jonas
>> >>>
>> >>> ______________________________________________
>> >>> R-help@r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> PLEASE do read the posting guide
>> >>> http://www.R-project.org/posting-guide.html
>> >>> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >> David Winsemius, MD
>> >> Heritage Laboratories
>> >> West Hartford, CT

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
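
P.S. Since thousands of files were mentioned, the read/replace/write loop at the top of this message could be wrapped in a function and called once per file. The following is only a minimal sketch: the function name strip_sub, its arguments and the chunk size are made up for illustration, and it assumes the only byte that needs replacing is 0x1a.

```r
# Hypothetical helper (name and arguments are illustrative only):
# copy `infile` to `outfile`, replacing every 0x1a byte with a space,
# and return the number of bytes that were replaced.
strip_sub <- function(infile, outfile, chunk = 10000L) {
  input <- file(infile, "rb")
  on.exit(close(input), add = TRUE)
  output <- file(outfile, "wb")
  on.exit(close(output), add = TRUE)

  n_replaced <- 0L
  repeat {
    x <- readBin(input, what = "raw", n = chunk)
    if (length(x) == 0L) break          # end of file
    hit <- x == as.raw(0x1a)            # logical index of offending bytes
    n_replaced <- n_replaced + sum(hit)
    x[hit] <- as.raw(0x20)              # 0x20 is an ASCII space
    writeBin(x, output)
  }
  n_replaced
}

# Small self-contained check on a temporary file
src <- tempfile(fileext = ".dat")
dst <- tempfile(fileext = ".dat")
writeBin(c(charToRaw("155.8,"), as.raw(0x1a), charToRaw("0.81,1157\r\n")), src)
strip_sub(src, dst)
readLines(dst)
```

For a whole directory, the helper could be called in a loop over the result of list.files() with full.names = TRUE, writing each cleaned copy under a new name so the originals stay untouched. The returned count also gives a quick way to see which files actually contained the character.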