Perhaps you could contact the people who supplied or created the file and
ask them exactly what the format of the file is. That is probably the
safest thing to do.
If you are sure that the lines containing only whitespace are
meaningless, then you could alter the previous code to make a copy of
the file containing only the lines that are exactly 97 characters long
(you can do this by changing the '!=' to '==' and writing to a new
output file).
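
A minimal sketch of that filtering step, wrapped in a function so it can
be tried on a small file first. The function name copy_lines_of_width and
its arguments are my own invention, not part of any package; the defaults
(width 97, 5 header lines, blocks of 1e5 lines) are taken from the code
you ran. Note that the skipped header lines are not copied.

```r
# Copy only the lines with the expected width from infile to outfile.
# The file is read in blocks, so it never has to fit in memory at once.
# (Hypothetical helper; the name and arguments are assumptions.)
copy_lines_of_width <- function(infile, outfile, width = 97,
                                skip = 5, block = 1e5) {
  con <- file(infile, "rt")
  out <- file(outfile, "wt")
  on.exit({ close(con); close(out) })

  # skip the header lines (they are not written to outfile)
  readLines(con, n = skip, warn = FALSE)

  kept <- 0
  repeat {
    lines <- readLines(con, n = block, warn = FALSE)
    if (length(lines) == 0) break
    good <- lines[nchar(lines) == width]   # '==' instead of '!='
    writeLines(good, con = out)
    kept <- kept + length(good)
  }
  kept   # number of lines written
}

# Small self-contained demonstration with lines of width 5
demo_in  <- tempfile(fileext = ".txt")
demo_out <- tempfile(fileext = ".txt")
writeLines(c("header", "abcde", "  ", "fghij", ""), demo_in)
n_kept <- copy_lines_of_width(demo_in, demo_out, width = 5, skip = 1)
cleaned <- readLines(demo_out)
```

For your file the call would be something like
copy_lines_of_width("dataset.txt", "cleaned.txt"), where "cleaned.txt" is
just a placeholder name.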
Since all lines are then of equal length, I suspect you have a
fixed-width file. You could open and read this file using the LaF package
(http://cran.r-project.org/web/packages/LaF/index.html; see the manual
vignette for more information). The package ffbase
(http://cran.r-project.org/web/packages/ffbase/index.html) contains a
function to convert from LaF to ff (laf_to_ffdf). I do not know whether
packages such as RSQLite or bigmemory can import fixed-width files.
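
As a rough sketch of what opening a fixed-width file with LaF looks like:
the column types, widths and names below are made up for a tiny example
file, since I do not know the layout of your data. You would need to
replace them with the real layout (the widths should sum to 97 in your
case). The example only runs when LaF is installed.

```r
# Sketch: open a fixed-width file with LaF without loading it in memory.
# The layout (3 columns of 8, 6 and 10 characters) is hypothetical.
first_rows <- NULL
if (requireNamespace("LaF", quietly = TRUE)) {
  demo_fwf <- tempfile(fileext = ".txt")
  writeLines(c("20120505  12.5AAPL      ",
               "20120506   7.0MSFT      "), demo_fwf)

  laf <- LaF::laf_open_fwf(demo_fwf,
    column_types  = c("integer", "double", "string"),
    column_widths = c(8, 6, 10),
    column_names  = c("date", "price", "ticker"))

  # Only the requested rows are read from disk
  first_rows <- laf[1:2, ]
  print(first_rows)

  # With ffbase installed, the laf object could then be converted using
  # ffbase::laf_to_ffdf(laf); this is not run here.
}
```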
The warning message indicates that the last line does not end with a
newline character, which could indicate an incomplete file but often
doesn't mean anything. You could check the last line of the file to be sure.
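
One way to check this without reading the whole file is to look only at
its last byte. This is a sketch; ends_with_newline is a helper name I
made up, and it only tests for a trailing line feed. It does not tell you
whether the last record itself is complete, so comparing the length of
the last line with 97 is still worthwhile.

```r
# Return TRUE when the last byte of the file is a line feed ("\n").
# (Hypothetical helper; not part of base R or any package.)
ends_with_newline <- function(path) {
  size <- file.info(path)$size
  if (is.na(size) || size == 0) return(FALSE)
  con <- file(path, "rb")            # binary mode, so seek() is reliable
  on.exit(close(con))
  seek(con, where = size - 1, origin = "start")
  last_byte <- readBin(con, what = "raw", n = 1)
  identical(last_byte, as.raw(0x0A))
}

# Small demonstration with two temporary files
complete_file  <- tempfile(fileext = ".txt")
truncated_file <- tempfile(fileext = ".txt")
writeLines(c("abc", "def"), complete_file)   # ends with a newline
cat("abc\ndef", file = truncated_file)       # no newline at the end
ends_with_newline(complete_file)
ends_with_newline(truncated_file)
```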
HTH,
Jan
On 05/05/2012 05:21 AM, iliketurtles wrote:
Your code works!
strangelines.txt was created, and it's a text file with just spacebars ...
Seems like a few thousand lines of complete blanks (not 1 non-blank entry).
One thing, when I ran your code there was an error message;
setwd("C:/Users/admin/Desktop/hons/Thesis")
con <- file("dataset.txt", "rt")
out <- file("strangelines.txt", "wt")
# skip first 5 lines
lines <- readLines(con, n=5)
# read the rest in blocks of 100,000 lines
while (TRUE) {
  lines <- readLines(con, n=1E5)
  if (length(lines) == 0) break
  strangelines <- lines[nchar(lines) != 97]
  writeLines(strangelines, con=out)
}
Warning message:
In readLines(con, n = 1e+05) : incomplete final line found on 'dataset.txt'
I'm really not sure where to go from here. This has gone way out of my
depth.
-----
----
Isaac
Research Assistant
Quantitative Finance Faculty, UTS
--
View this message in context:
http://r.789695.n4.nabble.com/Can-t-import-this-4GB-DATASET-tp4607862p4610446.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.