Perhaps you could contact the people who supplied or created the file and ask them what exactly the format of the file is. That is probably the safest thing to do.

If you are sure that the lines containing only whitespace are meaningless, you could alter the previous code to write a copy of the file containing only the lines that are exactly 97 characters long (you can do this by changing the '!=' to '==').
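A minimal sketch of that change, assuming the same block-wise approach as the earlier code. The function name keep_fixed_lines and its arguments are illustrative, not something from the original code; the defaults (97 characters, 5 skipped header lines, blocks of 100.000 lines) are taken from this thread.

```r
# Copy only the lines that are exactly `width` characters long.
# `keep_fixed_lines`, `infile` and `outfile` are hypothetical names.
keep_fixed_lines <- function(infile, outfile, width = 97, skip = 5,
                             block = 1e5) {
  con <- file(infile, "rt")
  out <- file(outfile, "wt")
  on.exit({
    close(con)
    close(out)
  })
  # Skip the header lines, as in the earlier code
  if (skip > 0) readLines(con, n = skip)
  kept <- 0
  # Read the rest in blocks so the whole file never has to fit in memory
  repeat {
    lines <- readLines(con, n = block)
    if (length(lines) == 0) break
    good <- lines[nchar(lines) == width]
    writeLines(good, con = out)
    kept <- kept + length(good)
  }
  # Return the number of lines written, invisibly
  invisible(kept)
}

# Usage (output file name is an assumption):
# keep_fixed_lines("dataset.txt", "dataset_clean.txt")
```

Comparing the returned count with the number of lines written to strangelines.txt is a quick way to confirm that nothing other than blank lines was dropped.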

Since all lines are then of equal length, I suspect you have a fixed-width file. You could open and read this file using the LaF package (http://cran.r-project.org/web/packages/LaF/index.html; see the manual vignette for more information). The ffbase package (http://cran.r-project.org/web/packages/ffbase/index.html) has a function to convert from LaF to ff (laf_to_ffdf). I do not know whether packages such as RSQLite or bigmemory can import fixed-width files.
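A rough sketch of opening such a file with LaF. The column widths, types and names below are pure assumptions for illustration; the real layout has to come from whoever supplied the file, and the widths must add up to the line length (97). The file name dataset_clean.txt is also hypothetical.

```r
# Hypothetical column layout; replace with the real one.
col_widths <- c(10, 8, 79)               # must sum to 97
col_types  <- c("string", "integer", "string")
col_names  <- c("id", "count", "rest")

# Open the file with LaF; this does not load the data into memory.
open_fixed_width <- function(path) {
  if (!requireNamespace("LaF", quietly = TRUE)) {
    stop("package 'LaF' is not installed")
  }
  LaF::laf_open_fwf(path,
                    column_types  = col_types,
                    column_widths = col_widths,
                    column_names  = col_names)
}

# Usage (not run here):
# laf  <- open_fixed_width("dataset_clean.txt")
# head_rows <- laf[1:10, ]           # read only the first ten rows
# dat  <- ffbase::laf_to_ffdf(laf)   # convert to ff, as mentioned above
```

Reading a handful of rows first is a cheap way to see whether the assumed widths line up with the data before converting the whole file.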

The warning message indicates that the last line does not end with a newline character, which could indicate an incomplete file but often doesn't mean anything. You could check the last line of the file to be sure.
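One way to check this without reading the whole 4GB file is to look only at its last byte. This is a sketch; ends_with_newline is a hypothetical helper name, and note that the R documentation advises caution with seek() on Windows, so treat the result there with some care.

```r
# Return TRUE if the last byte of the file is a newline (LF).
# `ends_with_newline` is a hypothetical helper, not part of base R.
ends_with_newline <- function(path) {
  size <- file.info(path)$size
  if (is.na(size) || size == 0) return(FALSE)
  con <- file(path, "rb")
  on.exit(close(con))
  # Jump to the final byte and read it
  seek(con, where = size - 1)
  last_byte <- readBin(con, what = "raw", n = 1)
  identical(last_byte, as.raw(0x0A))
}

# Usage:
# ends_with_newline("dataset.txt")
```

If this returns FALSE, the warning is explained; whether the final record itself is complete can then be judged from its length (97 characters in this case).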

HTH,

Jan



On 05/05/2012 05:21 AM, iliketurtles wrote:
Your code works!

strangelines.txt was created, and it's a text file with just spacebars ...
Seems like a few thousand lines of complete blanks (not 1 non-blank entry).

One thing, when I ran your code there was an error message;

setwd("C:/Users/admin/Desktop/hons/Thesis")
con<- file("dataset.txt", "rt")
out<- file("strangelines.txt", "wt")
# skip first 5 lines
lines<- readLines(con, n=5)
# read the rest in blocks of 100.000 lines
while (TRUE) {
+     lines<- readLines(con, n=1E5)
+     if (length(lines) == 0) break;
+     strangelines<- lines[nchar(lines) != 97]
+     writeLines(strangelines, con=out)
+ }
Warning message:
In readLines(con, n = 1e+05) : incomplete final line found on 'dataset.txt'




I'm really not sure where to go from here. This has gone way out of my
depth.


Isaac
Research Assistant
Quantitative Finance Faculty, UTS
--
View this message in context: 
http://r.789695.n4.nabble.com/Can-t-import-this-4GB-DATASET-tp4607862p4610446.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

