OK, not all, but most lines have the same length. Perhaps you could write the lines whose length differs to a separate file, so that you can take a closer look at them. Modifying the previous code (again, not tested):

con <- file("dataset.txt", "rt")
out <- file("strangelines.txt", "wt")
# skip the first 5 lines
lines <- readLines(con, n = 5)
# read the rest in blocks of 100,000 lines
while (TRUE) {
   lines <- readLines(con, n = 1E5)
   if (length(lines) == 0) break
   # keep only the lines that do not have the usual length of 97 characters
   strangelines <- lines[nchar(lines) != 97]
   writeLines(strangelines, con = out)
}
close(con)
close(out)
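
If it also helps to know where in the file the odd lines sit, the same loop can keep track of line numbers. The following is only a sketch: the function name find_strange_lines and its arguments (expected, skip, block) are made up for this illustration, and the small temporary file merely stands in for dataset.txt. It counts bytes rather than characters, on the assumption that the file is plain single-byte text.

```r
# Hypothetical helper (not from the original code): copy every line whose
# length differs from `expected` to `outfile`, prefixed with its line number
# in `infile` and a tab.
find_strange_lines <- function(infile, outfile, expected = 97L,
                               skip = 5L, block = 1e5) {
  con <- file(infile, "rt")
  out <- file(outfile, "wt")
  on.exit({ close(con); close(out) })
  # number of lines consumed so far (the skipped header lines)
  offset <- length(readLines(con, n = skip))
  found <- 0L
  repeat {
    lines <- readLines(con, n = block)
    if (length(lines) == 0) break
    # type = "bytes" avoids errors on invalid multibyte strings
    strange <- which(nchar(lines, type = "bytes") != expected)
    if (length(strange) > 0) {
      writeLines(paste(offset + strange, lines[strange], sep = "\t"), con = out)
      found <- found + length(strange)
    }
    offset <- offset + length(lines)
  }
  found
}

# Small self-contained demonstration on a temporary file:
# 5 header lines, then lines of length 97, 0, 256 and 97.
infile <- tempfile()
outfile <- tempfile()
writeLines(c(rep("header", 5), strrep("a", 97), "", strrep("b", 256),
             strrep("c", 97)), infile)
n_strange <- find_strange_lines(infile, outfile)
result <- readLines(outfile)
```

For the real data the call would be something like find_strange_lines("dataset.txt", "strangelines.txt"); each line in the output file then starts with the line number at which it occurs in the input.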

Jan



Quoting iliketurtles <isaacm...@gmail.com>:

Jan, thank you.

table(line_sizes)
line_sizes
       0        1       97      256
    1430     2860 46869069     1430


Isaac
Research Assistant
Quantitative Finance Faculty, UTS
--
View this message in context: http://r.789695.n4.nabble.com/Can-t-import-this-4GB-DATASET-tp4607862p4608172.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

