OK, so most lines, though not all, have the same length. You could
write the lines with a different length to a separate file and take
a closer look at them. Modifying the previous code (again, not
tested):
con <- file("dataset.txt", "rt")
out <- file("strangelines.txt", "wt")
# skip the first 5 lines
invisible(readLines(con, n = 5))
# read the rest in blocks of 100,000 lines
while (TRUE) {
  lines <- readLines(con, n = 1E5)
  if (length(lines) == 0) break
  # keep only the lines that do not have the usual length of 97 characters
  strangelines <- lines[nchar(lines) != 97]
  writeLines(strangelines, con = out)
}
close(con)
close(out)
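
If you want to try the idea on something small before running it on the 4GB file, here is a minimal sketch of the same loop wrapped in a function. The function name extract_odd_lines, its arguments and the toy data are made up for illustration; only the expected length of 97 and the 5 skipped lines come from the code above. As a small extension it also writes the line number in front of each odd line, which may help to see where in the file they occur.

```r
# Hypothetical helper (not part of any package): copy every line whose
# length differs from `expected` from `infile` to `outfile`, prefixed by
# its line number in `infile`.
extract_odd_lines <- function(infile, outfile, expected = 97,
                              skip = 5, block = 1e5) {
  con <- file(infile, "rt")
  out <- file(outfile, "wt")
  on.exit({ close(con); close(out) })
  invisible(readLines(con, n = skip))   # skip the header lines
  offset <- skip                        # number of lines read so far
  repeat {
    lines <- readLines(con, n = block)
    if (length(lines) == 0) break
    odd <- which(nchar(lines) != expected)
    if (length(odd) > 0) {
      # format() avoids line numbers such as 1e+05 in the output
      nr <- format(offset + odd, scientific = FALSE, trim = TRUE)
      writeLines(paste0(nr, ": ", lines[odd]), con = out)
    }
    offset <- offset + length(lines)
  }
  invisible(offset)                     # total number of lines read
}

# Toy data: 5 header lines, then 97-character lines with some odd ones
good <- strrep("x", 97)
toy <- c(paste("header", 1:5), good, "", "a", good, strrep("y", 256), good)
infile <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(toy, infile)

# A tiny block size, so that the loop runs more than once
extract_odd_lines(infile, outfile, block = 2)
res <- readLines(outfile)
print(res)
```

With the toy data the output file should contain the three odd lines, numbered 7, 8 and 10. For the real file you would call it with the defaults, e.g. extract_odd_lines("dataset.txt", "strangelines.txt").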
Jan
Quoting iliketurtles <isaacm...@gmail.com>:
Jan, thank you.
table(line_sizes)
line_sizes
       0        1       97      256
    1430     2860 46869069     1430
-----
Isaac
Research Assistant
Quantitative Finance Faculty, UTS
--
View this message in context:
http://r.789695.n4.nabble.com/Can-t-import-this-4GB-DATASET-tp4607862p4608172.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.