On May 4, 2012, at 1:34 AM, iliketurtles wrote:
Dear Experienced R Practitioners,
I have 4GB of .txt data called "dataset.txt" and have attempted to use the
ff, bigmemory, filehash and sqldf packages to import it, but have had no
success. The readLines output of this data is:
The alignment of that output makes me wonder if the file is tab-separated.
You have considered the possibility that tab is the separator, but have you
actually tried using sep = "\t" in your read operations?
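
For instance, something along these lines on a small chunk would show quickly
whether tab works (just a sketch; skip = 5 jumps past the blank lines and the
header row visible in your readLines output, and nrows keeps the test cheap):

  ## quick tab-separated test on the first 20 data rows
  smp <- read.table("dataset.txt", sep = "\t", skip = 5, header = FALSE,
                    nrows = 20, fill = TRUE, stringsAsFactors = FALSE)
  str(smp)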
--
David.
readLines("dataset.txt",n=20)
[1] " "
[2] "
"
[3] " "
[4] " PERMNO DATE SHRCD COMNAM
PRC VOL"
[5] ""
[6] " 10001 01/09/1986 11 GREAT FALLS GAS CO
-5.75000 14160"
[7] " 10001 01/10/1986 11 GREAT FALLS GAS CO
-5.87500 0"
[8] " 10001 01/13/1986 11 GREAT FALLS GAS CO
-5.87500 2805"
[9] " 10001 01/14/1986 11 GREAT FALLS GAS CO
[20] " 10001 01/29/1986 11 GREAT FALLS GAS CO
-6.06250 4600"
This data goes on for a huge number of rows (not sure exactly how many).
Each element in each row is separated by an uneven number of (what seem to
be) spaces (maybe TAB? not sure). Further, there are some rows that are
"incomplete", i.e. there are missing elements.
Take the first 29 rows of "dataset.txt" into a separate data file, let's
call it "dataset2.txt". read.table("dataset2.txt", skip = 5) gives the
perfect table that I want to end up with, except I want it with the 4GB
data, through bigmemory, ff or filehash.
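
Since read.table(skip = 5) already parses the small sample correctly, one
route is ff's chunked wrapper around that same call. A rough sketch, assuming
the six-column layout from the sample and that read.table splits the full
file the same way (if COMNAM's embedded spaces break the default whitespace
splitting, sep = "\t" or fixed widths would be needed instead):

  library(ff)
  ## reads the file in chunks of next.rows lines and stores the result on
  ## disk as an ffdf; the text columns are held as factors
  dat <- read.table.ffdf(file = "dataset.txt", skip = 5, header = FALSE,
                         col.names = c("PERMNO", "DATE", "SHRCD",
                                       "COMNAM", "PRC", "VOL"),
                         colClasses = c("integer", "factor", "integer",
                                        "factor", "numeric", "numeric"),
                         next.rows = 500000)
  dim(dat)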
snipped several failed attempts
NA NA NA NA NA NA NA NA NA NA NA NA NA
#Even worse.
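
One plausible reason for the all-NA result (an assumption, since the attempt
itself was snipped): bigmemory's read.big.matrix() fills a single
numeric-typed matrix, so character columns such as COMNAM, and a DATE stored
as text, have nowhere to go and come back as NA. A sketch that sidesteps this
by reading only numeric columns into a file-backed matrix; "dataset_numeric.txt"
is a hypothetical pre-processed file holding just PERMNO, SHRCD, PRC and VOL:

  library(bigmemory)
  bm <- read.big.matrix("dataset_numeric.txt", sep = "\t", header = FALSE,
                        type = "double",
                        backingfile = "dataset.bin",
                        descriptorfile = "dataset.desc")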
###/*MY ATTEMPT USING sqldf*/###
No idea what to do here.
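
One possibility with sqldf (a sketch of something to try, not the original
attempt): read.csv.sql() pushes the file into a temporary SQLite database on
disk and hands back only what the SQL query asks for, so pieces of the 4GB
file can be pulled in without loading it all at once. It needs a
single-character separator, so this assumes the file really is tab-delimited;
irregular runs of spaces would first have to be squeezed down to one
delimiter (e.g. via the filter argument or an external tr/awk pass):

  library(sqldf)
  ## skip = 5 as in the read.table call above; limit 10 just to test parsing
  sub <- read.csv.sql("dataset.txt", skip = 5, header = FALSE, sep = "\t",
                      sql = "select * from file limit 10")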
-----
David Winsemius, MD
West Hartford, CT