On May 4, 2012, at 1:34 AM, iliketurtles wrote:
Dear Experienced R Practitioners,
I have 4GB of .txt data called "dataset.txt" and have attempted to use the
ff, bigmemory, filehash and sqldf packages to import it, but have had no
success. The readLines output of this data is:
The alignment of that output makes me wonder if the file is tab-separated.
You have considered the possibility that tab is the separator, but have you
actually tried using sep = "\t" in your read operations?
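
For instance, something along these lines on a small chunk would show quickly
whether tab works (just a sketch; skip = 5 jumps past the blank lines and the
header row visible in your readLines output, and nrows keeps the test cheap):

  ## quick tab-separated test on the first 20 data rows
  smp <- read.table("dataset.txt", sep = "\t", skip = 5, header = FALSE,
                    nrows = 20, fill = TRUE, stringsAsFactors = FALSE)
  str(smp)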
--
David.
readLines("dataset.txt",n=20)
[1] " "
[2] "
"
[3] " "
[4] " PERMNO DATE SHRCD COMNAM
PRC VOL"
[5] ""
[6] " 10001 01/09/1986 11 GREAT FALLS GAS CO
-5.75000 14160"
[7] " 10001 01/10/1986 11 GREAT FALLS GAS CO
-5.87500 0"
[8] " 10001 01/13/1986 11 GREAT FALLS GAS CO
-5.87500 2805"
[9] " 10001 01/14/1986 11 GREAT FALLS GAS CO
[20] " 10001 01/29/1986 11 GREAT FALLS GAS CO
-6.06250 4600"
This data goes on for a huge number of rows (not sure exactly how many).
Each element in each row is separated by an uneven number of (what seem to
be) spaces (maybe TAB? not sure). Further, there are some rows that are
"incomplete", i.e. there are missing elements.
Take the first 29 rows of "dataset.txt" into a separate data file, let's
call it "dataset2.txt". read.table("dataset2.txt", skip = 5) gives the
perfect table that I want to end up with, except I want it with the 4GB
data, through bigmemory, ff or filehash.
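
Since read.table(skip = 5) already parses the small sample correctly, one
route is ff's chunked wrapper around that same call. A rough sketch, assuming
the six-column layout from the sample and that read.table splits the full
file the same way (if COMNAM's embedded spaces break the default whitespace
splitting, sep = "\t" or fixed widths would be needed instead):

  library(ff)
  ## reads the file in chunks of next.rows lines and stores the result on
  ## disk as an ffdf; the text columns are held as factors
  dat <- read.table.ffdf(file = "dataset.txt", skip = 5, header = FALSE,
                         col.names = c("PERMNO", "DATE", "SHRCD",
                                       "COMNAM", "PRC", "VOL"),
                         colClasses = c("integer", "factor", "integer",
                                        "factor", "numeric", "numeric"),
                         next.rows = 500000)
  dim(dat)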
snipped several failed attempts
NA NA NA NA NA NA NA NA NA NA NA NA NA
#Even worse.
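
One plausible reason for the all-NA result (an assumption, since the attempt
itself was snipped): bigmemory's read.big.matrix() fills a single
numeric-typed matrix, so character columns such as COMNAM, and a DATE stored
as text, have nowhere to go and come back as NA. A sketch that sidesteps this
by reading only numeric columns into a file-backed matrix; "dataset_numeric.txt"
is a hypothetical pre-processed file holding just PERMNO, SHRCD, PRC and VOL:

  library(bigmemory)
  bm <- read.big.matrix("dataset_numeric.txt", sep = "\t", header = FALSE,
                        type = "double",
                        backingfile = "dataset.bin",
                        descriptorfile = "dataset.desc")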
###/*MY ATTEMPT USING sqldf*/###
No idea what to do here.
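
One possibility with sqldf (a sketch of something to try, not the original
attempt): read.csv.sql() pushes the file into a temporary SQLite database on
disk and hands back only what the SQL query asks for, so pieces of the 4GB
file can be pulled in without loading it all at once. It needs a
single-character separator, so this assumes the file really is tab-delimited;
irregular runs of spaces would first have to be squeezed down to one
delimiter (e.g. via the filter argument or an external tr/awk pass):

  library(sqldf)
  ## skip = 5 as in the read.table call above; limit 10 just to test parsing
  sub <- read.csv.sql("dataset.txt", skip = 5, header = FALSE, sep = "\t",
                      sql = "select * from file limit 10")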
-----
David Winsemius, MD
West Hartford, CT