On Feb 11, 2011, at 7:51 AM, John Filben wrote:
I have recently been using R - more specifically the GUI packages Rattle and Rcmdr. I like these products a lot and want to use them for some projects. The problem I run into is when I start trying to run large datasets through them. The datasets are 10-15 million records and usually have 15-30 fields (both numerical and categorical).
You could instead just buy memory. 32GB ought to be sufficient for descriptives and regression. Might even get away with 24.
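For scale, a quick back-of-envelope (assuming all 30 fields end up stored as doubles) shows why that much RAM is plausible:

rows <- 15e6
cols <- 30
rows * cols * 8 / 2^30   # 8 bytes per double: roughly 3.4 GB per in-memory copy

R routines frequently duplicate objects while they work, so allowing for several copies plus the rest of the system is how you get to the 24-32GB range.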
I saw that there were some packages that could deal with large datasets in R - bigmemory, ff, ffdf, biganalytics. My problem is that I am not much of a coder (which is the reason I use the above-mentioned GUIs). These GUIs do show the executable R code in the background, so my thought was to run a small sample through the GUI, copy the code, and then incorporate some of the large-data packages mentioned above. Has anyone ever tried to do this, and would you have working examples? In terms of what I am trying to do to the data - really simple stuff: descriptive statistics,
Should be fine here.
k-means clustering, and possibly some decision trees.
Not sure how well those scale to tasks as large as what you propose, especially since you don't mention packages or functions. Not sure they don't, either.
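On the descriptive statistics point, here is a minimal, untested sketch of the workflow you describe, assuming the data sit in a comma-separated file called mydata.csv with a header row, and that the categorical fields have already been recoded as numbers (a big.matrix holds a single numeric type):

library(bigmemory)
library(biganalytics)

# File-backed matrix: the data live on disk, not in RAM
x <- read.big.matrix("mydata.csv", sep = ",", header = TRUE,
                     type = "double",
                     backingfile = "mydata.bin",
                     descriptorfile = "mydata.desc")

# Column-wise descriptives computed without pulling everything into memory
colmean(x, na.rm = TRUE)
colsd(x, na.rm = TRUE)
colrange(x, na.rm = TRUE)

The descriptor file lets you re-attach the same data in a later session with attach.big.matrix("mydata.desc") instead of re-reading the CSV.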
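On the clustering and trees point, biganalytics does provide a k-means that accepts a big.matrix directly, and a common compromise for trees is to fit rpart on a random subsample. Another untested sketch, reusing the mydata.desc backing file from above and a hypothetical class column named target:

library(bigmemory)
library(biganalytics)
library(rpart)

x <- attach.big.matrix("mydata.desc")

# k-means on the full file-backed matrix
km <- bigkmeans(x, centers = 5, nstart = 3, iter.max = 20)
km$size      # cluster sizes
km$centers   # cluster centres

# rpart does not accept a big.matrix, so fit the tree on a random subsample
idx  <- sample(nrow(x), 1e5)
sub  <- as.data.frame(x[idx, ])
tree <- rpart(target ~ ., data = sub, method = "class")   # 'target' is hypothetical
printcp(tree)

Fitting the tree on a 100k-row sample is a compromise, not a full-data answer, which is the scaling caveat above.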
--
David.
Any help would be greatly appreciated.
Thank you - John
John Filben
--
David Winsemius, MD
West Hartford, CT