You may get some help here, but you should also do your own homework by looking at the CRAN Machine Learning Task View:
https://cran.r-project.org/web/views/MachineLearning.html

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Tue, Jul 18, 2017 at 12:37 AM, Ranjana Girish <ranjanagiris...@gmail.com> wrote:
> Hi all,
>
> We are working on multi-class classification. Currently, the ranger
> package in R can handle up to 1.1 million records. Training on a machine
> with 128 GB of RAM takes 12 days, which is not practical for further work.
>
> In the future we will have a dataset of 10 million records, so we are
> looking for a package or framework that can handle 10 million records
> with at least 12000 features.
>
> The package or framework should handle all of the tasks below:
>
> 1. Pre-processing of words in the corpus (stopword removal, stemming,
>    removal of special characters)
> 2. Construction of a document-term matrix
> 3. Feature selection, e.g. chi-square, information gain, gain ratio
> 4. Random forest classification, etc.
>
> Kindly let us know of a package or framework that can scale up to 10
> million rows and 12000 columns.
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.