Hi all, we are working on multi-class classification. Currently the ranger package in R can handle up to 1.1 million records. Training on a machine with 128 GB of RAM takes 12 days, which is not practical for going further.
In future we will have datasets of around 10 million records, so we are looking for a package or framework that can handle 10 million records with at least 12,000 features. It should cover all of the tasks below:

1. Pre-processing of the words in the corpus (stopword removal, stemming, removal of special characters)
2. Construction of the document-term matrix
3. Feature selection, e.g. chi-square, information gain, gain ratio
4. Classification with random forest, etc.

Kindly let us know which package or framework can scale up to 10 million rows and 12,000 columns.
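To make the requirement concrete, here is a minimal sketch of tasks 1 to 3 above on a toy corpus. It is written in plain Python (standard library only) purely for illustration; it is not our actual R code. The stopword list, the toy documents and labels, and the function names `preprocess`, `build_dtm`, `chi_square` and `select_top_terms` are all hypothetical. Stemming and the random forest step are left out.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def preprocess(text):
    """Lowercase, replace non-letters with spaces, drop stopwords (no stemming)."""
    tokens = re.sub(r"[^a-z\s]", " ", text.lower()).split()
    return [t for t in tokens if t not in STOPWORDS]


def build_dtm(docs):
    """Return (vocabulary, rows); each row is a sparse {term: count} mapping."""
    rows = [Counter(preprocess(d)) for d in docs]
    vocab = sorted({t for row in rows for t in row})
    return vocab, rows


def chi_square(rows, labels, term, cls):
    """Chi-square statistic of the 2x2 table (term present?) x (label == cls)."""
    a = b = c = d = 0
    for row, label in zip(rows, labels):
        present = term in row
        in_cls = label == cls
        if present and in_cls:
            a += 1
        elif present:
            b += 1
        elif in_cls:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_top_terms(vocab, rows, labels, k):
    """Keep the k terms with the highest chi-square score over any class."""
    classes = sorted(set(labels))
    score = {t: max(chi_square(rows, labels, t, c) for c in classes) for t in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]


# Hypothetical toy data, only to exercise the functions.
docs = ["The cat sat!", "A cat purrs.", "Stocks fell in trading.", "Stocks and bonds rose."]
labels = ["pets", "pets", "finance", "finance"]

vocab, rows = build_dtm(docs)
print(select_top_terms(vocab, rows, labels, 2))
```

The rows are kept as sparse term-count mappings on purpose: stored densely as 8-byte values, 10 million rows by 12,000 columns would be about 960 GB, far more than our 128 GB of RAM, so whatever package we end up with would have to work on a sparse matrix.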