Hi all,

We are working on multi-class classification. At present, the ranger package
in R can handle up to 1.1 million records. Training on a machine with 128 GB
of RAM takes 12 days, which is not practical.

In future our dataset will grow to 10 million records, so we are looking for
a package or framework that can handle 10 million records with at least
12,000 features.
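
As a rough sense of scale, the arithmetic below estimates the memory a dense
numeric matrix of this size would need and compares it with a sparse
representation. The figure of 50 non-zero terms per document is purely an
assumption for illustration, not a measurement from our data.

```r
# Back-of-the-envelope memory estimate for the target data size.
n_rows <- 10e6          # 10 million records (from the post)
n_cols <- 12000         # 12,000 features (from the post)
bytes_per_double <- 8   # R stores numeric values as 8-byte doubles

# Dense storage: every cell is stored, zero or not.
dense_gb <- n_rows * n_cols * bytes_per_double / 1e9
dense_gb  # 960

# ASSUMPTION (hypothetical): about 50 distinct terms per document.
nnz_per_row <- 50

# Column-compressed sparse storage keeps, per non-zero entry, one 8-byte
# value and one 4-byte row index, plus (n_cols + 1) 4-byte column pointers.
sparse_gb <- (n_rows * nnz_per_row * (8 + 4) + (n_cols + 1) * 4) / 1e9
sparse_gb
```

Under that assumption the sparse form is a few GB, whereas the dense form
would not fit in 128 GB of RAM at all.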


The package or framework should handle all of the following tasks:

1. Pre-processing of the words in the corpus (stopword removal, stemming,
removal of special characters)
2. Construction of a document-term matrix
3. Feature selection, e.g. chi-square, information gain, gain ratio
4. Classification with random forest and similar methods
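
To make the tasks concrete, here is a minimal sketch of steps 1 to 3 on a toy
corpus, using only base R and the Matrix package. The documents, labels,
stopword list and the `tokenize` helper are all made up for illustration;
stemming is left out because it needs an extra package, and this is not a
claim about how any particular framework implements these steps.

```r
library(Matrix)  # recommended package shipped with R; provides sparse matrices

# Toy corpus and class labels (hypothetical data).
docs <- c("The cats are running fast!",
          "Dogs run faster than cats.",
          "Stock prices fell sharply today.",
          "The market and stocks are rising.")
labels <- factor(c("animals", "animals", "finance", "finance"))

# Step 1: lowercase, replace special characters, split, drop stopwords.
stopwords <- c("the", "are", "than", "and")  # tiny hypothetical list
tokenize <- function(x) {
  x <- gsub("[^a-z ]", " ", tolower(x))
  toks <- strsplit(x, " +")[[1]]
  toks[nzchar(toks) & !(toks %in% stopwords)]
}
tokens <- lapply(docs, tokenize)

# Step 2: sparse document-term matrix (rows = documents, columns = terms).
vocab <- sort(unique(unlist(tokens)))
i <- rep(seq_along(tokens), lengths(tokens))
j <- match(unlist(tokens), vocab)
dtm <- sparseMatrix(i = i, j = j, x = rep(1, length(i)),
                    dims = c(length(docs), length(vocab)),
                    dimnames = list(NULL, vocab))

# Step 3: chi-square score per term from term presence vs. class.
bin <- dtm
bin@x[] <- 1                                  # presence/absence, not counts
O   <- as.matrix(fac2sparse(labels) %*% bin)  # classes x terms: docs containing term
n_k <- as.vector(table(labels))               # documents per class
N   <- length(labels)
df_t <- colSums(O)                            # documents containing each term
E1  <- outer(n_k, df_t) / N                   # expected "present" counts
E0  <- outer(n_k, N - df_t) / N               # expected "absent" counts
# Note: a term present in every document would give 0/0 here; none is in this toy set.
chi2 <- colSums((O - E1)^2 / E1 + ((n_k - O) - E0)^2 / E0)

# Keep the highest-scoring terms (3 is an arbitrary choice for the example).
keep <- names(sort(chi2, decreasing = TRUE))[1:3]
dtm_sel <- dtm[, keep, drop = FALSE]
```

The reduced matrix `dtm_sel` stays sparse throughout, which is the property we
need in step 4: the classifier has to accept sparse input, otherwise the data
must be expanded to a dense matrix first.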

Kindly let us know of a package or framework that can scale up to 10
million rows and 12,000 columns.


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
