Re: [R] Help-Multi class classification for large datasets

Bert Gunter Tue, 18 Jul 2017 07:43:37 -0700

You may get some help here, but you should also do your own homework
by looking at the CRAN Machine Learning Task view here:


https://cran.r-project.org/web/views/MachineLearning.html


Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Tue, Jul 18, 2017 at 12:37 AM, Ranjana Girish
<ranjanagiris...@gmail.com> wrote:
> Hi all,
>
> We are working on multi-class classification. Currently the ranger package
> in R can handle up to 1.1 million records. Training on a machine with
> 128 GB of RAM takes 12 days, which is not practical for us.
>
> In the future we will have a dataset of 10 million records, so we are
> searching for a package or framework that can handle 10 million records
> with at least 12000 features.
>
>
> The package or framework we are searching for should handle all of the
> tasks below:
>
> 1. Pre-processing of words in the corpus (stopword removal, stemming,
> removal of special characters)
> 2. Construction of the document-term matrix
> 3. Feature selection, e.g. chi-square, information gain, gain ratio
> 4. Random forest classification, etc.
>
> Kindly let us know of a package or framework that can scale up to 10
> million rows and 12000 columns.
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
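
For readers following the task list quoted above, steps 1 to 3 can be sketched in a few lines. This is only a rough, language-neutral illustration in plain Python (standard library only), not a recommendation of any particular package and not something that will scale to 10 million records as written. The stopword list, the function names, and the choice to score each term by its maximum chi-square value over the classes are all assumptions made for the example. Stemming and the random forest step are left out.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "is", "and", "of", "to", "in"}


def tokenize(text):
    """Step 1: lowercase, drop special characters and digits, remove stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def document_term_matrix(docs):
    """Step 2: sparse document-term matrix, one Counter of term counts per document."""
    return [Counter(tokenize(d)) for d in docs]


def chi_square_scores(dtm, labels):
    """Step 3: chi-square of term presence vs. class membership.

    For each term, a 2x2 table is built per class (one-vs-rest) and the
    largest chi-square value over the classes is kept as the term's score.
    """
    n = len(dtm)
    class_sizes = Counter(labels)
    doc_freq = Counter()   # term -> number of documents containing it
    joint = Counter()      # (term, label) -> documents of that class containing it
    for row, label in zip(dtm, labels):
        for term in row:
            doc_freq[term] += 1
            joint[(term, label)] += 1

    scores = {}
    for term, df in doc_freq.items():
        best = 0.0
        for label, size in class_sizes.items():
            a = joint[(term, label)]   # term present, document in class
            b = df - a                 # term present, document in another class
            c = size - a               # term absent, document in class
            d = n - df - c             # term absent, document in another class
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


if __name__ == "__main__":
    docs = ["The cat sat!", "A cat purred.",
            "Stocks fell in trading", "Stocks and bonds rose"]
    labels = ["pets", "pets", "finance", "finance"]
    scores = chi_square_scores(document_term_matrix(docs), labels)
    # Print terms from most to least class-discriminating.
    for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(term, round(score, 3))
```

Keeping only the top-scoring terms before training is what shrinks the feature count. Storing each document as a sparse mapping rather than a dense row is what keeps a matrix with thousands of mostly-zero columns manageable in memory.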

