I am implementing an image classification algorithm using the
randomForest package. The training data consists of 31000+ training
cases with 26 predictor variables plus one factor response variable
(the training class). The main issue I am encountering is very low
overall classification accuracy (a lot of confusion between classes).
However, I know from other classifiers (including a regular decision
tree classifier) that the training and validation data are sound and
capable of producing good accuracies.
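For the between-class confusion specifically, one thing worth trying is a
stratified per-tree bootstrap via the strata/sampsize arguments, which can
help when the classes are imbalanced. A minimal sketch on the built-in iris
data as a stand-in for your training set (the per-class count of 30 is an
arbitrary illustration, not a recommendation):

```r
library(randomForest)
set.seed(7)

x <- iris[, 1:4]      # predictor columns
y <- iris$Species     # factor response (the training class)

## Draw 30 cases from each class for every tree; with imbalanced data
## this keeps rare classes represented in each bootstrap sample.
rf <- randomForest(x, y,
                   strata   = y,
                   sampsize = c(30, 30, 30),
                   ntree    = 500)

rf$confusion   # per-class confusion matrix with class.error column
```

The confusion matrix (rf$confusion) also shows whether the low accuracy is
spread evenly or driven by a few specific class pairs.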


Currently, I am using the default parameters (ntree = 500, mtry left at
its default, nodesize = 1, replace = TRUE). Does anyone have experience
using randomForest with large datasets? Currently I need to randomly
sample my training data, because passing it the full 31000+ cases
produces an out-of-memory error; the same thing happens with large
numbers of trees. From what I read in the documentation, perhaps I do
not have enough trees to fully capture the training data?
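One way to keep the tree count up while working around the memory error is
to grow the forest in smaller chunks and merge them with combine(). A sketch,
again using iris as a stand-in for the real training data (each chunk could
also be grown on a random subset of rows to cut memory further):

```r
library(randomForest)
set.seed(1)

## Grow five forests of 100 trees each instead of one of 500 trees,
## so each call holds fewer trees in memory while it runs.
chunks <- lapply(1:5, function(i)
  randomForest(Species ~ ., data = iris, ntree = 100))

## Merge the pieces into a single 500-tree forest.
rf_all <- do.call(combine, chunks)
rf_all$ntree
```

Note that a combined forest keeps the trees but drops some of the aggregate
statistics (e.g. the OOB error trace), so compute any diagnostics per chunk
or via predict() on held-out data.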


Any suggestions or ideas will be greatly appreciated.


Benjamin


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
