I am implementing an image classification algorithm using the randomForest package. The training data consists of 31000+ cases over 26 predictor variables, plus one factor response variable (the training class). The main issue I am encountering is very low overall classification accuracy, with a lot of confusion between classes. However, I know from other classifiers (including a regular decision tree) that the training and validation data are sound and capable of producing good accuracies.
Currently, I am using the default parameters (500 trees, mtry not set (default), nodesize = 1, replace = TRUE). Does anyone have experience using this package with large datasets? At the moment I need to randomly sample my training data, because giving it the full 31000+ cases returns an out-of-memory error; the same thing happens with large numbers of trees. From what I read in the documentation, perhaps I do not have enough trees to fully capture the training data? Any suggestions or ideas will be greatly appreciated.

Benjamin
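For reference, here is a sketch of the setup described above, plus two memory-oriented alternatives. The objects `train_df` (the 26 predictors) and `train_class` (the factor of labels) are hypothetical stand-ins for my data; the `sampsize` value is an arbitrary example, not a recommendation.

```r
library(randomForest)

## Hypothetical objects: train_df holds the 26 predictor variables,
## train_class is the factor of class labels for the 31000+ cases.
set.seed(42)

## Current setup: all defaults, i.e. ntree = 500,
## mtry = floor(sqrt(26)) for classification, nodesize = 1, replace = TRUE.
rf <- randomForest(x = train_df, y = train_class,
                   ntree = 500, nodesize = 1, replace = TRUE)

## One memory-friendly alternative: cap the per-tree bootstrap sample
## with sampsize instead of subsampling the training set up front, so
## each tree still draws from the full set of cases.
rf_small <- randomForest(x = train_df, y = train_class,
                         ntree = 500, sampsize = 5000)

## Forests grown in separate calls can be merged with combine(), which
## allows growing trees in batches rather than all at once.
rf_all <- combine(rf, rf_small)

## Inspect the per-class confusion and error rates.
print(rf$confusion)
```

The `sampsize` argument and `combine()` are part of the randomForest package; whether they resolve the memory error here is something I have not yet verified.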