I have modified my code since asking my original question. The classifier is now generated correctly (with a good, low error rate, as expected). However, I am running into two issues:
1) At the prediction stage I get only NA values when I run data through the forest (a diagnostic sketch follows after my code below);
2) I run out of memory when generating the forest with more than about 200 trees, because the training data already occupies a large block of memory (a possible workaround is sketched below as well).

Here is my code:

library(raster)
library(randomForest)

# Set some user variables
fn = "image.pix"
outraster = "output.pix"
training_band = 2
validation_band = 1

# Get the training data
myraster = stack(fn)
training_class = subset(myraster, training_band)
training_class[training_class == 0] = NA
training_class = Which(training_class != 0, cells=TRUE)
training_data = extract(myraster, training_class)
training_response = as.factor(as.vector(training_data[,training_band]))
training_predictors = training_data[,3:nlayers(myraster)]
remove(training_data)

# Create and save the forest
r_tree = randomForest(training_predictors, y=training_response, ntree = 200,
                      keep.forest=TRUE)  # Runs out of memory with ntree > ~200
remove(training_predictors, training_response)

# Classify the whole image
predictor_data = subset(myraster, 3:nlayers(myraster))
layerNames(predictor_data) = layerNames(myraster)[3:nlayers(myraster)]
predictions = predict(predictor_data, r_tree, filename=outraster,
                      format="PCIDSK", overwrite=TRUE, progress="text",
                      type="response")  # All NA!?
remove(predictor_data)

See also a thread I started at
http://stackoverflow.com/questions/4186507/rgdal-efficiently-reading-large-multiband-rasters
about improving the efficiency of collecting the training data.
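For issue 1, the first things I plan to rule out are NA values in the predictor bands and a mismatch between the column names the forest was trained on and the layer names that predict() will supply. This is only an untested diagnostic sketch; it assumes the objects created in the code above and would be run before the remove() calls:

# Compare the names randomForest is trained on with the names predict() will see
colnames(training_predictors)
layerNames(myraster)[3:nlayers(myraster)]
# If they differ, align them before calling randomForest(), for example:
# colnames(training_predictors) = layerNames(myraster)[3:nlayers(myraster)]

# Spot-check a random sample of cells for NA in the predictor bands
sample_cells = sample(ncell(myraster), 1000)
summary(extract(myraster, sample_cells))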
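For issue 2, a workaround I am considering is to grow the forest in several smaller pieces and merge them with combine() from the randomForest package, so that no single randomForest() call has to build all of the trees at once. This is an untested sketch using the training objects from the code above; as I understand it, the combined object still predicts normally but loses the OOB error and confusion estimates, and it only helps if growing the trees, rather than holding the finished forest and the training data, is what exhausts memory:

# Grow five forests of 100 trees each and merge them into one 500-tree forest
pieces = lapply(1:5, function(i)
    randomForest(training_predictors, y = training_response,
                 ntree = 100, keep.forest = TRUE))
r_tree = do.call(combine, pieces)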
Thanks,
Benjamin

-----Original Message-----
From: Liaw, Andy [mailto:andy_l...@merck.com]
Sent: November 11, 2010 7:02 AM
To: Deschamps, Benjamin; r-help@r-project.org
Subject: RE: [R] randomForest parameters for image classification

Please show us the code you used to run randomForest, the output, as well
as what you get with other algorithms (on the same random subset for
comparison). I have yet to see a dataset where randomForest does _far_
worse than other methods.

Andy

> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Deschamps, Benjamin
> Sent: Tuesday, November 09, 2010 10:52 AM
> To: r-help@r-project.org
> Subject: [R] randomForest parameters for image classification
>
> I am implementing an image classification algorithm using the
> randomForest package. The training data consists of 31000+ training
> cases over 26 variables, plus one factor predictor variable (the
> training class). The main issue I am encountering is very low overall
> classification accuracy (a lot of confusion between classes). However,
> I know from other classifications (including a regular decision tree
> classifier) that the training and validation data are sound and capable
> of producing good accuracies.
>
> Currently, I am using the default parameters (500 trees, mtry not set
> (default), nodesize = 1, replace=TRUE). Does anyone have experience
> using this with large datasets? Currently I need to randomly sample my
> training data because giving it the full 31000+ cases returns an out of
> memory error; the same thing happens with large numbers of trees. From
> what I read in the documentation, perhaps I do not have enough trees to
> fully capture the training data?
>
> Any suggestions or ideas will be greatly appreciated.
>
> Benjamin

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.