Hi all,

I have been using the randomForest package to model insect species abundance in different habitats and to identify the key variables (landscape, climate, etc.) that determine abundance. This has all worked fine, and I get nice variable importance plots. Many thanks to everyone on this help forum who has given tips and advice along the way.
However, the percentage variance explained (pseudo R-squared) reported when I call print(model) is quite low. Depending on the species being modelled, it ranges from a maximum of 23.69 right down to -2.08. I believe that a negative value represents a model that performs no better, or worse, than random, and obviously the larger the R^2 gets, the better the predictive ability. But over what range does this R^2 operate?

It is not unexpected that some of these models would have poor predictive accuracy: part of the larger project around this work is to argue that finer-resolution remotely sensed satellite imagery is needed to derive the climate variables being used to predict species abundance.

My question is probably a bit like "how long is a piece of string", but I would most appreciate guidance on two points:

1. What constitutes a good / very good / bad / very bad R-squared value for random forest?
2. Are there any other accuracy measures that can be used with random forest in addition to the pseudo R^2 value?

This work will be presented to an entomology/ecology audience, for whom machine learning is a bit outside their (and my) statistics comfort zone.

Many thanks in advance,

Lara
lara.har...@bbsrc.ac.uk

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
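
For reference, a minimal sketch of how a figure like the "% Var explained" in print(model) can be reproduced by hand, together with two error measures in the units of the response. It assumes the regression pseudo R-squared is 1 - MSE / Var(y), computed from out-of-bag predictions with the variance taken with denominator n. The helper name rf_accuracy and the toy data are hypothetical and not part of randomForest.

```r
# Hypothetical helper (not part of the randomForest package): summarise
# agreement between observed values and (ideally out-of-bag) predictions.
rf_accuracy <- function(obs, pred) {
  n   <- length(obs)
  mse <- mean((obs - pred)^2)
  # Assumption: variance of the response with denominator n rather than n - 1
  var_n <- var(obs) * (n - 1) / n
  c(pct_var_explained = 100 * (1 - mse / var_n),  # pseudo R-squared, in percent
    rmse              = sqrt(mse),                # root mean squared error
    mae               = mean(abs(obs - pred)))    # mean absolute error
}

# Toy data (made up) to show the range of the pseudo R-squared
obs <- c(2, 4, 6, 8)
perfect   <- rf_accuracy(obs, obs)                          # upper limit: 100
mean_only <- rf_accuracy(obs, rep(mean(obs), length(obs)))  # about 0
reversed  <- rf_accuracy(obs, rev(obs))                     # negative: -300
print(rbind(perfect, mean_only, reversed))

# With a fitted regression forest called `model`, the out-of-bag version
# would be (not run here):
#   rf_accuracy(model$y, model$predicted)
```

Under these assumptions the measure has an upper limit of 100 but no lower limit: a value near 0 means the predictions do about as well as always predicting the mean abundance, and a negative value means they do worse than that. RMSE and MAE are expressed in abundance units, which may be easier to present to an ecology audience.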