Hi all

I have been using the randomForest package to model insect species abundance 
in different habitats and to identify the key variables (landscape, climate, 
etc.) that determine abundance. This has all worked fine and I get nice 
variable importance plots. Many thanks to everyone on this help forum who has 
given tips/advice along the way.

However, the percentage variance explained (pseudo R-squared) reported when I 
call print(model) is quite low: depending on the species being modelled, it 
ranges from a maximum of 23.69 right down to -2.08.

I believe that the negative value represents a model that performs no better, 
or worse, than random, and obviously the larger the R^2, the better the 
predictive ability. But over what range does this R^2 operate?
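
To frame the question, here is a minimal sketch in base R of how I understand 
the figure to be calculated. I am assuming it is one minus the out-of-bag mean 
squared error divided by the variance of the response (multiply by 100 for the 
percentage). pseudo_r2 is my own hypothetical helper, not a randomForest 
function, and the toy numbers are made up purely for illustration.

```r
# Hypothetical helper (not part of randomForest): pseudo R-squared,
# assuming the definition 1 - MSE / Var(y) with the variance divided by n.
pseudo_r2 <- function(observed, predicted) {
  mse <- mean((observed - predicted)^2)              # mean squared error
  total_var <- mean((observed - mean(observed))^2)   # variance of the response
  1 - mse / total_var
}

# Made-up toy response, for illustration only
y <- c(2, 4, 6, 8, 10)

pseudo_r2(y, y)                          # perfect predictions: 1
pseudo_r2(y, rep(mean(y), length(y)))    # always predicting the mean: 0
pseudo_r2(y, rev(y))                     # worse than predicting the mean: -3
```

If that assumption is right, the value cannot exceed 1 (i.e. 100%), is 0 when 
the model does no better than always predicting the mean, and has no fixed 
lower limit.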

It is not unexpected that some of these models have poor predictive accuracy, 
as part of the larger project around this work is to argue that 
finer-resolution remotely sensed satellite imagery is needed to derive the 
climate variables etc. being used to predict species abundance.

My question is probably a bit like "how long is a piece of string?", but if 
anyone could offer some guidance on what constitutes a good / very good / bad / 
very bad R-squared value for random forest, it would be most appreciated. Are 
there any other accuracy measures that can be used with random forest in 
addition to the pseudo R^2 value? This work will be presented to an 
entomology/ecology audience, for whom machine learning is a bit outside their 
(and my) statistics comfort zone.

Many thanks in advance

Lara

lara.har...@bbsrc.ac.uk


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
