[R] random forest significance testing tools

Tom Woolman Sun, 10 May 2020 16:49:17 -0700

Hi everyone. I'm using a random forest in R to successfully perform a classification on a dichotomous DV in a dataset that has 29 IVs of type double and approximately 285,000 records. I ran my model on a 70/30 train/test split of the original dataset.

I'm trying to use the rfUtilities package for rf model selection and performance evaluation, in order to generate a p-value and other quantitative performance statistics for use in hypothesis testing, similar to what I would do with a logistic regression glm model.

The initial random forest model results and OOB error estimates were as follows:


randomForest(formula = Class ~ ., data = train)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 5

        OOB estimate of  error rate: 0.04%
Confusion matrix:
       0   1  class.error
0 199004  16 8.039393e-05
1     73 271 2.122093e-01

I'm running this model on my laptop (Win10, 8 GB RAM) as I don't have access to my server during the pandemic. The rfUtilities function call works (or at least it doesn't give me an error message or crash), but it's been running for over a day in RStudio on the original rf model and the training dataset without providing any results.

For anyone who has used the rfUtilities package before, is this just too large a data frame for a Win10 laptop to process effectively, or should I be doing something different? This is my first time using the rfUtilities package and I understand that it is relatively new.

The function call for the rfUtilities function rf.significance is as follows (rf is my original random forest data model from the randomForest function):


rf.perm <- rf.significance(rf, train[,1:29], nperm=99, ntree=500)
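
A rough way to gauge how long a call like this might take is to time a single forest fit and scale by the number of permutations. The sketch below assumes (not confirmed against the rfUtilities source) that rf.significance refits one forest per permutation, so the total is roughly nperm times the time of one fit. The data frame `small` is a hypothetical simulated stand-in with 29 numeric predictors and a rare positive class; it is not the real dataset.

```r
library(randomForest)

set.seed(1)

# Hypothetical stand-in for the real data: 29 numeric predictors and a rare
# positive class. In practice, substitute a random subsample of `train`.
n <- 5000
small <- as.data.frame(matrix(rnorm(n * 29), ncol = 29))
small$Class <- factor(rbinom(n, 1, plogis(-3 + 2 * small$V1)))

# Time a single 500-tree fit on the subsample.
fit_secs <- system.time(
  rf_small <- randomForest(Class ~ ., data = small, ntree = 500)
)[["elapsed"]]

# Assumption: one refit per permutation, so total time ~ nperm * fit time.
nperm <- 99
est_hours <- nperm * fit_secs / 3600
cat(sprintf("One fit: %.1f s; %d permutations: about %.2f h\n",
            fit_secs, nperm, est_hours))
```

Timing the original fit on the full training set in the same way would give the figure that matters for the call above. If the estimate is impractical for an 8 GB laptop, running the same rf.significance call on a subsample, or with a smaller nperm, is one possible workaround.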


Thanks in advance.

Tom Woolman
PhD student, Indiana State University

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
