Hi everyone. I'm using a random forest in R to classify a dichotomous DV in a dataset that has 29 IVs of type double and approximately 285,000 records. I split the original dataset 70/30 into train and test sets and fit the model on the training set.

I'm trying to use the rfUtilities package for rf model selection and performance evaluation, in order to generate a p-value and other quantitative performance statistics for use in hypothesis testing, similar to what I would do with a logistic regression glm model.

The initial random forest model results and OOB error estimates were as follows:

randomForest(formula = Class ~ ., data = train)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 5

        OOB estimate of  error rate: 0.04%
Confusion matrix:
       0   1  class.error
0 199004  16 8.039393e-05
1     73 271 2.122093e-01
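
For reference, the per-class and overall error rates can be recomputed from the confusion matrix with a few lines of base R. This is only a sketch: the counts are copied from the output above, and the object names (cm, class_error, oob_error, prevalence) are chosen here for illustration.

```r
# Confusion matrix copied from the randomForest output above
# (rows = observed class, columns = predicted class).
cm <- matrix(c(199004,  16,
                   73, 271),
             nrow = 2, byrow = TRUE,
             dimnames = list(observed = c("0", "1"),
                             predicted = c("0", "1")))

# Per-class error: share of each observed class that was misclassified.
class_error <- 1 - diag(cm) / rowSums(cm)

# Overall OOB error: all misclassified records over all records.
oob_error <- 1 - sum(diag(cm)) / sum(cm)

# Share of training records in the positive class.
prevalence <- rowSums(cm)[["1"]] / sum(cm)

print(class_error)
print(oob_error)
print(prevalence)
```

The positive class makes up only about 0.17% of the training records, so the 0.04% overall OOB error is dominated by class 0, while about 21% of the class 1 records are misclassified.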


I'm running this model on my laptop (Win10, 8 GB RAM) as I don't have access to my server during the pandemic. The rfUtilities function call works (or at least it doesn't give me an error message or crash), but it's been running for over a day in RStudio on the original rf model and the training dataset without providing any results.

For anyone who has used the rfUtilities package before: is this data frame simply too large for a Win10 laptop to process in a reasonable time, or should I be doing something differently? This is my first time using rfUtilities, and I understand that the package is relatively new.

My call to the rfUtilities function rf.significance is as follows (rf is the model object returned by the original randomForest call):

rf.perm <- rf.significance(rf, train[,1:29], nperm=99, ntree=500)
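
A rough back-of-the-envelope check on the runtime, assuming rf.significance refits one forest of the same size for each permutation (an assumption about how the permutation test works, not confirmed from the package source). The helper name estimate_perm_hours and the 15-minute single-fit time are hypothetical placeholders, not measured values.

```r
# Rough runtime estimate for a permutation test.
# Assumption: one forest of the same size is refit per permutation,
# so total time grows linearly with nperm.
estimate_perm_hours <- function(fit_minutes, nperm) {
  fit_minutes * nperm / 60
}

# fit_minutes = 15 is a hypothetical placeholder, not a measured value;
# wrapping the original randomForest() call in system.time() would give
# the real figure.
est_hours <- estimate_perm_hours(fit_minutes = 15, nperm = 99)
print(est_hours)
```

Under that assumption, nperm=99 with ntree=500 means growing 49,500 trees on the full training set, so a run of a day or more would not be surprising on a laptop.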


Thanks in advance.

Tom Woolman
PhD student, Indiana State University

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.