This is explained in the "Details" section of the help page for partialPlot.
Best
Andy
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jesús Para
> Fernández
> Sent: Tuesday, April 12, 2016 1:17 AM
> To: r-help@r-project.org
> Subject: [R] Random For
Perhaps if you follow the posting guide more closely, you might get more
(useful) replies, but without looking at your data, I doubt there's much
anyone can do for you.
The fact that the range of the outlying measures is -1 to 2 would tell
me there are no potential outliers by this measure. Pleas
randomForest predictions are based on votes of individual trees, thus
have little to do with error rates of individual trees.
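A minimal sketch of what that means (using iris as a stand-in):
library(randomForest)
set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 51)
pr <- predict(rf, iris[1, ], predict.all = TRUE)
table(pr$individual)  # each tree's vote for observation 1
pr$aggregate          # the forest's prediction is the majority of those votes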
Andy
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Miklos Kiss
> Sent: Saturday, July 19, 2008 10:47 PM
> To: r-help@r-
Not a direct answer to your questions, but for error-in-variables
problems, there are newer technologies than what is in NR. For example:
install.packages("simex")
library(simex)
example(simex)
Andy
From: Marc Fischer
>
> Dear Folks,
>
> We need to fit the model y~x assuming there are rando
I couldn't resist, either...
> From: Henrik Bengtsson
>
> Hmm,
>
> couldn't resist:
>
> > X <- NA
> > is.logical(X)
> [1] TRUE
> > (X == TRUE)
> [1] NA
>
> > "==.MaybeNA" <- function(e1, e2) { !is.na(e1) && (e1 == e2) }
> > X <- structure(NA, class="MaybeNA")
> > is.logical(X)
> [1] TRUE
> >
You don't need to test whether the _sample_ variances are different: they
already are. Statistical tests of hypotheses are not about sample
statistics, but about distributional characteristics.
It seems to me that reinforcement of some basic stat concepts may do you
quite a bit of good. If you don't have
From: Rolf Turner
>
> On 8/09/2009, at 9:07 PM, FMH wrote:
>
> > Dear All,
> >
> > I'm looking for a way of computing the first- and
> > second-order derivatives of a smoothing curve produced by a nonparametric
> > regression. For instance, if we run the R script below, a smooth
> > nonpara
From: Dror
>
> Hi,
> i'm using randomForest package and i have 2 questions:
> 1. Can i drop one tree from an RF object?
Yes.
> 2. i have a 300 trees forest, but when i use the predict
> function on new
> data (with predict.all=TRUE) i get only 270 votes. did i do
> something wrong?
Try to fol
You can try the locfit package, which I believe can handle up to 5
variables. E.g.,
R> library(locfit)
Loading required package: akima
Loading required package: lattice
locfit 1.5-6 2010-01-20
R> x <- matrix(runif(1000 * 3), 1000, 3)
R> y <- rnorm(1000)
R> mydata <- data.frame(x, y)
R> str(m
From: Dror
>
> Hi,
> I need help with the randomForest prediction. I run the following code:
>
> > iris.rf <- randomForest(Species ~ ., data=iris,
> > importance=TRUE,keep.forest=TRUE, proximity=TRUE)
> > pr<-predict(iris.rf,iris,predict.all=T)
> > iris.rf$votes[53,]
> setosa versicolor virg
From: Dror
>
> Hi,
> I'm working with the randomForest package and I have 2 questions:
> 1. how can i drop a specific tree from the forest?
Answered in another post.
> 2. i'm trying to get the voting of each tree in a prediction
> datum using the
> following code
> > pr<-predict(RF,NewData,type="
From: David Winsemius
>
> On Mar 1, 2010, at 12:07 PM, Nicholas Lewin-Koh wrote:
>
> > Hi,
> > consider the following
> >> a<-gl(3,3,9)
> >> a
> > [1] 1 1 1 2 2 2 3 3 3
> > Levels: 1 2 3
> >> levels(a)<-3:1
>
That may look like the same re-ordered factor, but you have instead merely
> re-labeled e
If memory serves, Bill Venables said in the paper cited several times
here, that there's only one type of sums of squares. So there's only
one type of "ANOVA" (if I understand what you mean by ANOVA).
Just forget about the different types of tests, and simply ask yourself
this (hopefully simple a
In most implementations of boosting (and, for that matter, single
trees), the first variable wins when there are ties. In randomForest the
variables are sampled, and thus not tested in the same order from one
node to the next, so the variables are more likely to "share the
glory".
Best,
Andy
Fro
That sounds like a particular form of permutation test. If the
"scrambling" is replaced by sampling with replacement (i.e., some data
points can be sampled more than once while others can be left out),
that's the simple (or nonparametric) bootstrap. The goal is to generate
the distribution of the
If your ultimate interest is in real scientific progress, I'd suggest that you
ignore that sentence (and any conclusion drawn subsequent to it).
Cheers,
Andy
From: bbslover
>
> This topic refers to independent variable reduction; as we
> know, a lot of
> methods can deal with it. However, for pre
One way to do it (no p-values) is explained in the original CART book.
You basically add up all the "improvement" (in fit$splits[, "improve"])
due to each splitting variable.
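A rough sketch of that (using the kyphosis data that ships with rpart;
note that $splits also records surrogate splits, for which I believe the
"improve" column holds the agreement rather than the improvement, so
treat this as approximate):
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
imp <- tapply(fit$splits[, "improve"], rownames(fit$splits), sum)
sort(imp, decreasing = TRUE)
(Newer versions of rpart also provide a ready-made variable.importance
component in the fitted object.)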
Andy
From: Tal Galili
>
> Simple example:
>
> # Classification Tree with rpart
>
> library(rpart)
>
> # grow tree
>
>
Thanks for providing the code that allows me to reproduce the problem.
It looks like the prediction routine for some reason returns "0" as
prediction for some trees, thus causing the problem observed. I'll look
into it.
Andy
From: Dror
>
> Hi,
> Thank you for your replies
> as for the predic
I believe Pinheiro et al. published a paper in JCGS a few years back on
the subject, modeling the random effects with t distributions. No
software was publicly available, as far as I know.
Andy
From: S Ellison
> Sent: Thursday, March 11, 2010 9:56 AM
> To: r-help@r-project.org
> Subject: [R] Ro
Seems like you're new to R as well? The first argument should contain
only the predictor variables, but you used the entire data frame that
contains the response.
Andy
> -Original Message-
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Corinne
Something like this?
R> mean(rnorm(100))
[1] -0.0095774
R> .Last.value
[1] -0.0095774
Andy
> -Original Message-
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Stephane
> Sent: Tuesday, June 30, 2009 2:07 PM
> To: r-help@r-project.org
> Subject
For the coefficient to be equal to the correlation, you need to scale y as well.
You can get the correlations by something like the following and then
back-calculate the coefficients from there.
R> x = matrix(rnorm(100*4e4), 100, 4e4)
R> y = rnorm(100)
R> rxy = cor(x, cbind(y))
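The back-calculation itself would be something like this (a sketch: for
column j, the slope of y ~ x[, j] is rxy[j] * sd(y) / sd(x[, j])):
R> b = rxy * sd(y) / apply(x, 2, sd)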
Andy
> -Or
Generally speaking, a pseudo R^2 of 70% indicates a rather good model
(though that obviously depends on the kind of data you have at hand).
Because it's a "pseudo", not a "real", R^2, its range is not limited to
[0, 100%], but it's hard for me to imagine anyone getting >100%.
You may want to check the distribution
Just to follow up on Bert's and Frank's excellent comments. I
continue to be amazed by people trying to interpret a single tree.
Besides the variability in the tree structure (try bootstrapping and see
how the trees change), it is difficult to make sense of splits more than
a few levels down (h
From: Larry D'Agostino
>
> So I've been working with Random Forests ( R library is
> randomForest) and I'm
> curious if Random Forests could be applied to classifying on
> a real time
> basis. For instance lets say I've scored fraud from a group of
> transactions. If I want to score any new inco
Store the returned value of partialPlot() in an object and do your own
barplot. Read the "Value" section in the help page for partialPlot.
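Something along these lines (a sketch; my.rf, my.data and "x1" are
placeholders for your objects):
pp <- partialPlot(my.rf, my.data, x.var = "x1", plot = FALSE)
barplot(pp$y, names.arg = round(pp$x, 2), ylab = "partial dependence")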
Andy
From: Daudi Jjingo
>
> Hello,
>
> I need to draw a partial dependence bar graph.
> My predictor vectors are continuous and so is the
> respons
I would have thought that the help page for importance() is an (the?) obvious
place to look...
If that description is not clear, please let me know which part isn't clear to
you.
Andy
From: Mareike Lies
>
> I am trying to use the package RandomForest performing regression.
> The variable impo
You may want to run
RSiteSearch("monotone splines")
at the R prompt. The 3rd hit looks quite promising. However, if I
understand your data, you have multiple y values for the same x
values. If so, can you justify inverting the regression function?
The traffic on this mailing list is very hi
No. The forest object is too large as is. I didn't think it's worth
the extra memory to store them. They were never kept even in the
Fortran/C code.
Andy
From: Chrysanthi A.
> Sent: Monday, September 28, 2009 5:20 PM
> To: r-help@r-project.org
> Subject: [R] how to visualize gini coefficient
From: Chrysanthi A. [mailto:chrys...@gmail.com]
Sent: Tuesday, September 29, 2009 4:55 PM
To: Liaw, Andy
Cc: r-help@r-project.org
Subject: Re: [R] how to visualize gini coefficient in each node
in RF?
Thanks for the reply! However, what is the code
Are you talking about the y-axis or the x-axis? If you're talking about
the y-axis, that range isn't really very meaningful. The partial
dependence function basically gives you the "average" trend of that
variable (integrating out all others in the model). It's the shape of
that trend that is "i
That has not yet been implemented in the R version of the package.
Best,
Andy
> -Original Message-
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Chrysanthi A.
> Sent: Thursday, October 22, 2009 6:40 AM
> To: r-help@r-project.org
> Subject: [R]
Sounds like bivariate density contours may be what you're looking for.
Andy
From: Eric Nord
>
> I'm attempting to produce something like a violin plot to
> display how y
> changes with x for members of different groups (My specific
> case is how
> floral area changes over time for several spe
Those are the same warnings I get when I test the package (before submitting to
CRAN), and they have been that way for a long time. They stem from conditional
allocation of arrays in C; gcc -Wall seems to always pick on that. As far as
I know, they are harmless.
Andy
> -Original Message--
Is that the entire tree? If so, there's a problem. The node status is
defined as follows in rf.h of the source code:
#define NODE_TERMINAL -1
#define NODE_TOSPLIT -2
#define NODE_INTERIOR -3
i.e., "-3" means "non-terminal" node.
Andy
> -Original Message-
> From: r-help-boun...@r-pro
I believe the prediction is done on some sort of grid, then
interpolated to fill in the rest. This is, however, purely for
computational reasons, not for any theoretical ones. The formal
definition of local polynomials is a weighted polynomial fit at
each point.
Andy
> -O
You need to be _extremely_ careful when assigning levels of factors. Look at
this example:
R> x1 = factor(c("a", "b", "c"))
R> x2 = factor(c("a", "c", "c"))
R> x3 = x2
R> levels(x3) <- levels(x1)
R> x3
[1] a b b
Levels: a b c
I'll try to add more proofing in the code...
Andy
> -Origi
Try re-starting R, load the randomForest package, and then run
example(rfImpute) and see if that works. Can you post your
sessionInfo() output?
Andy
From: cosmos science
>
> Dear Administrator,
>
> I am using linux (suse 10.2). While attempting rfImpute, I am
> getting the
> following error
If the matching need not be one-to-one, then you can just compute the
Euclidean distances between the two vectors, then in each row (or
column, whichever corresponds to the shorter vector) find the
smallest. This should be fairly easy to do.
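For example (a sketch; v1 is the shorter vector, v2 the longer one;
for scalars the Euclidean distance is just the absolute difference):
d <- abs(outer(v1, v2, "-"))       # |v1[i] - v2[j]| for all pairs
nearest <- apply(d, 1, which.min)  # index in v2 of the closest match to each v1[i]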
Andy
From: Thomas S. Dye
>
> Given two numeric
From: Uwe Ligges
>
> Yuanyuan wrote:
> > Greetings,
> >
> > I am using rpart for classification with "class" method.
> The test data is
> > the Indian diabetes data from package mlbench.
> >
> > I fitted a classification tree firstly using the original
> data, and then
> > exchanged the order
A few points to consider:
- If all the data are numeric, then use matrices instead of data frames.
- With either data frames or matrices, there is no way (that I'm aware
of anyway) in R to stack them without making at least one copy in
memory.
- Since none of the files has a header row, I would
Here's one possibility:
idx <- sample(nrow(acc))
training <- acc[idx[1:400], ]
testset <- acc[-idx[1:400], ]
Andy
From: Chris Arthur
>
> Forgive the newbie question, I want to select random rows from my
> data.frame to create a test set (which I can do) but then I want to
> create a training
Check out the help page for replicate().
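E.g., a minimal sketch (the zero mean vector and identity covariance
matrix are placeholders for your own):
library(MASS)
arr <- replicate(500, mvrnorm(120, mu = rep(0, 8), Sigma = diag(8)))
dim(arr)  # 120 8 500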
Andy
From: barbara.r...@uniroma1.it
>
> I must to create an array with dimensions 120x8x500. Better I
> have to make 500 simulations of 8 series of return from a multivariate
> normal distribution. there's the command "mvrnorm" but how I
> can do this
There's also the "nnls" (non-negative least squares) package on CRAN
that might be useful, although I'm puzzled by the negative sign in front
of c in Alex's post...
Cheers,
Andy
From: Berwin A Turlach
>
> G'day Alex,
>
> On Wed, 27 May 2009 11:51:39 +0200
> Alex van der Spek wrote:
>
> > I won
Couldn't you get that just by giving heatmap() the transpose of your
data?
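E.g. (a sketch, assuming your data are in a numeric matrix mydata):
heatmap(t(mydata))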
> -Original Message-
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Alex Roy
> Sent: Monday, June 08, 2009 9:32 AM
> To: r-help@r-project.org
> Subject: [R] Heatmap
>
>
It actually means that the MSE (0.04605) is 130.42% of var(y); thus the
model has not provided any better explanatory power than predicting by
mean(y). The pseudo R^2 is just 100% - 130.42% = -30.42%. Remember
that this is not the resubstitution estimate because it is computed
from the OOB estim
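In code, that is something like the following sketch (x and y are
placeholders; note var() is taken with divisor n here, which is what
randomForest uses):
rf <- randomForest(x, y)
n <- length(y)
1 - rf$mse[rf$ntree] / (var(y) * (n - 1) / n)  # the pseudo R^2, a.k.a. rf$rsq[rf$ntree]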
Could it be that the "problematic" data came from csv files with quotes?
What does str() on those data say? Recall that apply() will coerce the
object to a matrix (if it's not), which means everything needs to be the
same type, so if even just one column is read into R as non-numeric, the
entire r
A colleague and I were trying to understand all the possible things one
can do with for loops in R, and found some surprises. I think we've
done sufficient detective work to have a good guess as to what's going
on "underneath", but it would be nice to get some confirmation, and
better yet, perhaps
From: Duncan Murdoch
>
> Liaw, Andy wrote:
> > A colleague and I were trying to understand all the
> possible things one
> > can do with for loops in R, and found some surprises. I think we've
> > done sufficient detective work to have a good guess as to
> wh
Didn't realize the message was cc'ed to R-help. Here's my reply...
____
From: Liaw, Andy
Sent: Thursday, June 18, 2009 11:35 AM
To: 'Li GUO'
Subject: RE: Can I estimate strength and correlation of Random Forest in
R package " ran
The short answer is "no" (meaning to leave the blocks in the model). As
Frank Harrell said, you've spent your degrees of freedom. Go home and
be happy.
Best,
Andy
From: J S
> Sent: Tuesday, March 24, 2009 9:49 AM
> To: r-help@r-project.org
> Subject: [R] Do we have to control for block in bloc
Read ?importance, especially the "scale" argument.
Andy
> -Original Message-
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Li GUO
> Sent: Friday, March 27, 2009 1:24 PM
> To: r-help@r-project.org
> Subject: [R] Random Forest Variable Importance
It's not nodesize in the formula, but var(y) (with divisor n, not n-1).
It's sort of like the adjusted R-squared (because it uses mean squares
instead of sum of squares), but uses the OOB estimate of MSE. If
there's very little or no explanatory power in the predictor variables,
this statistic wou
The source code of the whole package is available on CRAN. All packages
are submitted to CRAN in source form.
There's no "rule" per se that gives the final prediction, as the final
prediction is the result of a plurality vote by all trees in the forest.
You may want to look at the varUsed() and getTr
, just the underlying representation of the tree).
Andy
From: Chrysanthi A. [mailto:chrys...@gmail.com]
Sent: Wednesday, April 08, 2009 2:56 PM
To: Liaw, Andy
Cc: r-help@r-project.org
Subject: Re: [R] help with random fore
I'll take a shot.
Let me try to explain the 3rd measure first. An RF model tries to predict an
outcome variable (the classes) from a group of potential predictor variables
(the "x"). If a predictor variable is "important" in making the prediction
accurate, then by messing with it (e.g., giving
The R News article we put out after the first version of the package was
released has examples of doing CV. You can also use the facilities in the
caret package (on CRAN) or the MLInterfaces package (part of Bioconductor, not
on CRAN).
randomForest() itself does not do CV per se, but the OOB es
Sent: Friday, April 10, 2009 10:44 AM
To: Liaw, Andy
Cc: r-help@r-project.org
Subject: Re: [R] help with random forest package
Hi,
To be honest, I cannot really understand what is the meaning of
the votes.. For example having five sa
ypo).
Cheers,
Andy
From: Chrysanthi A. [mailto:chrys...@gmail.com]
Sent: Monday, April 13, 2009 9:44 AM
To: Liaw, Andy
Cc: r-help@r-project.org
Subject: Re: [R] help with random forest package
But how does it estimat
MSE is the mean squared residuals. For the training data, the OOB
estimate is used (i.e., residual = data - OOB prediction, MSE =
sum(residuals) / n, OOB prediction is the mean of predictions from all
trees for which the case is OOB). It is _not_ the average OOB MSE of
trees in the forest.
I hop
Apologies: that should have been sum(residual^2)!
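In code, a minimal sketch (x and y are placeholders; for regression,
rf$predicted holds the OOB predictions):
rf <- randomForest(x, y)
res <- y - rf$predicted
sum(res^2) / length(y)  # matches rf$mse[rf$ntree]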
> -Original Message-
> From: Dimitri Liakhovitski [mailto:ld7...@gmail.com]
> Sent: Monday, April 13, 2009 4:35 PM
> To: Liaw, Andy
> Cc: R-Help List
> Subject: Re: [R] Random Forests: Question about R^2
>
> And
> due to error, then R^2 = 1 - MSE/var(y).
>
> If it's correct, my last question would be:
> I am getting as many R^2 as the number of trees because each time the
> residuals are recalculated using all trees built so far, correct?
>
> Thank you very much!
> D
Yes, you've got it!
Cheers,
Andy
From: Dimitri
>
> Hello!
>
> I think I am relatively clear on how predictor importance (the first
> one) is calculated by Random Forests for a Classification tree:
>
> Importance of predictor P1 when the response variable is categorical:
>
> 1. For
look at the output of randomForest(..., keep.inbag=TRUE) to see
which data point is OOB for which tree.
I hope that's clear now.
Cheers,
Andy
________
From: Chrysanthi A. [mailto:chrys...@gmail.com]
Sent: Tuesday, April 28, 2009 8:52 AM
To: Liaw, Andy
Cc: r-help
This message landed in the "Junk e-mails" folder (of which I have no
control), and it just so happens that I glanced in the folder today,
instead of just emptying it without checking, trusting the filter
to do the Right Thing...
Since you seem to run into problems with predict.randomForest, o
In short, yes.
Andy
From: Haring, Tim (LWF)
>
> Hi at all,
>
> maybe this question is quite simple for a statistician, but
> for me it is not. After reading a lot of mail in the R-help
> archive I'm still not quite sure I get it.
> When applying a randomForest to a new dataset with
> predi
svm() in the e1071 package is an interface to the libsvm code. Look at the
link provided in the help page for that function. You will have to read up on how
density estimation is achieved via one-class SVM.
Andy
From: excalibur
>
> In the R-help of the svm function of the package e1071 it's
>
I believe something like:
scatter.smooth(est.prob, as.numeric(y == "level of interest"))
would be close. You may want to use a larger span than default.
Andy
From: abbas tavassoli
>
> Hi,
> I have a binary variable and corresponding predicted
> probability (using
> logistic regression on
From: ivo welch
>
> dear R experts---I am using the coef() function to pick off
> the coefficients
> from an lm() object. alas, I also need the standard errors
> and I need them
> fast. I know I can do a "summary()" on the object and pick
> them off this
> way, but this computes other stuff I
From: bbslover
>
> now I am learning random forest and using the random forest
> package, I can get
> the OOB error rates, and test set rate, now I want to get the
> training set
> error rate, how can I do?
>
> pgp.rf<-randomForest(x.tr,y.tr,x.ts,y.ts,ntree=1e3,keep.forest
> =FALSE,do.trace=1e2)
Please try to follow the posting guide and give a reproducible example,
as below:
R> library(randomForest)
randomForest 4.5-34
Type rfNews() to see new features/changes/bug fixes.
R> iris2 = iris[-5]
R> iris.rf = randomForest(Petal.Width~., iris2, maxnodes=4, ntree=50)
R> nodesize(iris.rf)
Error:
Just replacing preplot() with predict() should be fine.
BTW, it's always a good idea to specify the version of the package
you're using as well.
Best,
Andy
From: mh...@berkeley.edu
>
> Hi,
>
> I'm trying to work through the examples and code in Loader's
> LOCAL REGRESSION AND LIKELIHOOD, and
From: Peng Yu
>
> On Wed, Feb 3, 2010 at 2:12 AM, Emmanuel Charpentier
> wrote:
> > Le mercredi 03 février 2010 à 00:01 -0500, David Winsemius a écrit :
> >> On Feb 2, 2010, at 11:38 PM, Peng Yu wrote:
> >>
> >> > ?contrast in the contrast package gives me the following
> description.
> >> > How
From: Peng Yu
>
> On Wed, Feb 3, 2010 at 10:01 AM, Peng Yu wrote:
> > Some examples in the help page are too long to be copied
> from screen.
> > Could somebody let me know some easy way on how to extract
> the example
> > to a file so that I can play with them?
>
> I forget to mention. I use
It needs to be done "by hand", in that partialPlot() does not handle more than
one variable at a time. You need to modify its code to do that (and be ready
to wait even longer, as it can be slow).
Andy
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-proj
1. Not sure what you want. What "details" are you looking for exactly? If
you call predict(trainset) without the newdata argument, you will get the
(out-of-bag) prediction of the training set, which is exactly the "predicted"
component of the RF object.
2. If you set type="votes" and norm.v
How about taking the combination of the two? E.g., gamma = factor(paste(alpha,
beta1, sep=":")) and use gamma as the response.
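Base R's interaction() is a ready-made way to do the same thing:
gamma <- interaction(alpha, beta1, sep = ":", drop = TRUE)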
Best,
Andy
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Gyanendra Pokharel
Sent: Tuesday, October
Hi Sonja,
How did you build the rpart tree (i.e., what settings did you use in
rpart.control)? Rpart by default will use cross validation to prune back the
tree, whereas RF doesn't need that. There are other more subtle differences as
well. If you want to compare single tree results, you rea
Not unless we have more information. Please read the Posting Guide to see how
to make it easier for people to answer your question.
Best,
Andy
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Oritteropus
Sent: Thursday, November 2
Try the following:
set.seed(100)
rf1 <- randomForest(Species ~ ., data=iris)
set.seed(100)
rf2 <- randomForest(iris[1:4], iris$Species)
object.size(rf1)
object.size(rf2)
str(rf1)
str(rf2)
You can try it on your own data. That should give you some hints about why the
formula interface should be
Without data to reproduce what you saw, we can only guess.
One possibility is due to tie-breaking. There are several places where ties
can occur and are broken at random, including at the prediction step. One
difference between the two ways of doing prediction is that when it's all done
withi
If you are using the code, that's not really using randomForest directly. I
don't understand the data structure you have (since you did not show anything)
so can't really tell you much. In any case, that warning came from
randomForest() when it is run in regression mode but the response has fe
Use KernSmooth (one of the recommended packages that are included in R
distribution). E.g.,
> library(KernSmooth)
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009
> x <- seq(0, 1, length=201)
> y <- 4 * cos(2*pi*x) + rnorm(x)
> f <- locpoly(x, y, degree=0, kernel="epan", bandwidth=.1)
> plo
The difference is importance(..., scale=TRUE). See the help page for detail.
If you extract the $importance component from a randomForest object, you do not
get the scaling.
Best,
Andy
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Beha
Classification trees use the Gini index, whereas the regression trees use sum
of squared errors. They are "hard-wired" into the C/Fortran code, so not
easily changeable.
Best,
Andy
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of
#2 can be done simply with predict(fmi, type="prob"). See the help page for
predict.randomForest().
Best,
Andy
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of arun
Sent: Tuesday, November 26, 2013 6:57 PM
To: R help
Subject: Re:
Yes, that's part of the intention anyway. One can also use them to do
clustering.
Best,
Andy
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Massimo Bressan
Sent: Monday, December 02, 2013 6:34 AM
To: r-help@r-project.org
Subject
You can try something like this:
http://pubs.acs.org/doi/abs/10.1021/ci050022a
Basically similar idea to what is done in random forests: permute predictor
variable one at a time and see how much that degrades prediction performance.
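A bare-bones sketch of that idea for a regression model (fit, X, and y
are placeholders; fit can be anything with a predict() method):
perm.imp <- function(fit, X, y, nrep = 10) {
  base.err <- mean((y - predict(fit, X))^2)
  sapply(names(X), function(v) {
    err <- replicate(nrep, {
      Xp <- X
      Xp[[v]] <- sample(Xp[[v]])  # permute one predictor at a time
      mean((y - predict(fit, Xp))^2)
    })
    mean(err) - base.err  # increase in error ~ importance
  })
}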
Cheers,
Andy
-Original Message-
From: r-help-boun...@r
You have not shown any code on exactly how you use na.roughfix(), so I
can only guess.
If you are doing something like:
randomForest(y ~ ., mybigdata, na.action=na.roughfix, ...)
I would not be surprised that it's taking very long on large datasets.
Most likely it's caused by the formula inter
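If so, a likely faster route (a sketch; mybigdata and the position of
the response are placeholders) is to impute once, up front, and use the
non-formula interface:
mybigdata.fixed <- na.roughfix(mybigdata)
rf <- randomForest(mybigdata.fixed[, -1], mybigdata.fixed[, 1])  # y in column 1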
system.time(na.roughfix(x))
   user  system elapsed
   8.44    0.39    8.85
R 2.11.1, randomForest 4.5-35, Windows XP (32-bit), Thinkpad T61 with
2GB RAM.
Andy
From: Mike Williamson [mailto:this.is@gmail.com]
Sent: Thursday, July 01, 2010 12:48 PM
To: Liaw, Andy
Cc: r-h
I'll incorporate some of these ideas into the next release. Thanks!
Best,
Andy
-Original Message-
From: h.wick...@gmail.com [mailto:h.wick...@gmail.com] On Behalf Of Hadley
Wickham
Sent: Thursday, July 01, 2010 8:08 PM
To: Mike Williamson
Cc: Liaw, Andy; r-help
Subject: Re: [R] a
There's a bug in the code. If you add row names to the X matrix before
you call randomForest(), you'd get:
R> summary(outlier(mdl.rf))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-1.0580 -0.5957  0.0000  0.6406  1.2650  9.5200
I'll fix this in the next release. Thanks for reporting.
Bes
> From: jlu...@ria.buffalo.edu
>
> Clearly inferior treatments are unethical.
The Big Question is: What constitutes "clearly"? Who decides, and how,
what is "clearly"? I'm sure there are plenty of people who don't
understand much Statistics and are perfectly willing to say the results
on the tw
> From: Vijayan Padmanabhan
>
> Dear R Group
> I had an observation that in some cases, when I use the
> randomForest model
> to create partialPlot in R using the package "randomForest"
> the y-axis displays values that are more than -1!
> It is a classification problem that i was trying to addr
One possibility:
R> f = function(x, f) eval(as.call(list(as.name(f), x)))
R> f(1:10, "mean")
[1] 5.5
R> f(1:10, "max")
[1] 10
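do.call() gets you there a bit more directly:
R> g = function(x, f) do.call(f, list(x))
R> g(1:10, "mean")
[1] 5.5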
Andy
From: Jonathan Greenberg
> R-helpers:
>
> If I want to pass a character name of a function TO a
> function, and then
> have that function executed, how would I do
In a partial dependence plot, only the relative scale, not absolute
scale, of the y-axis is meaningful. I.e., you can compare the range of
the curves between partial dependence plots of two different variables,
but not the actual numbers on the axis. The range is compressed
compared to the origin
The plot titles aren't pretty, but the following works for me:
R> library(randomForest)
randomForest 4.5-37
Type rfNews() to see new features/changes/bug fixes.
R> set.seed(1004)
R> iris.rf <- randomForest(iris[-5], iris[[5]], ntree=1001)
R> par(mfrow=c(2,2))
R> for (i in 1:4) partialPlot(iris.rf,
From: Michael Lindgren
>
> Greetings R Users!
>
> I am posting to inquire about the proximity matrix in the randomForest
> R-package. I am having difficulty pushing very large data through the
> algorithm and it appears to hang on the building of the prox
> matrix. I have
> read on Dr. Breiman
Let me expand on what Max showed.
For the most part, performance on the training set is meaningless. (That's
the case for most algorithms, but especially so for RF.) In the default
(and recommended) setting, the trees are grown to the maximum size,
which means that quite likely there's only one data
What Breiman meant is that as the model gets more complex (i.e., as the
number of trees tends to infinity) the generalization error (test set
error) does not increase. This does not hold for boosting, for example;
i.e., you can't "boost forever", which necessitates finding the
optimal numb