Re: [R] Random Forest classification

2016-04-18 Thread Liaw, Andy
This is explained in the "Details" section of the help page for partialPlot. Best Andy > -Original Message- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jesús Para > Fernández > Sent: Tuesday, April 12, 2016 1:17 AM > To: r-help@r-project.org > Subject: [R] Random For

Re: [R] randomForest outlier

2008-07-16 Thread Liaw, Andy
Perhaps if you follow the posting guide more closely, you might get more (useful) replies, but without looking at your data, I doubt there's much anyone can do for you. The fact that the range of the outlying measures is -1 to 2 would tell me there are no potential outliers by this measure. Pleas

Re: [R] confusion matrix in randomForest

2008-07-21 Thread Liaw, Andy
randomForest predictions are based on votes of individual trees, thus have little to do with error rates of individual trees. Andy > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Miklos Kiss > Sent: Saturday, July 19, 2008 10:47 PM > To: r-help@r-

Re: [R] equivalent R functions for Numerical Recipes fitxy and fitexy ?

2008-07-31 Thread Liaw, Andy
Not a direct answer to your questions, but for error-in-variables problems, there are newer technologies than what is in NR. For example: install.packages("simex") library(simex) example(simex) Andy From: Marc Fischer > > Dear Folks, > > We need to fit the model y~x assuming there are rando

Re: [R] Department of Redundancy Department.

2008-08-15 Thread Liaw, Andy
I couldn't resist, either... > From: Henrik Bengtsson > > Hmm, > > couldn't resists: > > > X <- NA > > is.logical(X) > [1] TRUE > > (X == TRUE) > [1] NA > > > "==.MaybeNA" <- function(e1, e2) { !is.na(e1) && (e1 == e2) } > > X <- structure(NA, class="MaybeNA") > > is.logical(X) > [1] TRUE > >

Re: [R] Test of Homogeneity of Variances

2008-08-22 Thread Liaw, Andy
You don't need to test that the _sample_ variances are different: they already are. Statistical tests of hypotheses are not about sample statistics, but about distributional characteristics. It seems to me that reinforcement of some basic stat concepts may do you quite a bit of good. If you don't have

Re: [R] Derivative of nonparametric curve

2009-09-09 Thread Liaw, Andy
From: Rolf Turner > > On 8/09/2009, at 9:07 PM, FMH wrote: > > > Dear All, > > > > I'm looking for a way of computing the derivative of first and > > second order of a smoothing curve produced by a nonparametric > > regression. For instance, if we run the R script below, a smooth > > nonpara

Re: [R] Random Forest

2010-02-16 Thread Liaw, Andy
From: Dror > > Hi, > i'm using randomForest package and i have 2 questions: > 1. Can i drop one tree from an RF object? Yes. > 2. i have a 300 trees forest, but when i use the predict > function on new > data (with predict.all=TRUE) i get only 270 votes. did i do > something wrong? Try to fol

Re: [R] Alternatives to linear regression with multiple variables

2010-02-22 Thread Liaw, Andy
You can try the locfit package, which I believe can handle up to 5 variables. E.g., R> library(locfit) Loading required package: akima Loading required package: lattice locfit 1.5-6 2010-01-20 R> x <- matrix(runif(1000 * 3), 1000, 3) R> y <- rnorm(1000) R> mydata <- data.frame(x, y) R> str(m

Re: [R] Random Forest prediction questions

2010-03-01 Thread Liaw, Andy
From: Dror > > Hi, > I need help with the randomForest prediction. i run the folowing code: > > > iris.rf <- randomForest(Species ~ ., data=iris, > > importance=TRUE,keep.forest=TRUE, proximity=TRUE) > > pr<-predict(iris.rf,iris,predict.all=T) > > iris.rf$votes[53,] > setosa versicolor virg

Re: [R] Random Forest

2010-03-01 Thread Liaw, Andy
From: Dror > > Hi, > I'm working with randomForest package and i have 2 questions: > 1. how can i drop a specific tree from the forest? Answered in another post. > 2. i'm trying to get the voting of each tree in a prediction > datum using the > folowing code > > pr<-predict(RF,NewData,type="

Re: [R] Thougt I understood factors but??

2010-03-01 Thread Liaw, Andy
From: David Winsemius > > On Mar 1, 2010, at 12:07 PM, Nicholas Lewin-Koh wrote: > > > Hi, > > consider the following > >> a<-gl(3,3,9) > >> a > > [1] 1 1 1 2 2 2 3 3 3 > > Levels: 1 2 3 > >> levels(a)<-3:1 > > That may look like the same re-ordered factor but you instead merely > re-labeled e
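
The pitfall described above can be seen in a short sketch (toy data; `levels(b) <- 3:1` relabels the observations in place, while re-creating the factor with a new `levels` argument reorders the level order without touching the data):

```r
a <- gl(3, 3, 9)                      # 1 1 1 2 2 2 3 3 3, levels 1 2 3
b <- a
levels(b) <- 3:1                      # relabels: data become 3 3 3 2 2 2 1 1 1
reordered <- factor(a, levels = 3:1)  # same data, level order reversed
as.character(b)                       # "3" "3" "3" "2" "2" "2" "1" "1" "1"
as.character(reordered)               # "1" "1" "1" "2" "2" "2" "3" "3" "3"
```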

Re: [R] ANOVA "Types" and Regression models: the same?

2010-03-02 Thread Liaw, Andy
If memory serves, Bill Venables said, in the paper cited several times here, that there's only one type of sums of squares. So there's only one type of "ANOVA" (if I understand what you mean by ANOVA). Just forget about the different types of tests, and simply ask yourself this (hopefully simple a

Re: [R] Gradient Boosting Trees with correlated predictors in gbm

2010-03-02 Thread Liaw, Andy
In most implementations of boosting, and for that matter, single trees, the first variable wins when there are ties. In randomForest the variables are sampled, and thus not tested in the same order from one node to the next, so the variables are more likely to "share the glory". Best, Andy Fro

Re: [R] scientific (statistical) foundation for Y-RANDOMIZATION in regression analysis

2010-03-08 Thread Liaw, Andy
That sounds like a particular form of permutation test. If the "scrambling" is replaced by sampling with replacement (i.e., some data points can be sampled more than once while others can be left out), that's the simple (or nonparametric) bootstrap. The goal is to generate the distribution of the
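
The distinction described above can be sketched in a few lines (a hedged toy example, not from the original thread: permuting without replacement gives a permutation null distribution, resampling with replacement gives the nonparametric bootstrap):

```r
set.seed(1)
x <- rnorm(50); y <- rnorm(50)
stat <- function(xx, yy) cor(xx, yy)

# permutation test: scramble y (sampling WITHOUT replacement)
perm.null <- replicate(999, stat(x, sample(y)))
p.value <- mean(abs(perm.null) >= abs(stat(x, y)))

# nonparametric bootstrap: resample pairs WITH replacement
boot.dist <- replicate(999, {
  idx <- sample(50, replace = TRUE)
  stat(x[idx], y[idx])
})
```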

Re: [R] How can I understand this sentenc e,and express it by means of Mathema tical approach?

2010-03-08 Thread Liaw, Andy
If your ultimate interest is in real scientific progress, I'd suggest that you ignore that sentence (and any conclusion drawn subsequent to it). Cheers, Andy From: bbslover > > This topic refer to independent variables reduction, as we > know ,a lot of > method can do with it,however, for pre

Re: [R] Is there an equivalence of lm's "anova" for an rpart object ?

2010-03-08 Thread Liaw, Andy
One way to do it (no p-values) is explained in the original CART book. You basically add up all the "improvement" (in fit$split[, "improve"]) due to each splitting variable. Andy From: Tal Galili > > Simple example: > > # Classification Tree with rpart > > library(rpart) > > # grow tree > >

Re: [R] Random Forest

2010-03-10 Thread Liaw, Andy
Thanks for providing the code that allows me to reproduce the problem. It looks like the prediction routine for some reason returns "0" as prediction for some trees, thus causing the problem observed. I'll look into it. Andy From: Dror > > Hi, > Thank you for your replies > as for the predic

Re: [R] Robust estimation of variance components for a nested design

2010-03-11 Thread Liaw, Andy
I believe Pinheiro et al. published a paper in JCGS a few years back on the subject, modeling the random effects with t distributions. No software was publicly available, as far as I know. Andy From: S Ellison > Sent: Thursday, March 11, 2010 9:56 AM > To: r-help@r-project.org > Subject: [R] Ro

Re: [R] Regarding variable importance in the randomForest package

2010-03-16 Thread Liaw, Andy
Seems like you're new to R as well? The first argument should contain only the predictor variables, but you used the entire data frame that contains the response. Andy > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Corinne

Re: [R] Equivalent to Matlab's "Ans"

2009-06-30 Thread Liaw, Andy
Something like this? R> mean(rnorm(100)) [1] -0.0095774 R> .Last.value [1] -0.0095774 Andy > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Stephane > Sent: Tuesday, June 30, 2009 2:07 PM > To: r-help@r-project.org > Subject

Re: [R] Linear Regression Problem

2009-07-14 Thread Liaw, Andy
For the coefficient to be equal to the correlation, you need to scale y as well. You can get the correlations by something like the following and then back-calculate the coefficients from there. R> x = matrix(rnorm(100*4e4), 100, 4e4) R> y = rnorm(100) R> rxy = cor(x, cbind(y)) Andy > -Or
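
The back-calculation mentioned above follows from the simple-regression identity b = r * sd(y) / sd(x); a hedged sketch (smaller dimensions than the original post, for speed):

```r
set.seed(1)
x <- matrix(rnorm(100 * 10), 100, 10)
y <- rnorm(100)
rxy <- cor(x, cbind(y))            # all 10 correlations in one call

# back out the unstandardized slopes from the correlations
b <- rxy * sd(y) / apply(x, 2, sd)

# check against lm() for the first column
b1 <- coef(lm(y ~ x[, 1]))[2]
all.equal(as.numeric(b[1]), as.numeric(b1))  # TRUE
```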

Re: [R] randomForest - what is a 'good' pseudo r-squared?

2009-07-21 Thread Liaw, Andy
Generally speaking, a pseudo R^2 of 70% indicates a rather good model (though this obviously depends on the kind of data you have at hand). Because it's a "pseudo", not a "real", R^2, the range is not limited to [0, 100%], but it's hard for me to imagine anyone getting >100%. You may want to check the distribution

Re: [R] sample size > 20K? Was: fitness of regression tree: how tomeasure???

2010-04-05 Thread Liaw, Andy
Just to follow up on Bert's and Frank's excellent comments. I continue to be amazed by people trying to interpret a single tree. Besides the variability in the tree structure (try bootstrapping and see how the trees change), it is difficult to make sense of splits more than a few levels down (h

Re: [R] Question on implementing Random Forests scoring

2010-04-09 Thread Liaw, Andy
From: Larry D'Agostino > > So I've been working with Random Forests ( R library is > randomForest) and I > curious if Random Forests could be applied to classifying on > a real time > basis. For instance lets say I've scored fraud from a group of > transactions. If I want to score any new inco

Re: [R] Help with Partial dependence bar graph

2010-04-21 Thread Liaw, Andy
Store the returned value of partialPlot() in an object and do your own barplot. Read the "Value" section in the help page for partialPlot. Andy From: Daudi Jjingo > > Hello, > > I need to draw a partial dependence bar graph. > My the my predictor vectors are continous and so is the > respons

Re: [R] Question on: Random Forest Variable Importance for RegressionProblems

2010-04-28 Thread Liaw, Andy
I would have thought that the help page for importance() is an (the?) obvious place to look... If that description is not clear, please let me know which part isn't clear to you. Andy From: Mareike Lies > > I am trying to use the package RandomForest performing regression. > The variable impo

Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-30 Thread Liaw, Andy
You may want to run RSiteSearch("monotone splines") at the R prompt. The 3rd hit looks quite promising. However, if I understand your data, you have multiple y values for the same x values. If so, can you justify inverting the regression function? The traffic on this mailing list is very hi

Re: [R] how to visualize gini coefficient in each node in RF?

2009-09-29 Thread Liaw, Andy
No. The forest object is too large as is. I didn't think it's worth the extra memory to store them. They were never kept even in the Fortran/C code. Andy From: Chrysanthi A. > Sent: Monday, September 28, 2009 5:20 PM > To: r-help@r-project.org > Subject: [R] how to visualize gini coefficient

Re: [R] how to visualize gini coefficient in each node in RF?

2009-09-30 Thread Liaw, Andy
From: Chrysanthi A. [mailto:chrys...@gmail.com] Sent: Tuesday, September 29, 2009 4:55 PM To: Liaw, Andy Cc: r-help@r-project.org Subject: Re: [R] how to visualize gini coefficient in each node in RF? Thanks for the reply! However, what is the code

Re: [R] Random Forest - partial dependence plot

2009-10-20 Thread Liaw, Andy
Are you talking about the y-axis or the x-axis? If you're talking about the y-axis, that range isn't really very meaningful. The partial dependence function basically gives you the "average" trend of that variable (integrating out all others in the model). It's the shape of that trend that is "i

Re: [R] "interactions" feature in RF?

2009-10-22 Thread Liaw, Andy
That has not yet been implemented in the R version of the package. Best, Andy > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Chrysanthi A. > Sent: Thursday, October 22, 2009 6:40 AM > To: r-help@r-project.org > Subject: [R]

Re: [R] violin - like plots for bivariate data

2009-11-16 Thread Liaw, Andy
Sounds like bivariate density contours may be what you're looking for. Andy From: Eric Nord > > I'm attempting to produce something like a violin plot to > display how y > changes with x for members of different groups (My specific > case is how > floral area changes over time for several spe

Re: [R] Installing RandomForest on SuSe Linux - warnings

2009-12-07 Thread Liaw, Andy
Those are the same warnings I get when I test the package (before submitting to CRAN), and they have been that way for a long time. They stem from conditional allocation of arrays in C; gcc -Wall seems to always pick on that. As far as I know, they are harmless. Andy > -Original Message--

Re: [R] RandomForest - getTree status code

2009-12-07 Thread Liaw, Andy
Is that the entire tree? If so there's a problem. The node status is defined as follows in rf.h of the source code: #define NODE_TERMINAL -1 #define NODE_TOSPLIT -2 #define NODE_INTERIOR -3 i.e., "-3" means "non-terminal" node. Andy > -Original Message- > From: r-help-boun...@r-pro

Re: [R] coefficients of each local polynomial from locfit

2009-12-08 Thread Liaw, Andy
I believe the prediction is done on some sort of grid, then interpolated to fill in the rest. This is, however, purely for computational reasons, not for any theoretical ones. The formal definition of local polynomials is to do a weighted fit of a polynomial at each point. Andy > -O

Re: [R] different randomForest performance for same data

2009-12-15 Thread Liaw, Andy
You need to be _extremely_ careful when assigning levels of factors. Look at this example: R> x1 = factor(c("a", "b", "c")) R> x2 = factor(c("a", "c", "c")) R> x3 = x2 R> levels(x3) <- levels(x1) R> x3 [1] a b b Levels: a b c I'll try to add more proofing in the code... Andy > -Origi

Re: [R] Error while using rfImpute

2009-05-08 Thread Liaw, Andy
Try re-starting R, load the randomForest package, and then run example(rfImpute) and see if that works. Can you post your sessionInfo() output? Andy From: cosmos science > > Dear Administrator, > > I am using linux (suse 10.2). While attempting rfImpute, I am > getting the > following error

Re: [R] pair matching

2009-05-12 Thread Liaw, Andy
If the matching need not be one-to-one, then you can just compute the Euclidean distances between the two vectors, then, in each row (or column, whichever corresponds to the shorter vector), find the smallest. This should be fairly easy to do. Andy From: Thomas S. Dye > > Given two numeric
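
A minimal sketch of the approach described above (toy vectors; for 1-D data the Euclidean distance is just the absolute difference, and `outer()` gives the full distance matrix in one call):

```r
short <- c(1.2, 5.0, 9.7)
long  <- c(0.9, 2.1, 4.8, 7.5, 10.0)

d <- abs(outer(short, long, "-"))   # 3 x 5 matrix of pairwise distances
nearest <- apply(d, 1, which.min)   # for each `short`, index of nearest `long`
nearest                             # 1 3 5
```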

Re: [R] questions on rpart (tree changes when rearrange the order of covariates?!)

2009-05-13 Thread Liaw, Andy
From: Uwe Ligges > > Yuanyuan wrote: > > Greetings, > > > > I am using rpart for classification with "class" method. > The test data is > > the Indian diabetes data from package mlbench. > > > > I fitted a classification tree firstly using the original > data, and then > > exchanged the order

Re: [R] read multiple large files into one dataframe

2009-05-13 Thread Liaw, Andy
A few points to consider: - If all the data are numeric, then use matrices instead of data frames. - With either data frames or matrices, there is no way (that I'm aware of anyway) in R to stack them without making at least one copy in memory. - Since none of the files has a header row, I would

Re: [R] Using sample to create Training and Test sets

2009-05-15 Thread Liaw, Andy
Here's one possibility: idx <- sample(nrow(acc)) training <- acc[idx[1:400], ] testset <- acc[-idx[1:400], ] Andy From: Chris Arthur > > Forgive the newbie question, I want to select random rows from my > data.frame to create a test set (which I can do) but then I want to > create a training

Re: [R] Simulation from a multivariate normal distribution

2009-05-18 Thread Liaw, Andy
Check out the help page for replicate(). Andy From: barbara.r...@uniroma1.it > > I must to create an array with dimensions 120x8x500. Better I > have to make 500 simulations of 8 series of return from a multivariate > normal distribution. there's the command "mvrnorm" but how I > can do this
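
A hedged sketch of the suggestion (assumes the recommended MASS package for `mvrnorm`; identity covariance is a placeholder for the poster's actual covariance matrix). `replicate()` stacks the 500 simulated 120 x 8 matrices into a 120 x 8 x 500 array automatically:

```r
library(MASS)

mu    <- rep(0, 8)     # placeholder mean vector
Sigma <- diag(8)       # placeholder covariance matrix
sims  <- replicate(500, mvrnorm(120, mu, Sigma))
dim(sims)              # 120 8 500
```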

Re: [R] Constrained fits: y~a+b*x-c*x^2, with a,b,c >=0

2009-05-27 Thread Liaw, Andy
There's also the "nnls" (non-negative least squares) package on CRAN that might be useful, although I'm puzzled by the negative sign in front of c in Alex's post... Cheers, Andy From: Berwin A Turlach > > G'day Alex, > > On Wed, 27 May 2009 11:51:39 +0200 > Alex van der Spek wrote: > > > I won

Re: [R] Heatmap

2009-06-08 Thread Liaw, Andy
Couldn't you get that just by giving heatmap() the transpose of your data? > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Alex Roy > Sent: Monday, June 08, 2009 9:32 AM > To: r-help@r-project.org > Subject: [R] Heatmap > >

Re: [R] Random Forest % Variation vs Psuedo-R^2?

2009-06-08 Thread Liaw, Andy
It actually means that the MSE (0.04605) is 130.42% of var(y), thus the model had not provided any better explanatory power than predicting by mean(y). The pseudo R^2 is just 100% - 130.42% = -30.42%. Remember that this is not the resubstitution estimate because it is computed from the OOB estim
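
The arithmetic above can be spelled out directly (pseudo R^2 = 1 - MSE/var(y); it goes negative whenever the model does worse than predicting mean(y)):

```r
mse  <- 0.04605
vary <- mse / 1.3042   # implied var(y), since MSE is 130.42% of it
1 - mse / vary         # approximately -0.3042
```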

Re: [R] Problem in 'Apply' function: does anybody have othersolution

2009-06-17 Thread Liaw, Andy
Could it be that the "problematic" data came from csv files with quotes? What does str() on those data say? Recall that apply() will coerce the object to a matrix (if it's not), which means everything needs to be the same type, so if even just one column is read into R as non-numeric, the entire r

[R] where/what is i? for loop (black?) magic

2009-06-17 Thread Liaw, Andy
A colleague and I were trying to understand all the possible things one can do with for loops in R, and found some surprises. I think we've done sufficient detective work to have a good guess as to what's going on "underneath", but it would be nice to get some confirmation, and better yet, perhaps

Re: [R] where/what is i? for loop (black?) magic

2009-06-18 Thread Liaw, Andy
From: Duncan Murdoch > > Liaw, Andy wrote: > > A colleague and I were trying to understand all the > possible things one > > can do with for loops in R, and found some surprises. I think we've > > done sufficient detective work to have a good guess as to > wh

[R] FW: Can I estimate strength and correlation of Random Forest in R package " randomForest"?

2009-06-19 Thread Liaw, Andy
Didn't realize the message was cc'ed to R-help. Here's my reply... ____ From: Liaw, Andy Sent: Thursday, June 18, 2009 11:35 AM To: 'Li GUO' Subject: RE: Can I estimate strength and correlation of Random Forest in R package " ran

Re: [R] Do we have to control for block in block designs if it is insignificant?

2009-03-24 Thread Liaw, Andy
The short answer is "no" (meaning to leave the blocks in the model). As Frank Harrell said, you've spent your degrees of freedom. Go home and be happy. Best, Andy From: J S > Sent: Tuesday, March 24, 2009 9:49 AM > To: r-help@r-project.org > Subject: [R] Do we have to control for block in bloc

Re: [R] Random Forest Variable Importance

2009-03-27 Thread Liaw, Andy
Read ?importance, especially the "scale" argument. Andy > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Li GUO > Sent: Friday, March 27, 2009 1:24 PM > To: r-help@r-project.org > Subject: [R] Random Forest Variable Importance

Re: [R] Concern with randomForest

2009-04-07 Thread Liaw, Andy
It's not nodesize in the formula, but var(y) (with divisor n, not n-1). It's sort of like the adjusted R-squared (because it uses mean squares instead of sum of squares), but uses the OOB estimate of MSE. If there's very little or no explanatory power in the predictor variables, this statistic wou

Re: [R] help with random forest package

2009-04-08 Thread Liaw, Andy
The source code of the whole package is available on CRAN. All packages are submitted to CRAN in source form. There's no "rule" per se that gives the final prediction, as the final prediction is the result of a plural vote by all trees in the forest. You may want to look at the varUsed() and getTr

Re: [R] help with random forest package

2009-04-08 Thread Liaw, Andy
, just the underlying representation of the tree). Andy From: Chrysanthi A. [mailto:chrys...@gmail.com] Sent: Wednesday, April 08, 2009 2:56 PM To: Liaw, Andy Cc: r-help@r-project.org Subject: Re: [R] help with random fore

Re: [R] Random Forests Variable Importance Question

2009-04-13 Thread Liaw, Andy
I'll take a shot. Let me try to explain the 3rd measure first. A RF model tries to predict an outcome variable (the classes) from a group of potential predictor variables (the "x"). If a predictor variable is "important" in making the prediction accurate, then by messing with it (e.g., giving
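
The "messing with it" idea described above can be written model-agnostically; a hedged sketch (the function, data, and `predict.fun` argument are all hypothetical names, not part of the randomForest API): permute one predictor at a time and measure how much prediction error degrades.

```r
perm.importance <- function(fit, X, y, predict.fun) {
  base.err <- mean(predict.fun(fit, X) != y)
  sapply(seq_along(X), function(j) {
    Xp <- X
    Xp[[j]] <- sample(Xp[[j]])      # destroy the information in predictor j
    mean(predict.fun(fit, Xp) != y) - base.err
  })
}
```

A large increase in error after permuting a variable marks it as important; a near-zero change means the model barely used it.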

Re: [R] Re : Running random forest using different training andtesting schemes

2009-04-13 Thread Liaw, Andy
The R News article we put out after the first version of the package was released has examples of doing CV. You can also use the facilities in the caret package (on CRAN) or the MLInterfaces package (part of Bioconductor, not on CRAN). randomForest() itself does not do CV per se, but the OOB es

Re: [R] help with random forest package

2009-04-13 Thread Liaw, Andy
riday, April 10, 2009 10:44 AM To: Liaw, Andy Cc: r-help@r-project.org Subject: Re: [R] help with random forest package Hi, To be honest, I cannot really understand what is the meaning of the votes.. For example having five sa

Re: [R] help with random forest package

2009-04-13 Thread Liaw, Andy
ypo). Cheers, Andy From: Chrysanthi A. [mailto:chrys...@gmail.com] Sent: Monday, April 13, 2009 9:44 AM To: Liaw, Andy Cc: r-help@r-project.org Subject: Re: [R] help with random forest package But how does it estimat

Re: [R] Random Forests: Question about R^2

2009-04-13 Thread Liaw, Andy
MSE is the mean squared residuals. For the training data, the OOB estimate is used (i.e., residual = data - OOB prediction, MSE = sum(residuals) / n, OOB prediction is the mean of predictions from all trees for which the case is OOB). It is _not_ the average OOB MSE of trees in the forest. I hop

Re: [R] Random Forests: Question about R^2

2009-04-13 Thread Liaw, Andy
Apologies: that should have been sum(residual^2)! > -Original Message- > From: Dimitri Liakhovitski [mailto:ld7...@gmail.com] > Sent: Monday, April 13, 2009 4:35 PM > To: Liaw, Andy > Cc: R-Help List > Subject: Re: [R] Random Forests: Question about R^2 > > And
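
Putting the corrected formulas from this thread together (toy numbers; `oob.pred` stands in for the OOB predictions, and var(y) uses divisor n as noted elsewhere in these replies):

```r
y        <- c(1.0, 2.0, 3.0, 4.0)
oob.pred <- c(1.1, 1.8, 3.2, 3.9)      # placeholder OOB predictions

residual <- y - oob.pred
n        <- length(y)
mse      <- sum(residual^2) / n        # sum of SQUARED residuals over n
vary     <- var(y) * (n - 1) / n       # var(y) with divisor n
r2       <- 1 - mse / vary             # pseudo R^2
```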

Re: [R] Random Forests: Question about R^2

2009-04-21 Thread Liaw, Andy
> due to error, then R^2 = 1 - MSE/var(y). > > If it's correct, my last question would be: > I am getting as many R^2 as the number of trees because each time the > residuals are recalculated using all trees built so far, correct? > > Thank you very much! > D

Re: [R] Random Forests: Predictor importance for Regression Trees

2009-04-21 Thread Liaw, Andy
Yes, you've got it! Cheers, Andy From: Behalf Of Dimitri > > Hello! > > I think I am relatively clear on how predictor importance (the first > one) is calculated by Random Forests for a Classification tree: > > Importance of predictor P1 when the response variable is categorical: > > 1. For

Re: [R] help with random forest package

2009-04-28 Thread Liaw, Andy
look at the output of randomForest(..., inbag=TRUE) to see which data point is OOB for which tree. I hope that's clear now. Cheers, Andy ________ From: Chrysanthi A. [mailto:chrys...@gmail.com] Sent: Tuesday, April 28, 2009 8:52 AM To: Liaw, Andy Cc: r-help

Re: [R] Problem with Random Forest predict

2009-05-01 Thread Liaw, Andy
This message landed in the "Junk e-mails" folder (of which I have no control), and it just so happens that I glanced in the folder today, instead of just emptying it without checking, trusting the filter to do the Right Thing... Since you seem to have run into a problem with predict.randomForest, o

Re: [R] "prob" in predict(randomForest)

2009-05-05 Thread Liaw, Andy
In short, yes. Andy From: Haring, Tim (LWF) > > Hi at all, > > maybe this question is quite simple for a statistician, but > for me it is not. After reading a lot of mail in the R-help > archive I`m still not quite sure I get it. > When applying a randomForest to a new dataset with > predi

Re: [R] Support Vector Machines

2009-05-05 Thread Liaw, Andy
svm() in the e1071 package is an interface to the libsvm code. Look at the link provided in the help page for that function. You will have to read up how density estimation is achieved via one-class SVM. Andy From: excalibur > > In the R-help of the svm function of the package e1071 it's >

Re: [R] calibration plot

2009-05-05 Thread Liaw, Andy
I believe something like: scatter.smooth(est.prob, as.numeric(y == "level of interest")) would be close. You may want to use a larger span than default. Andy From: abbas tavassoli > > Hi, > I have a binary variable and corresponding predicted > probability (using > logistic regression on

Re: [R] fast lm se?

2010-01-08 Thread Liaw, Andy
From: ivo welch > > dear R experts---I am using the coef() function to pick off > the coefficients > from an lm() object. alas, I also need the standard errors > and I need them > fast. I know I can do a "summary()" on the object and pick > them off this > way, but this computes other stuff I

Re: [R] Help me! using random Forest package, how to calculate Error Rates in the training set ?

2010-01-11 Thread Liaw, Andy
From: bbslover > > now I am learining random forest and using random forest > package, I can get > the OOB error rates, and test set rate, now I want to get the > training set > error rate, how can I do? > > pgp.rf<-randomForest(x.tr,y.tr,x.ts,y.ts,ntree=1e3,keep.forest > =FALSE,do.trace=1e2)

Re: [R] randomForest maxnodes

2010-01-15 Thread Liaw, Andy
Please try to follow the posting guide and give a reproducible example, as below: R> library(randomForest) randomForest 4.5-34 Type rfNews() to see new features/changes/bug fixes. R> iris2 = iris[-5] R> iris.rf = randomForest(Petal.Width~., iris2, maxnodes=4, ntree=50) R> nodesize(iris.rf) Error:

Re: [R] locfit questions/problems

2010-01-25 Thread Liaw, Andy
Just replacing preplot() with predict() should be fine. BTW, it's always a good idea to specify the version of the package you're using as well. Best, Andy From: mh...@berkeley.edu > > Hi, > > I'm trying to work through the examples and code in Loader's > LOCAL REGRESSION AND LIKELIHOOD, and

Re: [R] What are Type II or III contrast? (contrast() in contrastpackage)

2010-02-04 Thread Liaw, Andy
From: Peng Yu > > On Wed, Feb 3, 2010 at 2:12 AM, Emmanuel Charpentier > wrote: > > On Wednesday, February 3, 2010 at 00:01 -0500, David Winsemius wrote: > >> On Feb 2, 2010, at 11:38 PM, Peng Yu wrote: > >> > >> > ?contrast in the contrast package gives me the following > description. > >> > How

Re: [R] How to export the examples in help(something) to a file?

2010-02-04 Thread Liaw, Andy
From: Peng Yu > > On Wed, Feb 3, 2010 at 10:01 AM, Peng Yu wrote: > > Some examples in the help page are too long to be copied > from screen. > > Could somebody let me know some easy way on how to extract > the example > > to a file so that I can play with them? > > I forget to mention. I use

Re: [R] Creating 3d partial dependence plots

2013-03-20 Thread Liaw, Andy
It needs to be done "by hand", in that partialPlot() does not handle more than one variable at a time. You need to modify its code to do that (and be ready to wait even longer, as it can be slow). Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-proj

Re: [R] Random Forest - Extract

2012-10-03 Thread Liaw, Andy
1. Not sure what you want. What "details" are you looking for exactly? If you call predict(trainset) without the newdata argument, you will get the (out-of-bag) prediction of the training set, which is exactly the "predicted" component of the RF object. 2. If you set type="votes" and norm.v

Re: [R] Random Forest for multiple categorical variables

2012-10-17 Thread Liaw, Andy
How about taking the combination of the two? E.g., gamma = factor(paste(alpha, beta1, sep=":")) and use gamma as the response. Best, Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gyanendra Pokharel Sent: Tuesday, October
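
A quick usage sketch of the suggestion (toy data; `alpha` and `beta1` stand in for the poster's two categorical responses):

```r
alpha <- factor(c("a", "a", "b", "b"))
beta1 <- factor(c("x", "y", "x", "y"))

# one combined response with a level per (alpha, beta1) combination
gamma <- factor(paste(alpha, beta1, sep = ":"))
levels(gamma)   # "a:x" "a:y" "b:x" "b:y"
```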

Re: [R] rpart and randomforest results

2014-04-07 Thread Liaw, Andy
Hi Sonja, How did you build the rpart tree (i.e., what settings did you use in rpart.control)? Rpart by default will use cross validation to prune back the tree, whereas RF doesn't need that. There are other more subtle differences as well. If you want to compare single tree results, you rea

Re: [R] Partial dependence plot in randomForest package (all flat responses)

2012-11-26 Thread Liaw, Andy
Not unless we have more information. Please read the Posting Guide to see how to make it easier for people to answer your question. Best, Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Oritteropus Sent: Thursday, November 2

Re: [R] How do I make R randomForest model size smaller?

2012-12-04 Thread Liaw, Andy
Try the following: set.seed(100) rf1 <- randomForest(Species ~ ., data=iris) set.seed(100) rf2 <- randomForest(iris[1:4], iris$Species) object.size(rf1) object.size(rf2) str(rf1) str(rf2) You can try it on your own data. That should give you some hints about why the formula interface should be

Re: [R] Different results from random.Forest with test option and using predict function

2012-12-04 Thread Liaw, Andy
Without data to reproduce what you saw, we can only guess. One possibility is due to tie-breaking. There are several places where ties can occur and are broken at random, including at the prediction step. One difference between the two ways of doing prediction is that when it's all done withi

Re: [R] randomForest warning: The response has five or fewer unique values. Are you sure you want to do regression?

2014-03-24 Thread Liaw, Andy
If you are using the code, that's not really using randomForest directly. I don't understand the data structure you have (since you did not show anything) so can't really tell you much. In any case, that warning came from randomForest() when it is run in regression mode but the response has fe

Re: [R] FW: Nadaraya-Watson kernel

2013-11-07 Thread Liaw, Andy
Use KernSmooth (one of the recommended packages that are included in R distribution). E.g., > library(KernSmooth) KernSmooth 2.23 loaded Copyright M. P. Wand 1997-2009 > x <- seq(0, 1, length=201) > y <- 4 * cos(2*pi*x) + rnorm(x) > f <- locpoly(x, y, degree=0, kernel="epan", bandwidth=.1) > plo

Re: [R] What is the difference between Mean Decrease Accuracy produced by importance(foo) vs foo$importance in a Random Forest Model?

2013-11-19 Thread Liaw, Andy
The difference is importance(..., scale=TRUE). See the help page for detail. If you extract the $importance component from a randomForest object, you do not get the scaling. Best, Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Beha

Re: [R] Split type in the RandomForest package

2013-11-20 Thread Liaw, Andy
Classification trees use the Gini index, whereas the regression trees use sum of squared errors. They are "hard-wired" into the C/Fortran code, so not easily changeable. Best, Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of
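
The two criteria named above are easy to compute by hand; a hedged sketch (these helper functions are illustrative, not the package's internal C/Fortran code):

```r
# Gini impurity of a set of class labels: 1 - sum of squared class proportions
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# sum of squared errors of a numeric response around its mean
sse <- function(y) sum((y - mean(y))^2)

gini(factor(c("a", "a", "b")))  # 1 - (4/9 + 1/9) = 4/9
sse(c(1, 2, 3))                 # 2
```

A split is chosen to maximize the decrease in the criterion, summed over the two child nodes.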

Re: [R] How do I extract Random Forest Terms and Probabilities?

2013-12-02 Thread Liaw, Andy
#2 can be done simply with predict(fmi, type="prob"). See the help page for predict.randomForest(). Best, Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of arun Sent: Tuesday, November 26, 2013 6:57 PM To: R help Subject: Re:

Re: [R] interpretation of MDS plot in random forest

2013-12-02 Thread Liaw, Andy
Yes, that's part of the intention anyway. One can also use them to do clustering. Best, Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Massimo Bressan Sent: Monday, December 02, 2013 6:34 AM To: r-help@r-project.org Subject

Re: [R] Variable importance - ANN

2013-12-04 Thread Liaw, Andy
You can try something like this: http://pubs.acs.org/doi/abs/10.1021/ci050022a Basically similar idea to what is done in random forests: permute predictor variable one at a time and see how much that degrades prediction performance. Cheers, Andy -Original Message- From: r-help-boun...@r
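The permutation idea is model-agnostic, so it applies to a neural net as well; a rough sketch for any fitted model with a `predict()` method (the function and argument names below are illustrative, not from the cited paper):

```r
# Permutation importance sketch: shuffle one predictor at a time and
# measure how much prediction error increases relative to the baseline.
perm_importance <- function(model, X, y,
                            metric = function(obs, pred) mean((obs - pred)^2)) {
  base <- metric(y, predict(model, X))
  sapply(names(X), function(v) {
    Xp <- X
    Xp[[v]] <- sample(Xp[[v]])  # break the association between predictor v and y
    metric(y, predict(model, Xp)) - base
  })
}
```

Larger values mean the model leaned harder on that variable.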

Re: [R] anyone know why package "RandomForest" na.roughfix is so slow??

2010-07-01 Thread Liaw, Andy
You have not shown any code on exactly how you use na.roughfix(), so I can only guess. If you are doing something like: randomForest(y ~ ., mybigdata, na.action=na.roughfix, ...) I would not be surprised that it's taking very long on large datasets. Most likely it's caused by the formula inter
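A hedged illustration of the two calling styles (the slowdown Andy suspects comes from the formula interface, so the second form imputes once up front; `mybigdata` and `y` are the poster's hypothetical objects):

```r
library(randomForest)

## Likely slow on big data: imputation routed through the formula interface
# rf1 <- randomForest(y ~ ., data = mybigdata, na.action = na.roughfix)

## Usually faster: impute once, then use the x/y interface directly
x   <- na.roughfix(mybigdata[, setdiff(names(mybigdata), "y")])
rf2 <- randomForest(x, mybigdata$y)
```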

Re: [R] anyone know why package "RandomForest" na.roughfix is so slow??

2010-07-01 Thread Liaw, Andy
roughfix(x)) user system elapsed 8.44 0.39 8.85 R 2.11.1, randomForest 4.5-35, Windows XP (32-bit), Thinkpad T61 with 2GB RAM. Andy From: Mike Williamson [mailto:this.is@gmail.com] Sent: Thursday, July 01, 2010 12:48 PM To: Liaw, Andy Cc: r-h

Re: [R] anyone know why package "RandomForest" na.roughfix is so slow??

2010-07-02 Thread Liaw, Andy
I'll incorporate some of these ideas into the next release. Thanks! Best, Andy -Original Message- From: h.wick...@gmail.com [mailto:h.wick...@gmail.com] On Behalf Of Hadley Wickham Sent: Thursday, July 01, 2010 8:08 PM To: Mike Williamson Cc: Liaw, Andy; r-help Subject: Re: [R] a

Re: [R] randomForest outlier return NA

2010-07-15 Thread Liaw, Andy
There's a bug in the code. If you add row names to the X matrix before you call randomForest(), you'd get: R> summary(outlier(mdl.rf)) Min. 1st Qu. Median Mean 3rd Qu. Max. -1.0580 -0.5957 0.0000 0.6406 1.2650 9.5200 I'll fix this in the next release. Thanks for reporting. Bes

Re: [R] OT: Is randomization for targeted cancer therapies ethical?

2010-09-21 Thread Liaw, Andy
> From: jlu...@ria.buffalo.edu > > Clearly inferior treatments are unethical. The Big Question is: What constitutes "clearly"? Who gets to decide, and how? I'm sure there are plenty of people who don't understand much Statistics and are perfectly willing to say the results on the tw

Re: [R] randomForest - partialPlot - Reg

2010-09-22 Thread Liaw, Andy
> From: Vijayan Padmanabhan > > Dear R Group > I had an observation that in some cases, when I use the > randomForest model > to create partialPlot in R using the package "randomForest" > the y-axis displays values that are more than -1! > It is a classification problem that I was trying to addr

Re: [R] Passing a function as a parameter...

2010-09-22 Thread Liaw, Andy
One possibility: R> f = function(x, f) eval(as.call(list(as.name(f), x))) R> f(1:10, "mean") [1] 5.5 R> f(1:10, "max") [1] 10 Andy From: Jonathan Greenberg > R-helpers: > > If I want to pass a character name of a function TO a > function, and then > have that function executed, how would I do
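An arguably more idiomatic alternative to the `eval(as.call(...))` trick above is `match.fun()` (or `do.call()`), which resolves a function from its character name directly:

```r
apply_by_name <- function(x, fname) {
  fn <- match.fun(fname)  # look up the function by its character name
  fn(x)
}

apply_by_name(1:10, "mean")  # 5.5
apply_by_name(1:10, "max")   # 10

# or equivalently, build and evaluate the call in one step:
do.call("mean", list(1:10))  # 5.5
```

`match.fun()` also accepts a function object, so callers can pass either `"mean"` or `mean`.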

Re: [R] randomForest - PartialPlot - reg

2010-09-24 Thread Liaw, Andy
In a partial dependence plot, only the relative scale, not absolute scale, of the y-axis is meaningful. I.e., you can compare the range of the curves between partial dependence plots of two different variables, but not the actual numbers on the axis. The range is compressed compared to the origin

Re: [R] Force evaluation of variable when calling partialPlot

2010-10-04 Thread Liaw, Andy
The plot titles aren't pretty, but the following works for me: R> library(randomForest) randomForest 4.5-37 Type rfNews() to see new features/changes/bug fixes. R> set.seed(1004) R> iris.rf <- randomForest(iris[-5], iris[[5]], ntree=1001) R> par(mfrow=c(2,2)) R> for (i in 1:4) partialPlot(iris.rf,

Re: [R] RandomForest Proximity Matrix

2010-10-21 Thread Liaw, Andy
From: Michael Lindgren > > Greetings R Users! > > I am posting to inquire about the proximity matrix in the randomForest > R-package. I am having difficulty pushing very large data through the > algorithm and it appears to hang on the building of the prox > matrix. I have > read on Dr. Breiman

Re: [R] Random Forest AUC

2010-10-22 Thread Liaw, Andy
Let me expand on what Max showed. For the most part, performance on training set is meaningless. (That's the case for most algorithms, but especially so for RF.) In the default (and recommended) setting, the trees are grown to the maximum size, which means that quite likely there's only one data
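This is why a random forest should be judged by its out-of-bag (OOB) estimates rather than by re-predicting the training set; a small sketch of the contrast (iris is a stand-in dataset):

```r
library(randomForest)
set.seed(123)
rf <- randomForest(Species ~ ., data = iris)

# Resubstitution: trees have effectively memorized their in-bag cases,
# so accuracy on the training set is near-perfect and meaningless.
mean(predict(rf, iris) == iris$Species)

# OOB: each case is predicted only by trees that did NOT see it,
# giving an honest error estimate (also reported in print(rf)).
mean(predict(rf) == iris$Species)
```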

Re: [R] Random Forest AUC

2010-10-23 Thread Liaw, Andy
What Breiman meant is that as the model gets more complex (i.e., as the number of trees tends to infinity) the generalization error (test set error) does not increase. This does not hold for boosting, for example; i.e., you can't "boost forever", which necessitates finding the optimal numb
