I would like to summarize. Would you please confirm that my summary is correct? Thank you very much!
Determining R^2 in Random Forests (for a regression forest):

1. For each individual case, record the mean prediction of the dependent variable y across all trees for which the case is OOB (out-of-bag).
2. For each individual case, calculate a residual: residual = observed y - mean predicted y (from step 1).
3. Calculate the mean squared residual, MSE: MSE = sum of all squared individual residuals (from step 2) / n.
4. Because MSE/var(y) represents the proportion of the variance of y that is due to error, R^2 = 1 - MSE/var(y).

If this is correct, my last question is: I get as many R^2 values as there are trees because each time the residuals are recalculated using all trees built so far, correct?

Thank you very much!
Dimitri

On Mon, Apr 13, 2009 at 6:22 PM, Liaw, Andy <andy_l...@merck.com> wrote:
> Apologies: that should have been sum(residual^2)!
>
>> -----Original Message-----
>> From: Dimitri Liakhovitski [mailto:ld7...@gmail.com]
>> Sent: Monday, April 13, 2009 4:35 PM
>> To: Liaw, Andy
>> Cc: R-Help List
>> Subject: Re: [R] Random Forests: Question about R^2
>>
>> Andy,
>> thank you very much!
>> One clarification question:
>>
>> If MSE = sum(residuals) / n, then
>> in the formula (1 - mse / Var(y)) - shouldn't one square mse before
>> dividing by variance?
>>
>> Dimitri
>>
>>
>> On Mon, Apr 13, 2009 at 10:52 AM, Liaw, Andy
>> <andy_l...@merck.com> wrote:
>> > MSE is the mean squared residuals. For the training data, the OOB
>> > estimate is used (i.e., residual = data - OOB prediction, MSE =
>> > sum(residuals) / n, OOB prediction is the mean of predictions from all
>> > trees for which the case is OOB). It is _not_ the average OOB MSE of
>> > trees in the forest.
>> >
>> > I hope there's no question about how the pseudo R^2 is computed on a
>> > test set? If you understand how that's done, I assume the confusion is
>> > only how the OOB MSE is formed.
>> >
>> > Best,
>> > Andy
>> >
>> > From: Dimitri Liakhovitski
>> >>
>> >> Dear Random Forests gurus,
>> >>
>> >> I have a question about R^2 provided by randomForest (for regression).
>> >> I don't succeed in finding this information.
>> >>
>> >> In the help file for randomForest under "Value" it says:
>> >>
>> >> rsq: (regression only) - "pseudo R-squared'': 1 - mse / Var(y).
>> >>
>> >> Could someone please explain in somewhat more detail how exactly R^2
>> >> is calculated?
>> >> Is "mse" mean squared error for prediction?
>> >> Is "mse" an average of mse's for all trees run on out-of-bag
>> >> holdout samples?
>> >> In other words - is this R^2 based on out-of-bag samples?
>> >>
>> >> Thank you very much for clarification!
>> >>
>> >> --
>> >> Dimitri Liakhovitski
>> >> MarketTools, Inc.
>> >> dimitri.liakhovit...@markettools.com
>> >>
>> >> ______________________________________________
>> >> R-help@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> > Notice: This e-mail message, together with any attachments, contains
>> > information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
>> > New Jersey, USA 08889), and/or its affiliates (which may be known
>> > outside the United States as Merck Frosst, Merck Sharp & Dohme or
>> > MSD and in Japan, as Banyu - direct contact information for affiliates is
>> > available at http://www.merck.com/contact/contacts.html) that may be
>> > confidential, proprietary copyrighted and/or legally privileged. It is
>> > intended solely for the use of the individual or entity named on this
>> > message. If you are not the intended recipient, and have received this
>> > message in error, please notify us immediately by reply e-mail and
>> > then delete it from your system.
>> >
>>
>> --
>> Dimitri Liakhovitski
>> MarketTools, Inc.
>> dimitri.liakhovit...@markettools.com
>

--
Dimitri Liakhovitski
MarketTools, Inc.
dimitri.liakhovit...@markettools.com
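The four-step summary at the top of this message, with MSE taken as the mean of the squared residuals per Andy's correction, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the randomForest package's code: the function name `cumulative_oob_r2`, the input layout (per-tree predictions plus in-bag flags), skipping cases that have not yet been out-of-bag for any tree, and using the population variance (dividing by n) for var(y) are all assumptions made for the example. Because each case's OOB prediction is updated as every tree is added, the sketch produces one R^2 per tree.

```python
from statistics import pvariance


def cumulative_oob_r2(y, tree_preds, in_bag):
    """Return the OOB pseudo R^2 = 1 - MSE / var(y) after each tree is added.

    Hypothetical helper for illustration only.
    y          -- observed responses, one per case
    tree_preds -- tree_preds[t][i] is tree t's prediction for case i
    in_bag     -- in_bag[t][i] is True if case i was in tree t's bootstrap sample
    """
    n = len(y)
    var_y = pvariance(y)   # assumption: population variance (divides by n)
    pred_sum = [0.0] * n   # running sum of OOB predictions for each case
    oob_count = [0] * n    # number of trees so far for which each case was OOB
    r2_path = []
    for preds, bag in zip(tree_preds, in_bag):
        for i in range(n):
            if not bag[i]:  # case i is out-of-bag for this tree
                pred_sum[i] += preds[i]
                oob_count[i] += 1
        # Steps 1-3: mean OOB prediction, residual, mean of SQUARED residuals.
        # Assumption: cases that have not yet been OOB for any tree are skipped.
        sq_resid = [(y[i] - pred_sum[i] / oob_count[i]) ** 2
                    for i in range(n) if oob_count[i] > 0]
        mse = sum(sq_resid) / len(sq_resid) if sq_resid else float("nan")
        # Step 4: pseudo R^2 for the forest built so far.
        r2_path.append(1 - mse / var_y)
    return r2_path


# Toy example with made-up numbers: 4 cases, 3 trees.
y = [1.0, 2.0, 3.0, 4.0]
tree_preds = [[1.5, 2.5, 2.5, 3.5],
              [1.0, 2.0, 3.5, 4.5],
              [1.5, 1.5, 3.0, 4.0]]
in_bag = [[True, False, True, False],
          [False, True, False, True],
          [False, False, True, True]]

r2_values = cumulative_oob_r2(y, tree_preds, in_bag)
print([round(r, 4) for r in r2_values])  # [0.8, 0.85, 0.8875]
```

The returned list has one entry per tree, and its last entry is the R^2 for the whole forest, which is consistent with getting as many R^2 values as there are trees. Exactly how randomForest handles cases that are not yet OOB, and whether it divides var(y) by n or n - 1, is not stated in this thread.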