R squared is: 1 - sum(residuals^2)/crossprod(y - mean(y))
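To make the sum-of-squares route concrete, here is a minimal sketch of computing R-squared directly from an lm.fit() result, as suggested above. The data and variable names are illustrative, not from the original thread.

```r
# Sketch: R-squared from lm.fit() via sums of squares.
# The design matrix must include an explicit intercept column.
set.seed(1)
n <- 100
x <- cbind(1, rnorm(n), rnorm(n))            # design matrix with intercept
y <- drop(x %*% c(2, 0.5, -1) + rnorm(n))    # simulated response

fit    <- lm.fit(x, y)
ss_res <- sum(fit$residuals^2)               # residual sum of squares
ss_tot <- sum((y - mean(y))^2)               # total sum of squares
r2     <- 1 - ss_res / ss_tot

# Agrees (up to numerical error) with summary(lm(...))$r.squared:
all.equal(r2, summary(lm(y ~ x[, -1]))$r.squared)
```

The total SS is not stored in the lm.fit output; it depends only on y, so it is computed once from the response itself.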
On Mon, Sep 8, 2008 at 2:27 PM, Dimitri Liakhovitski <[EMAIL PROTECTED]> wrote:
> I could get an r-squared from lm.fit by correlating fitted.values and
> my response variable. But could I do it somehow using Sums of Squares?
> I am clear on the SS for residuals. But where is the SS for the model,
> or the total SS, in the lm.fit output?
> Thank you!
> Dimitri
>
> On Mon, Sep 8, 2008 at 1:57 PM, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
>> On Mon, Sep 8, 2008 at 1:47 PM, Dimitri Liakhovitski <[EMAIL PROTECTED]> wrote:
>>> Thank you everyone for your responses. I'll answer several questions.
>>>
>>> 1. > Disclaimer: I have **NO IDEA** of the details of what you want to do or why
>>>> -- but I am willing to bet that there are better ways of doing it
>>>> than 1.8 mm multiple regressions that take 270 secs each!! (which I
>>>> find difficult to believe in itself -- are you sure you are doing
>>>> things right? Something sounds very fishy here: R's regression code
>>>> is typically very fast).
>>> I probably should not bore everyone, but just to explain where the
>>> large number comes from. I have an experimental design with 7
>>> factors. Each factor has between 3 and 5 levels. Once you cross them
>>> all, you end up with 18,000 cells. For each cell, I want to generate
>>> a sample of N=100. For each sample I have to analyze the data using 3
>>> different statistical methods (the goal of the Monte Carlo is to
>>> compare those methods). One of the methods requires running up to
>>> ~32,000 simple multiple regressions -- yes, just for one sample, and
>>> it's not a mistake. I test-ran one such analysis for a sample with
>>> N=800 and 15 predictors, and it took 270 seconds. R was actually very
>>> fast -- it ran each of the individual regressions in about 0.008
>>> seconds. Still, I need something faster.
>>>
>>> 2. Sorry -- what was the formula sum(lm.fit(x,y)$residuals^2) for?
>>> For example, using it on my data, I got a value of 36,644...
>>
>> It's the sum of the squares of the residuals.
>>
>>> 3. I know that for similarly challenging situations people did use
>>> Fortran compilers. So, has anyone heard of a free Fortran library or
>>> an efficient piece of code?
>>>
>>> Thank you!
>>> Dimitri
>>>
>>>> -- Bert Gunter
>>>>
>>>> -----Original Message-----
>>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Dimitri Liakhovitski
>>>> Sent: Monday, September 08, 2008 9:56 AM
>>>> To: Prof Brian Ripley
>>>> Cc: R-Help List
>>>> Subject: Re: [R] Question about multiple regression
>>>>
>>>> Yes, see my previous e-mail on how long R takes (270 seconds for one
>>>> of the 1,800,000 sets I need) -- using system.time.
>>>> Not sure how to test the same for Fortran...
>>>>
>>>> On Mon, Sep 8, 2008 at 12:51 PM, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:
>>>>> Are you sure R's ways are not fast enough (there are many layers
>>>>> underneath lm)? For an example of how you might do this at
>>>>> C/Fortran level, see the function lqs() in MASS.
>>>>>
>>>>> On Mon, 8 Sep 2008, Dimitri Liakhovitski wrote:
>>>>>
>>>>>> Dear R-list,
>>>>>> maybe some of you could point me in the right direction:
>>>>>>
>>>>>> Are you aware of any FREE Fortran or Java libraries/actual pieces
>>>>>> of code that are VERY efficient (time-wise) in running the regular
>>>>>> linear least-squares multiple regression?
>>>>>
>>>>> A lot of the effort is in getting the right answer fast, including
>>>>> for e.g. collinear inputs.
>>>>>
>>>>>> More specifically, I have to run small regression models (between
>>>>>> 1 and 15 predictors) on samples of up to N=700, but thousands and
>>>>>> thousands of them.
>>>>>>
>>>>>> I am designing a simulation in R and running those regressions,
>>>>>> and R itself is way too slow. So, I am thinking of compiling the
>>>>>> regression run itself in Fortran or Java and then calling it from R.
>>>>>
>>>>> I think Java is unlikely to be fast compared to the Fortran R itself uses.
>>>>>
>>>>> Have you profiled to find where the time is really being spent
>>>>> (both R and C/Fortran profiling if necessary)?
>>>>>
>>>>>> Thank you very much for any advice!
>>>>>>
>>>>>> Dimitri Liakhovitski
>>>>>> MarketTools, Inc.
>>>>>> [EMAIL PROTECTED]
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>> --
>>>>> Brian D. Ripley, [EMAIL PROTECTED]
>>>>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>>>>> University of Oxford, Tel: +44 1865 272861 (self)
>>>>> 1 South Parks Road, +44 1865 272866 (PA)
>>>>> Oxford OX1 3TG, UK, Fax: +44 1865 272595
>>>>
>>>> --
>>>> Dimitri Liakhovitski
>>>> MarketTools, Inc.
>>>> [EMAIL PROTECTED]
>>>
>>> --
>>> Dimitri Liakhovitski
>>> MarketTools, Inc.
>>> [EMAIL PROTECTED]
>
> --
> Dimitri Liakhovitski
> MarketTools, Inc.
> [EMAIL PROTECTED]
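Since the thread is about speeding up many small least-squares fits, here is a hedged sketch of one standard approach: when many regressions share the same design matrix, factor it once with qr() and reuse the factorization for every response vector, instead of paying lm()'s formula and model-frame overhead each time. The sizes and names below are illustrative.

```r
# Sketch: fitting many regressions that share one design matrix.
# qr() factors X once; qr.coef() then solves for many responses at once.
set.seed(42)
n <- 700; p <- 15; m <- 1000                  # m separate response vectors
X <- cbind(1, matrix(rnorm(n * p), n, p))     # design matrix with intercept
Y <- matrix(rnorm(n * m), n, m)               # each column is one regression

QR <- qr(X)                                   # one factorization for all fits
B  <- qr.coef(QR, Y)                          # (p+1) x m coefficient matrix
R2 <- 1 - colSums(qr.resid(QR, Y)^2) /
          colSums(sweep(Y, 2, colMeans(Y))^2) # R-squared per regression

# Column 1 matches a per-column lm.fit(), up to numerical error:
b1 <- lm.fit(X, Y[, 1])$coefficients
all.equal(unname(B[, 1]), unname(b1))
```

When the design matrix differs per fit, lm.fit() in a loop (as measured at ~0.008 s per fit in the thread) is already close to the underlying Fortran; profiling with Rprof() would show whether any remaining time is spent outside the fit itself.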