Re: [R] quantreg speed

William Dunlap Sat, 15 Nov 2014 17:22:49 -0800

You can time it yourself on increasingly large subsets of your data.  E.g.,

> library(quantreg)
> dat <- data.frame(x1=rnorm(1e6), x2=rnorm(1e6),
+                   x3=sample(c("A","B","C"), size=1e6, replace=TRUE))
> dat$y <- with(dat, x1 + 2*(x3=="B")*x2 + rnorm(1e6))
> t <- vapply(n <- 4^(3:10), FUN=function(n) {
+     d <- dat[seq_len(n), ]
+     print(system.time(rq(data=d, y ~ x1 + x2*x3, tau=0.9)))
+   }, FUN.VALUE=numeric(5))
   user  system elapsed
      0       0       0
   user  system elapsed
      0       0       0
   user  system elapsed
   0.02    0.00    0.01
   user  system elapsed
   0.01    0.00    0.02
   user  system elapsed
   0.10    0.00    0.11
   user  system elapsed
   1.09    0.00    1.10
   user  system elapsed
  13.05    0.02   13.07
   user  system elapsed
 273.30    0.11  273.74
> t
           [,1] [,2] [,3] [,4] [,5] [,6]  [,7]   [,8]
user.self     0    0 0.02 0.01 0.10 1.09 13.05 273.30
sys.self      0    0 0.00 0.00 0.00 0.00  0.02   0.11
elapsed       0    0 0.01 0.02 0.11 1.10 13.07 273.74
user.child   NA   NA   NA   NA   NA   NA    NA     NA
sys.child    NA   NA   NA   NA   NA   NA    NA     NA

Do some regressions on t["elapsed",] as a function of n and predict up to
n=10^7.  E.g.,
> summary(lm(t["elapsed",] ~ poly(n,4)))

Call:
lm(formula = t["elapsed", ] ~ poly(n, 4))

Residuals:
         1          2          3          4          5          6          7          8
-2.375e-03 -2.970e-03  4.484e-03  1.674e-03 -8.723e-04  6.096e-05 -9.199e-07  2.715e-09

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)
(Intercept) 3.601e+01  1.261e-03 28564.33 9.46e-14 ***
poly(n, 4)1 2.493e+02  3.565e-03 69917.04 6.45e-15 ***
poly(n, 4)2 5.093e+01  3.565e-03 14284.61 7.57e-13 ***
poly(n, 4)3 1.158e+00  3.565e-03   324.83 6.43e-08 ***
poly(n, 4)4 4.392e-02  3.565e-03    12.32  0.00115 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.003565 on 3 degrees of freedom
Multiple R-squared:      1,     Adjusted R-squared:      1
F-statistic: 1.273e+09 on 4 and 3 DF,  p-value: 3.575e-14
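
The prediction step itself is not shown above. A minimal sketch of it, assuming the elapsed times are copied by hand from the table into a vector (the names `elapsed`, `fit` and `pred` are made up for this illustration):

```r
# Timings copied from the table above (elapsed seconds for n = 4^3 ... 4^10).
n <- 4^(3:10)
elapsed <- c(0, 0, 0.01, 0.02, 0.11, 1.10, 13.07, 273.74)

# Same model as in the summary, but with the response as a plain vector so
# that predict() only needs new values of n.
fit <- lm(elapsed ~ poly(n, 4))

# Extrapolated elapsed time (seconds) at n = 10^7.  This is far outside the
# fitted range, so treat it as a rough indication only.
pred <- predict(fit, newdata = data.frame(n = 1e7))
print(pred)
```

With only 8 points and a degree-4 polynomial, the extrapolated value is very sensitive to the last one or two timings.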


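It does not look good for n=10^7: each fourfold increase in n multiplied the
elapsed time by about 12 (1.10 to 13.07 seconds) and then by about 21 (13.07
to 273.74 seconds), so the cost is growing much faster than linearly.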
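
As a cross-check on the polynomial fit, one could also fit a straight line on the log-log scale. This is only a sketch: it assumes the run time follows a power law in n, which the timings above do not establish, and it drops the timings below 0.1 seconds as too small to measure reliably. The names `keep`, `timings`, `fit_log`, `exponent` and `pred_sec` are made up for this illustration.

```r
# Timings copied from the table above (elapsed seconds for n = 4^3 ... 4^10).
n <- 4^(3:10)
elapsed <- c(0, 0, 0.01, 0.02, 0.11, 1.10, 13.07, 273.74)

# Keep only the timings large enough to be meaningful (also avoids log(0)).
keep <- elapsed >= 0.1
timings <- data.frame(n = n[keep], elapsed = elapsed[keep])

# Assumed model: elapsed = a * n^b, i.e. log(elapsed) = log(a) + b * log(n).
fit_log <- lm(log(elapsed) ~ log(n), data = timings)

# Estimated scaling exponent b.
exponent <- unname(coef(fit_log)[2])

# Extrapolated elapsed time in seconds at n = 10^7, back on the original scale.
pred_sec <- exp(predict(fit_log, newdata = data.frame(n = 1e7)))

print(exponent)
print(pred_sec)
```

If the exponent is well above 1, subsampling or a different fitting method is probably worth considering before running on the full data.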



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sat, Nov 15, 2014 at 12:12 PM, Yunqi Zhang <yqzh...@ucsd.edu> wrote:

> Hi all,
>
> I'm using quantreg rq() to perform quantile regression on a large data set.
> Each record has 4 fields and there are about 18 million records in total. I
> wonder if anyone has tried rq() on a large dataset and how long I should
> expect it to take. Or is it simply too large, and should I subsample the
> data? I would like to have an idea before I start the run and wait forever.
>
> In addition, I would appreciate it if anyone could give me an idea of
> roughly how long rq() takes for a given dataset size.
>
> Yunqi
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
