You can time it yourself on increasingly large subsets of your data. E.g.,
> library(quantreg)  # provides rq()
> dat <- data.frame(x1=rnorm(1e6), x2=rnorm(1e6),
+                   x3=sample(c("A","B","C"),size=1e6,replace=TRUE))
> dat$y <- with(dat, x1 + 2*(x3=="B")*x2 + rnorm(1e6))
> t <- vapply(n<-4^(3:10),
+             FUN=function(n){d<-dat[seq_len(n),]; print(system.time(rq(data=d, y ~ x1 + x2*x3, tau=0.9)))},
+             FUN.VALUE=numeric(5))
   user  system elapsed
      0       0       0
   user  system elapsed
      0       0       0
   user  system elapsed
   0.02    0.00    0.01
   user  system elapsed
   0.01    0.00    0.02
   user  system elapsed
   0.10    0.00    0.11
   user  system elapsed
   1.09    0.00    1.10
   user  system elapsed
  13.05    0.02   13.07
   user  system elapsed
 273.30    0.11  273.74
> t
           [,1] [,2] [,3] [,4] [,5] [,6]  [,7]   [,8]
user.self     0    0 0.02 0.01 0.10 1.09 13.05 273.30
sys.self      0    0 0.00 0.00 0.00 0.00  0.02   0.11
elapsed       0    0 0.01 0.02 0.11 1.10 13.07 273.74
user.child   NA   NA   NA   NA   NA   NA    NA     NA
sys.child    NA   NA   NA   NA   NA   NA    NA     NA

Do some regressions on t["elapsed",] as a function of n and predict up to
n=10^7.  E.g.,

> summary(lm(t["elapsed",] ~ poly(n,4)))

Call:
lm(formula = t["elapsed", ] ~ poly(n, 4))

Residuals:
         1          2          3          4          5          6          7          8
-2.375e-03 -2.970e-03  4.484e-03  1.674e-03 -8.723e-04  6.096e-05 -9.199e-07  2.715e-09

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)
(Intercept) 3.601e+01  1.261e-03 28564.33 9.46e-14 ***
poly(n, 4)1 2.493e+02  3.565e-03 69917.04 6.45e-15 ***
poly(n, 4)2 5.093e+01  3.565e-03 14284.61 7.57e-13 ***
poly(n, 4)3 1.158e+00  3.565e-03   324.83 6.43e-08 ***
poly(n, 4)4 4.392e-02  3.565e-03    12.32  0.00115 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.003565 on 3 degrees of freedom
Multiple R-squared:      1,     Adjusted R-squared:      1
F-statistic: 1.273e+09 on 4 and 3 DF,  p-value: 3.575e-14

It does not look good for n=10^7.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sat, Nov 15, 2014 at 12:12 PM, Yunqi Zhang <yqzh...@ucsd.edu> wrote:

> Hi all,
>
> I'm using quantreg rq() to perform quantile regression on a large data set.
> Each record has 4 fields and there are about 18 million records in total. I
> wonder if anyone has tried rq() on a large dataset and how long I should
> expect it to finish. Or it is simply too large and I should subsample the
> data. I would like to have an idea before I start to run and wait forever.
>
> In addition, I will appreciate if anyone could give me an idea how long it
> takes for rq() to run approximately for certain dataset size.
>
> Yunqi
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
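
For the extrapolation step suggested above (regress elapsed time on n, then predict for larger n), a log-log fit is one alternative to the degree-4 polynomial, since it estimates the growth exponent directly. The sketch below reuses the elapsed times printed in the timing run. The 0.1-second cutoff for discarding timings too small to measure reliably is an arbitrary assumption, and the variable names are illustrative only.

```r
# Elapsed seconds copied from the timing run above, for n = 4^(3:10).
n       <- 4^(3:10)
elapsed <- c(0, 0, 0.01, 0.02, 0.11, 1.10, 13.07, 273.74)

# Drop timings too small to measure reliably (cutoff is an assumption);
# this also avoids taking log(0).
keep <- elapsed >= 0.1
d    <- data.frame(n = n[keep], elapsed = elapsed[keep])

# Fit log(elapsed) = a + b * log(n); b is the empirical growth exponent.
fit   <- lm(log(elapsed) ~ log(n), data = d)
slope <- coef(fit)[["log(n)"]]

# Extrapolate to n = 1e7 and convert back from the log scale.
pred_seconds <- unname(exp(predict(fit, newdata = data.frame(n = 1e7))))

print(slope)
print(pred_seconds)
```

The slope is the exponent b in elapsed ~ n^b; a value well above 1 means the cost grows faster than linearly in the number of rows. The prediction assumes the same growth rate holds beyond the measured range, which these few timings cannot guarantee, so treat it as a rough guide rather than an estimate of the actual run time.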