On Nov 18, 2009, at 4:55 PM, Satsangi, Vivek (GE Capital) wrote:
Folks,
I have the following code, which works fine on smaller data sets. For
larger data sets it runs out of memory and runs far too slowly, because we
are essentially creating large vectors with rep() and then calling
median() on them. (I learned this approach from a post on the web.)
Below that, I have written the corresponding SAS code. The SAS code
runs fast because I can just tell proc summary (via the weight
statement) that the Counts variable is a frequency.
So the question is: is there a simple way to do the same thing in R? I
have to run this on a large data set -- for a small set it is not a
problem.
Not sure, and I see no reproducible dataset (that I recognize), but
Harrell's Hmisc::wtd.quantile might be an alternative approach.
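For what it is worth, the frequency-weighted median can also be computed in base R from cumulative counts, without ever building the rep() vector. The following is only a sketch under assumptions: weighted_median() is a hypothetical helper (not part of R or Hmisc), the toy data frame is made up for illustration, Counts is assumed to hold non-negative integers, and Paydex is assumed to be numeric (a factor would first need as.numeric(as.character(Paydex))).

```r
# Frequency-weighted median: intended to return the same value as
# median(rep(x, counts)), using cumulative counts instead of expansion.
weighted_median <- function(x, counts) {
  keep <- !is.na(x) & !is.na(counts) & counts > 0
  x <- x[keep]
  counts <- counts[keep]
  if (length(x) == 0) return(NA_real_)
  ord <- order(x)
  x <- x[ord]
  cum <- cumsum(counts[ord])
  n <- cum[length(cum)]              # length of the virtual expanded vector
  # Middle position(s): one element when n is odd, two when n is even.
  lo <- x[which(cum >= floor((n + 1) / 2))[1]]
  hi <- x[which(cum >= ceiling((n + 1) / 2))[1]]
  (lo + hi) / 2
}

# Made-up data with the column names used in the question.
toy <- data.frame(
  PaydexNormingCategory = c(1, 1, 1, 2, 2),
  SIC                   = c(7941, 7941, 7941, 7941, 7941),
  Paydex                = c(70, 80, 90, 50, 60),
  Counts                = c(2, 1, 4, 3, 3)
)

# One row per observed (PaydexNormingCategory, SIC) combination;
# drop = TRUE skips combinations that never occur in the data.
groups <- split(toy, list(toy$PaydexNormingCategory, toy$SIC), drop = TRUE)
result <- do.call(rbind, lapply(groups, function(g) {
  data.frame(PaydexNormingCategory = g$PaydexNormingCategory[1],
             SIC                   = g$SIC[1],
             CatMedian             = weighted_median(g$Paydex, g$Counts))
}))
print(result)
```

Splitting on the observed combinations avoids both the preallocated result frame and the nested loop over all level pairs. With Hmisc installed, Hmisc::wtd.quantile(Paydex, weights = Counts, probs = 0.5) should be usable in place of the helper inside the same lapply() call.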
---------------------- Begin R code ------------------------------------
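N <- 1005 * 14
myNorm <- data.frame(PaydexNormingCategory = numeric(N),
                     SIC = numeric(N), CatMedian = numeric(N))
k <- 1
# j <- 7941  ## For testing only
# Assumes the columns of dfpaydex are visible by name (e.g. via attach(dfpaydex)).
for (j in levels(SIC)) {
  for (i in levels(PaydexNormingCategory)) {
    # Select on PaydexNormingCategory, matching the SAS class statement
    # (the original compared Paydex == i).
    myData <- dfpaydex[(PaydexNormingCategory == i) & (SIC == j), ]
    # Expand each Paydex code by its count, then map the median code back to a level.
    myMedian <- with(myData,
      levels(Paydex)[median(rep(as.numeric(Paydex), Counts))])
    # Fill row k; myNorm[k] would address column k instead.
    myNorm[k, ] <- c(as.numeric(i), as.numeric(j), as.numeric(myMedian))
    k <- k + 1
  }
}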
---------------------- Begin SAS code ------------------------------------
proc summary data=SASUser.PaydexNormfull nway;
class PaydexNormingCategory SIC ;
weight Counts;
var Paydex;
output out=outstat (drop=_type_ _freq_)
median= / autoname;
run;
---------------------- End SAS code ------------------------------------
Thanks for your guidance!
Vivek Satsangi
GE Capital
Americas
GE imagination at work
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.