On Wed, 11 Jan 2012, iliketurtles wrote:
> ## I have 2 columns of data. The first column is unique "event IDs" that
> ## represent a phone call made to a customer.
> ## So, if you see 3 entries together in the first column like follows:
>
> matrix(c("call1a","call1a","call1a") )
>
> ## then this means that this particular phone call (the first call that's
> ## logged in the data set) was transferred
> ## between 3 different "modules" before the call was terminated.
>
> ## The second column is a numerical description of the module the call
> ## started with and then got transferred to prior to call termination. Now,
> ## I'll construct a representative array of the type of data I'm dealing
> ## with (the real data set goes on for X00,000s of rows):
> ## (Ignore how I construct the following array, it’s completely unrelated to
> ## how the actual data set was constructed).
>
> a<-sapply(1:50,function(i){paste("call",i,sep="",collapse="")})
> development.a<-seq(1,40,3)
> development.a2<-seq(1,40,5)
> a[development.a]<-a[development.a+1]
> a[development.a2]<-a[development.a2+1]
> a[1:2]<-"call2a";a[3]<-"call3a";a[4:5]<-"call5a";a[6:8]<-"call8a";a[9]<-"call9a"
> b<-c(920010,960010,820009,920010,960500,970050,930010,920010,960500,970050,
>      930900,870010,840010,960500,920010,970050,930010,960500,920010,970050,
>      930010,960010,920010,940010,960010,970010,960500,920010,970050,930010,
>      960500,920010,970050,930010,960500,920010,970050,930010,920010,960500,
>      970050,930010,920009,960500,970050,930009,940010,960500,960500,960500)
> data<-as.data.frame(cbind(a,b))
> colnames(data)<-c("phone calls","modules")
> dim(data)
> print(data[1:10,]) #sample of 10 rows
>
> # Note that in the real data set, data[,2] ranges from 810,000 to 999,999.
> # I've been tasked with the following:
> # "For each phone call that BEGINS with the module which is denoted by 81
> # (i.e. of the form 81X,XXX), what is the expected number of modules in these
> # calls?"
> # Then it's the same question for each module beginning with 82, 83, 84.....
> # all the way until 99.
> # I've created code that I think works for this, but I can't actually run it
> # on the whole data set. I left it for 30 minutes and it only had about 5%
> # of the task completed (I clicked "STOP" then checked my output to see if I
> # did it properly, and it seems correct).
> # I know the apply() family specializes in vector operations, but I can't
> # figure out how to complete the above question in any way other than loops.
>
> L<-data
>
> A<-array(0,dim=c(19,2));rownames(A)<-seq(81,99,1)
> A<-data.frame(A)
>
> for(i in 1:(nrow(L)-1))
> {
>   if(L[(i+1),1]!=L[i,1])
>   {
>     A[paste(strsplit(as.character(L[i+1,2]),"")[[1]][1:2],sep="",collapse=""),1]<- {
>       A[paste(strsplit(as.character(L[i+1,2]),"")[[1]][1:2],sep="",collapse=""),1]+length(grep(as.character(L[i+1,1]),L[,1],value=FALSE)) #aggregate number of modules in the calls that begin with XX (not yet averaged).
>     }
>     A[paste(strsplit(as.character(L[i+1,2]),"")[[1]][1:2],sep="",collapse=""),2]<- {
>       A[paste(strsplit(as.character(L[i+1,2]),"")[[1]][1:2],sep="",collapse=""),2]+1 }
>   }
>
> }
>
> # If I can get this code to be more memory efficient such that I can do it
> # on a 400,000 row data set, I can do, for example,
>
> A[17,1]/A[17,2]
>
> # and I'll arrive at the mean number of modules per call where the call
> # starts with a module that starts with 97.
>
> A[17,1]
> # is 10, which means that, out of every single call that started with a
> # module of 97X,XXX,
> # they went through 10 modules in total.
>
> A[17,2]
> # is 6, which means that there was 6 calls in total that began with a
> # 97X,XXX module.
>
> # Hence,
>
> A[17,1]/A[17,2]
>
> # is the average number of modules that were executed in all the calls that
> # began with a 97X,XXX module.
>
> -----
>
> Isaac
> Research Assistant
> Quantitative Finance Faculty, UTS

I don't see any need for you to use data frames. If you make A and data
(not a good choice of variable name, since data() is also an R function)
plain matrices, you get the same answers at about 10 times the speed
(using your example).

Hope this helps,
Ray Brownrigg
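
As a rough illustration of the matrix suggestion, here is a minimal sketch on a small made-up data set. The names calls, modules and key and the toy values are assumptions for illustration, not the poster's data. The loop is the same as in the question, except that L is a character matrix, A is a numeric matrix, and the call size is counted with an exact == match where the original used grep(). Like the original loop, it never counts the call that starts in row 1. The last few lines use a different technique that is not part of the suggestion above: a vectorized cross-check with rle() and tapply(), which assumes that the rows of each call are contiguous.

```r
# Toy data (hypothetical): one row per module visited, rows of a call contiguous.
calls   <- c("c1", "c1", "c1", "c2", "c3", "c3", "c4", "c4", "c4", "c4")
modules <- c("920010", "960010", "820009", "970050", "930010",
             "920010", "970050", "960500", "920010", "930010")

# Character matrix instead of a data frame.
L <- cbind(calls, modules)

# Numeric matrix of accumulators, one row per two-digit prefix 81..99:
# column 1 = total modules, column 2 = number of calls.
A <- matrix(0, nrow = 19, ncol = 2,
            dimnames = list(as.character(81:99), c("modules", "calls")))

for (i in seq_len(nrow(L) - 1)) {
  if (L[i + 1, 1] != L[i, 1]) {            # row i + 1 starts a new call
    key <- substr(L[i + 1, 2], 1, 2)       # first two digits of its first module
    A[key, 1] <- A[key, 1] + sum(L[, 1] == L[i + 1, 1])  # modules in this call
    A[key, 2] <- A[key, 2] + 1                           # one more call
  }
}

print(A[c("92", "93", "97"), ])
print(A["97", 1] / A["97", 2])             # mean modules per call starting with 97

# Different technique (assumption: each call's rows are contiguous):
# run-length encode the call IDs and average the run lengths by prefix.
r      <- rle(calls)                           # r$lengths = modules per call
starts <- cumsum(c(1, head(r$lengths, -1)))    # row where each call begins
prefix <- substr(modules[starts], 1, 2)        # prefix of each call's first module
avg    <- tapply(r$lengths, prefix, mean)      # mean modules per call, by prefix
print(avg)
```

On this toy data both approaches give 2.5 modules per call for calls starting with 97. The vectorized version also counts the first call, so its "92" entry is 3, while the loop leaves that row at 0.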