Hi,

Maybe this helps:

dat1 <- read.table(text="
ID county date company
1 x 1 comp1
2 y 1 comp3
3 y 2 comp1
4 y 3 comp1
5 x 2 comp2
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2 <- dat1

# For each county, walk down the rows and compute the sum of squared
# company shares over rows 1..i. This relies on the rows already being
# in date order within each county, as they are in the example.
dat1$answer <- unsplit(lapply(split(dat1,dat1$county),function(x)
    do.call(rbind,lapply(seq_len(nrow(x)),function(i) {
        x1 <- x[1:i,]
        x2 <- table(x1$company)/sum(table(x1$company))
        sum(x2^2)
    }))),dat1$county)
dat1
#  ID county date company    answer
#1  1      x    1   comp1 1.0000000
#2  2      y    1   comp3 1.0000000
#3  3      y    2   comp1 0.5000000
#4  4      y    3   comp1 0.5555556
#5  5      x    2   comp2 0.5000000
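If the nested call above is hard to follow, here is a rough sketch of the same running calculation with the per-county step pulled out into a named helper. The helper name running_share_sq and the data frame name dat are made up for illustration. The explicit sort is an extra step that is not in the code above, and the sketch assumes one observation per county and date.

```r
# Hypothetical helper (not a base R function): for each position i,
# the sum of squared company shares among the first i observations.
running_share_sq <- function(company) {
  vapply(seq_along(company), function(i) {
    shares <- table(company[seq_len(i)]) / i  # share of each company so far
    sum(shares^2)
  }, numeric(1))
}

dat <- data.frame(
  ID      = 1:5,
  county  = c("x", "y", "y", "y", "x"),
  date    = c(1, 1, 2, 3, 2),
  company = c("comp1", "comp3", "comp1", "comp1", "comp2"),
  stringsAsFactors = FALSE
)

# Assumption: sort by county and date first so that "up to that date"
# is the same thing as "rows 1..i within the county".
dat <- dat[order(dat$county, dat$date), ]

dat$answer <- NA_real_
for (rows in split(seq_len(nrow(dat)), dat$county)) {
  dat$answer[rows] <- running_share_sq(dat$company[rows])
}

dat <- dat[order(dat$ID), ]  # back to the original row order
dat
```

For county y at date 3 the helper sees comp3, comp1, comp1, so the shares are 2/3 and 1/3 and the result is 4/9 + 1/9 = 5/9.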
#or
# Same idea with ave(): group company by county and compute the
# running sum of squared shares within each group.
dat2$answer <- with(dat2,unlist(ave(company,county,FUN=function(x)
    lapply(seq_along(x),function(i) {
        x1 <- table(x[1:i])
        sum((x1/sum(x1))^2)
    }))))
dat2
#  ID county date company    answer
#1  1      x    1   comp1 1.0000000
#2  2      y    1   comp3 1.0000000
#3  3      y    2   comp1 0.5000000
#4  4      y    3   comp1 0.5555556
#5  5      x    2   comp2 0.5000000

A.K.


Hi -

I have a seemingly complex data summarizing problem that I am having a hard time wrapping my mind around.

What I'm trying to do is sum the squares of all company market shares in a given county, UP TO the corresponding date. Market share is defined as: number of company observations / total observations.

Here is example data and the desired answer:

ID county date company answer
1  x      1    comp1   1
2  y      1    comp3   1
3  y      2    comp1   0.5
4  y      3    comp1   0.55556
5  x      2    comp2   0.5

For example, to get the answer for ID 4, we look at county y, dates 1, 2, 3 and sum:

[(2/3)comp1]^2 + [(1/3)comp3]^2 = 0.55556

I've tried cumsum, but am simply stuck given all of the different conditions. I have a large matrix of data for this, with several hundred companies, tens of counties, and unique dates.

Any help would be extremely appreciated.

Thank you,

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.