Try reading the posting guide before posting.

On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn <[EMAIL PROTECTED]> wrote:
> Anybody have any thoughts on this? Please? :)
>
> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>
>> Hi everyone!
>>
>> I have a question about data processing efficiency.
>>
>> My data are as follows: I have a data set on quarterly institutional
>> ownership of equities; some of them have had recent IPOs, some have
>> not (I have a binary flag set). The total dataset size is 700k+ rows.
>>
>> My goal is this: For every quarter since issue for each IPO, I need to
>> find a "matched" firm in the same industry, and close in market cap.
>> So, e.g., for firm X, which had an IPO, i need to find a matched
>> non-issuing firm in quarter 1 since IPO, then a (possibly different)
>> non-issuing firm in quarter 2 since IPO, etc. Repeat for each issuing
>> firm (there are about 8300 of these).
>>
>> Thus it seems to me that I need to be doing a lot of data selection
>> and subsetting, and looping (yikes!), but the result appears to be
>> highly inefficient and takes ages (well, many hours). What I am doing,
>> in pseudocode, is this:
>>
>> 1. for each quarter of data, getting out all the IPOs and all the
>>    eligible non-issuing firms.
>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>    industry, sort them by size, and finally grab a matching firm
>>    closest in size (the exact procedure is to grab the closest bigger
>>    firm if one exists, and just the biggest available if all are
>>    smaller)
>> 3. assign the matched firm-observation the same "quarters since issue"
>>    as the IPO being matched
>> 4. rbind them all into the "matching" dataset.
>>
>> The function I currently have is pasted below, for your reference. Is
>> there any way to make it produce the same result but much faster?
>> Specifically, I am guessing eliminating some loops would be very good,
>> but I don't see how, since I need to do some fancy footwork for each
>> IPO in each quarter to find the matching firm. I'll be doing a few
>> things similar to this, so it's somewhat important to up the
>> efficiency of this. Maybe some of you R-fu masters can clue me in? :)
>>
>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>
>> ========== my function below ===========
>>
>> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>>         quarters_since_issue=40) {
>>
>>     result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix is cheaper, so typecast the result to matrix
>>
>>     colnames = names(tfdata)
>>
>>     quarterends = sort(unique(tfdata$DATE))
>>
>>     for (aquarter in quarterends) {
>>         tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>
>>         tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>             (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>>             (tfdata_quarter$IPO.Flag == 0), ]
>>         tfdata_quarter_ipoissuers = tfdata_quarter[ tfdata_quarter$IPO.Flag == 1, ]
>>
>>         for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>             arow = tfdata_quarter_ipoissuers[i,]
>>             industrypeers = tfdata_quarter_fitting_nonissuers[
>>                 tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>             industrypeers = industrypeers[ order(industrypeers$Market.Cap.13f), ]
>>             if ( nrow(industrypeers) > 0 ) {
>>                 if ( nrow(industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ]) > 0 ) {
>>                     bestpeer = industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1,]
>>                 }
>>                 else {
>>                     bestpeer = industrypeers[nrow(industrypeers),]
>>                 }
>>                 bestpeer$Quarters.Since.IPO.Issue = arow$Quarters.Since.IPO.Issue
>>
>>                 #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO == bestpeer$PERMNO] = 1
>>                 result = rbind(result, as.matrix(bestpeer))
>>             }
>>         }
>>         #result = rbind(result, tfdata_quarter)
>>         print (aquarter)
>>     }
>>
>>     result = as.data.frame(result)
>>     names(result) = colnames
>>     return(result)
>>
>> }
>>
>> ========= end of my function =============
>>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
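
That said, here is a rough, untested-on-your-data sketch of one way to cut most of the looping, in case it helps. It is not a drop-in replacement. The idea is to sort the eligible non-issuers by market cap once, split row indices by quarter and industry, and use findInterval() to locate the first peer at least as large as each IPO, then subset the data frame a single time instead of calling rbind() inside the loop. The function name match_nonissuers is made up for illustration. I am assuming the column names from the quoted function, no missing values in DATE, HSICIG, IPO.Flag or Market.Cap.13f, no "|" characters in DATE or HSICIG, and a reasonably recent R in which findInterval() accepts left.open.

```r
# Hypothetical helper; column names are taken from the quoted function.
match_nonissuers <- function(tfdata, quarters_since_issue = 40) {
  # Row indices of IPO firms and of eligible non-issuing firms.
  ipo_idx  <- which(tfdata$IPO.Flag == 1)
  peer_idx <- which(tfdata$IPO.Flag == 0 &
                    tfdata$Quarters.Since.Latest.Issue > quarters_since_issue)

  # Sort peers by market cap once, so every group below is already ordered.
  peer_idx <- peer_idx[order(tfdata$Market.Cap.13f[peer_idx])]

  # One key per (quarter, industry) cell; assumes "|" is not in the values.
  key <- paste(tfdata$DATE, tfdata$HSICIG, sep = "|")
  peers_by_key <- split(peer_idx, key[peer_idx])
  ipos_by_key  <- split(ipo_idx, key[ipo_idx])

  # Only cells that contain both IPOs and peers can produce a match.
  common <- intersect(names(ipos_by_key), names(peers_by_key))

  picks <- lapply(common, function(k) {
    ipos  <- ipos_by_key[[k]]
    peers <- peers_by_key[[k]]
    caps  <- tfdata$Market.Cap.13f[peers]
    # With left.open = TRUE, findInterval() counts peers strictly smaller
    # than each IPO, so adding 1 gives the first peer that is >= the IPO.
    pos <- findInterval(tfdata$Market.Cap.13f[ipos], caps,
                        left.open = TRUE) + 1L
    # If every peer is smaller, fall back to the largest one.
    pos <- pmin(pos, length(peers))
    data.frame(ipo = ipos, peer = peers[pos])
  })
  picks <- do.call(rbind, picks)

  # Subset once, then copy "quarters since issue" from the matched IPO.
  result <- tfdata[picks$peer, ]
  result$Quarters.Since.IPO.Issue <- tfdata$Quarters.Since.IPO.Issue[picks$ipo]
  result
}
```

Two differences from the quoted function are worth knowing about: the rows come back grouped by quarter and industry rather than in loop order, and the columns keep their original types because nothing is passed through as.matrix(). It also avoids the 1:nrow(...) loop, which in R runs over c(1, 0) when a quarter has no IPOs.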