Here is my problem again, better explained.
# I have a panel data set of thousands of firms, by year and industry, with
# one dummy variable that identifies one kind of firm (1 if the firm has an auditor; 0 if not)
# and another variable that represents the firm's dimension (total assets in thousands of euros).
# I need to create two separate samples with the same number of firms, where
# each firm in the first has a corresponding firm in the second with the same
# year, industry and dimension (the dimension doesn't need to be exactly the
# same; it can vary within an interval of +/- 10%, for example).

# My reproducible example
firm1<-sort(rep(1:10,5),decreasing=FALSE)
year1<-rep(2000:2004,10)
industry1<-rep(20,50)
dummy1<-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1)
dimension1<-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,6666,647,23789,
2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824,
23,2897,3456,7690,6022,3678,9431,2890)
data1<-data.frame(firm1,year1,industry1,dummy1,dimension1)
data1
colnames(data1)<-c("firm","year","industry","dummy","dimension")

firm2<-sort(rep(11:15,3),decreasing=FALSE)
year2<-rep(2001:2003,5)
industry2<-rep(30,15)
dummy2<-c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1)
dimension2<-c(12456,781,32489,2345,5754,8976,3245,2120,345,2341,5678,10900,12900,123,2345)
data2<-data.frame(firm2,year2,industry2,dummy2,dimension2)
data2
colnames(data2)<-c("firm","year","industry","dummy","dimension")

firm3<-sort(rep(16:20,4),decreasing=FALSE)
year3<-rep(2001:2004,5)
industry3<-rep(40,20)
dummy3<-c(0,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,1,0,0)
dimension3<-c(23456,1181,32489,2345,6754,8976,3245,1234,1288,1200,2345,2765,389,23456,2367,3892,6438,24824,
23,2897)
data3<-data.frame(firm3,year3,industry3,dummy3,dimension3)
data3
colnames(data3)<-c("firm","year","industry","dummy","dimension")

# Stack the three industries into one data frame
final1<-rbind(data1,data2)
final2<-rbind(final1,data3)
final2
# Sort by year, industry and dimension
final3<-final2[order(final2$year,final2$industry,final2$dimension),]
final3

# So my data, final3, looks like this:

   firm year industry dummy dimension
26    6 2000       20     0       781
 1    1 2000       20     0      2120
21    5 2000       20     1      2189
36    8 2000       20     1      2765
16    4 2000       20     0      3178
31    7 2000       20     1      3245
11    3 2000       20     1      4532
 6    2 2000       20     0      4890
41    9 2000       20     0      5438
46   10 2000       20     0      7690
 2    1 2001       20     0       345
37    8 2001       20     1       389
32    7 2001       20     0      1234
17    4 2001       20     0      2678
 7    2 2001       20     1      2789
22    5 2001       20     1      4289
47   10 2001       20     0      6022
12    3 2001       20     1      6593
27    6 2001       20     0     35489
42    9 2001       20     1     37824
60   14 2001       30     1      2341
54   12 2001       30     0      2345
57   13 2001       30     1      3245
51   11 2001       30     0     12456
63   15 2001       30     1     12900
78   19 2001       40     1       389
74   18 2001       40     1      1288
82   20 2001       40     0      6438
70   17 2001       40     1      6754
66   16 2001       40     0     23456
43    9 2002       20     0        23
33    7 2002       20     1        25
 3    1 2002       20     1      2341
28    6 2002       20     0      2345
 8    2 2002       20     1      3412
48   10 2002       20     1      3678
18    4 2002       20     0      6666
23    5 2002       20     0      8543
13    3 2002       20     0     12900
38    8 2002       20     1     23456
64   15 2002       30     0       123
52   11 2002       30     0       781
58   13 2002       30     1      2120
61   14 2002       30     1      5678
55   12 2002       30     0      5754
67   16 2002       40     0      1181
75   18 2002       40     1      1200
71   17 2002       40     0      8976
79   19 2002       40     0     23456
83   20 2002       40     1     24824
14    3 2003       20     0       123
24    5 2003       20     0       637
19    4 2003       20     1       647
34    7 2003       20     0      1200
39    8 2003       20     1      2367
44    9 2003       20     0      2897
 4    1 2003       20     1      5678
29    6 2003       20     0      5754
49   10 2003       20     1      9431
 9    2 2003       20     0      9500
59   13 2003       30     1       345
65   15 2003       30     1      2345
56   12 2003       30     0      8976
62   14 2003       30     1     10900
53   11 2003       30     0     32489
84   20 2003       40     0        23
76   18 2003       40     1      2345
80   19 2003       40     0      2367
72   17 2003       40     1      3245
68   16 2003       40     1     32489
15    3 2004       20     0      2345
35    7 2004       20     1      2345
50   10 2004       20     1      2890
45    9 2004       20     0      3456
40    8 2004       20     0      3892
10    2 2004       20     1      8765
30    6 2004       20     0      8976
 5    1 2004       20     0     10900
25    5 2004       20     0     23456
20    4 2004       20     1     23789
73   17 2004       40     0      1234
69   16 2004       40     0      2345
77   18 2004       40     1      2765
85   20 2004       40     0      2897
81   19 2004       40     0      3892

I want to keep pairs of firms, one with dummy=1 and the other with dummy=0, that match in year, industry and dimension.
But the dimension doesn't need to be exactly the same, which is why I refer to an interval of +/- 10%. For example, firm 1 matches firm 5, because they have the same year and industry and a similar dimension (10% x 2120 = 212 and 2189 - 2120 = 69 < 212), and firm 1 has dummy=0 while firm 5 has dummy=1. So I want to delete firm 6, because it doesn't match any firm, and keep firms 1 and 5.

   firm year industry dummy dimension
26    6 2000       20     0       781
 1    1 2000       20     0      2120
21    5 2000       20     1      2189

Next, I can match firm 4 with firm 7 and delete firm 8.

   firm year industry dummy dimension
36    8 2000       20     1      2765
16    4 2000       20     0      3178
31    7 2000       20     1      3245

And so on...

At the end I want to keep only pairs of firms, matched by year, industry and dimension. If I then separate the firms with dummy=1 from the firms with dummy=0 into two data frames, I will have two matched samples with the same number of observations. That's what I want.

Thank you,

Cecília Carmo
Universidade de Aveiro - Portugal
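The matching rule described above could be sketched roughly as follows. This is only one possible reading of the problem, not a definitive solution: it assumes a greedy one-to-one match within each year/industry group, takes the 10% tolerance relative to the dummy=0 firm's dimension (as in the 10% x 2120 arithmetic above), and assumes all dimensions are positive. The function name `match_pairs`, its `tol` argument and the `pair` column are hypothetical names introduced for illustration; the example data are the six year-2000, industry-20 rows discussed above.

```r
# Hypothetical helper, not from the original post: greedy 1-to-1 matching of
# dummy=1 firms to dummy=0 firms within each year/industry group.
# Assumption: tolerance is measured relative to the dummy=0 firm's dimension,
# and all dimensions are strictly positive.
match_pairs <- function(d, tol = 0.10) {
  groups <- split(d, list(d$year, d$industry), drop = TRUE)
  out <- lapply(groups, function(g) {
    treated  <- g[g$dummy == 1, ]
    controls <- g[g$dummy == 0, ]
    treated  <- treated[order(treated$dimension), ]
    pairs <- list()
    for (i in seq_len(nrow(treated))) {
      if (nrow(controls) == 0) break
      # Relative distance of every remaining dummy=0 firm to this dummy=1 firm
      rel <- abs(controls$dimension - treated$dimension[i]) / controls$dimension
      j <- which.min(rel)
      if (rel[j] <= tol) {
        pairs[[length(pairs) + 1]] <- cbind(
          rbind(treated[i, ], controls[j, ]),
          pair = paste(treated$firm[i], controls$firm[j], sep = "-")
        )
        controls <- controls[-j, ]  # each dummy=0 firm is used at most once
      }
    }
    do.call(rbind, pairs)  # NULL when the group has no matched pair
  })
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

# The six rows discussed above (year 2000, industry 20)
example <- data.frame(
  firm      = c(6, 1, 5, 8, 4, 7),
  year      = 2000,
  industry  = 20,
  dummy     = c(0, 0, 1, 1, 0, 1),
  dimension = c(781, 2120, 2189, 2765, 3178, 3245)
)

matched <- match_pairs(example)
matched

# Two matched samples with the same number of observations
with_auditor    <- matched[matched$dummy == 1, ]
without_auditor <- matched[matched$dummy == 0, ]
```

On these six rows the sketch keeps firms 5 and 1 as one pair and firms 7 and 4 as another, and drops firms 6 and 8, which agrees with the worked example. A greedy match like this does not necessarily give the largest possible number of pairs; whether that matters depends on the data.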