I have a data set with observations on 549 cities spanning an 18-year period. However, some of the cities did not report in one or more of the 18 years. I would like to implement the procedure suggested by Wooldridge in section 17.1.3 of his "Econometric Analysis of Cross Section and Panel Data" to correct for attrition. For example, the table below indicates that the 3rd and the 7th cities in the data set do not have observations for several years. The Wooldridge procedure requires generating a selection variable that takes the value 1 if the city reports in that year and 0 otherwise. How do I assign a zero to a city in a year for which it has no observation?
For example, suppose I have the following data set. The observations range over the three years 1990-1992, but some cities did not report in some years. The original data look like this:

    Cicoid  year  other_variables  selection_variable
    1       1990  xxxxxxxxxx       1
    1       1991  xxxxxxxxxx       1
    2       1991  xxxxxxxxxx       1
    3       1990  xxxxxxxxxx       1
    3       1991  xxxxxxxxxx       1
    3       1992  xxxxxxxxxx       1

I would like to get a data set that looks like this:

    Cicoid  year  other_variables  selection_variable
    1       1990  xxxxxxxxxx       1
    1       1991  xxxxxxxxxx       1
    1       1992  ..........       0
    2       1990  ..........       0
    2       1991  xxxxxxxxxx       1
    2       1992  ..........       0
    3       1990  xxxxxxxxxx       1
    3       1991  xxxxxxxxxx       1
    3       1992  xxxxxxxxxx       1

In Stata I can reshape the data with three simple commands:

    xtset Cicoid year
    tsfill, full
    replace selection_variable = 0 if selection_variable == .

The first command declares the data to be a panel, identifying the ID and time variables. The second uses the time-series fill command to add a row for every missing city-year combination. The third sets the selection variable to 0 in those added rows, where it would otherwise be missing.

I have searched the help pages and vignettes of both the "zoo" and "plm" packages but cannot find an equivalent. Can anyone help?

Thanks,
Richard Saba
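A minimal sketch of what the Stata recipe does, written in plain Python (standard library only) so the logic is language-neutral: cross every city ID with every year in the observed range, keep the reported rows with a selection indicator of 1, and add placeholder rows with an indicator of 0. The helper name `fill_panel`, the variable `x`, and its values are hypothetical illustrations, not part of the original data. It assumes integer annual years, so the year grid is every year from the first to the last observed, roughly as `tsfill, full` fills the full time range. In R the same idea is commonly expressed by merging the data against `expand.grid()` of all IDs and years with `merge(..., all.x = TRUE)` and then replacing the resulting `NA` selection values with 0.

```python
def fill_panel(rows, id_key="Cicoid", time_key="year"):
    """Hypothetical helper: balance an unbalanced id-by-year panel.

    Returns one row per (id, year) for every id seen and every year from the
    earliest to the latest observed year (assumes integer annual years).
    Reported rows get selection_variable = 1; added rows get
    selection_variable = 0 and None for every other variable.
    """
    ids = sorted({r[id_key] for r in rows})
    all_years = [r[time_key] for r in rows]
    years = range(min(all_years), max(all_years) + 1)  # full annual grid
    observed = {(r[id_key], r[time_key]): r for r in rows}
    # Every non-key variable that appears in any reported row.
    other_keys = sorted(
        {k for r in rows for k in r} - {id_key, time_key, "selection_variable"}
    )

    filled = []
    for i in ids:
        for t in years:
            if (i, t) in observed:
                row = dict(observed[(i, t)])  # copy the reported row
                row["selection_variable"] = 1
            else:
                row = {id_key: i, time_key: t}
                row.update({k: None for k in other_keys})  # missing values
                row["selection_variable"] = 0
            filled.append(row)
    return filled


# Made-up values mirroring the three-city, 1990-1992 example above.
reported = [
    {"Cicoid": 1, "year": 1990, "x": 5.0},
    {"Cicoid": 1, "year": 1991, "x": 6.0},
    {"Cicoid": 2, "year": 1991, "x": 7.0},
    {"Cicoid": 3, "year": 1990, "x": 1.0},
    {"Cicoid": 3, "year": 1991, "x": 2.0},
    {"Cicoid": 3, "year": 1992, "x": 3.0},
]

for row in fill_panel(reported):
    print(row["Cicoid"], row["year"], row["x"], row["selection_variable"])
```

The loop prints nine rows, one per city-year, with a selection indicator of 0 exactly where the desired table above shows one.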