On May 28, 2010, at 5:58 PM, Christian Schoder wrote:
Dear R-users,
I use firm-level data in panel structure. I would like to drop all
firms that have less than x observations over the time scale in any
of the variables considered. I would appreciate any help that (a)
indicates relevant literature or websites or (b) indicates the code
that could solve the problem.
Here, a detailed illustration of my problem: My data set is of the
form
df
id y z
1 a 1 1
2 b NA 2
3 b 3 3
4 c 2 2
5 c 4 4
6 c 5 NA
7 d 6 NA
8 d 5 5
9 d 6 6
10 d 7 7
11 e NA NA
12 e NA 4
13 e 3 3
where id is the index of the firm, and y and z are observations such
as assets and sales. Now I would like to apply a procedure that
drops all firms which have fewer than 2 observed realizations in y or
z.
I try to avoid naming objects with common function names like df:
> dfrm$nrecy <- ave(dfrm$y , dfrm$id, FUN=function(x) sum(!is.na(x)) )
> dfrm$nrecz <- ave(dfrm$z , dfrm$id, FUN=function(x) sum(!is.na(x)) )
> dfrm
id y z nrecy nrecz
1 a 1 1 1 1
2 b NA 2 1 2
3 b 3 3 1 2
4 c 2 2 3 2
5 c 4 4 3 2
6 c 5 NA 3 2
7 d 6 NA 4 3
8 d 5 5 4 3
9 d 6 6 4 3
10 d 7 7 4 3
11 e NA NA 1 2
12 e NA 4 1 2
13 e 3 3 1 2
> dfrm[with(dfrm, pmin(nrecy, nrecz)>1), ]
id y z nrecy nrecz
4 c 2 2 3 2
5 c 4 4 3 2
6 c 5 NA 3 2
7 d 6 NA 4 3
8 d 5 5 4 3
9 d 6 6 4 3
10 d 7 7 4 3
Note that this does not guarantee at least 2 complete observations
(rows with both y and z observed) for each id, since y and z are
counted separately. If you wanted a solution to that problem, you
would need a better test data.frame.
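For what it is worth, here is a minimal sketch of that stricter variant, counting per firm only the rows in which y and z are both observed. The data.frame is rebuilt from the listing in the post; the names both_observed, nboth, min_obs and kept are made up for this illustration, and the "both non-missing" reading is an assumption about what is wanted.

```r
# Rebuild the example data (values copied from the listing in the post)
dfrm <- data.frame(
  id = c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d", "e", "e", "e"),
  y  = c(1, NA, 3, 2, 4, 5, 6, 5, 6, 7, NA, NA, 3),
  z  = c(1, 2, 3, 2, 4, NA, NA, 5, 6, 7, NA, 4, 3),
  stringsAsFactors = FALSE
)

# TRUE for rows where neither y nor z is missing
both_observed <- complete.cases(dfrm[, c("y", "z")])

# Per-firm count of such rows, repeated on every row of that firm
dfrm$nboth <- ave(as.numeric(both_observed), dfrm$id, FUN = sum)

# Hypothetical threshold: keep firms with at least min_obs complete rows
min_obs <- 2
kept <- dfrm[dfrm$nboth >= min_obs, ]
kept
```

On this particular data set the stricter rule keeps the same firms (c and d) as the pmin() version, which is why the example cannot tell the two rules apart.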
Thus, it should give me a data.frame which looks like
df1
id y z
1 c 2 2
2 c 4 4
3 c 5 NA
4 d 6 NA
5 d 5 5
6 d 6 6
7 d 7 7
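To get a result shaped exactly like that df1 (no helper columns, rows renumbered from 1), something along these lines should work. It is only a sketch: n_obs and keep are names invented here, and the threshold of 2 is taken from the description in the post.

```r
# Rebuild the example data (values copied from the listing in the post)
dfrm <- data.frame(
  id = c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d", "e", "e", "e"),
  y  = c(1, NA, 3, 2, 4, 5, 6, 5, 6, 7, NA, NA, 3),
  z  = c(1, 2, 3, 2, 4, NA, NA, 5, 6, 7, NA, 4, 3),
  stringsAsFactors = FALSE
)

# Helper: number of non-missing values in a vector
n_obs <- function(x) sum(!is.na(x))

# A firm is kept only if both y and z have at least 2 observed values
keep <- ave(dfrm$y, dfrm$id, FUN = n_obs) >= 2 &
        ave(dfrm$z, dfrm$id, FUN = n_obs) >= 2

# Subset without storing the counts, then renumber the rows
df1 <- dfrm[keep, c("id", "y", "z")]
rownames(df1) <- NULL
df1
```

Building the logical vector directly avoids adding nrecy and nrecz to the data.frame, so nothing has to be dropped afterwards.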
Thank you very much!
Christian Schoder
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT