Hi,

as.integer(dat$COUNTRY) # would be the easiest (Rui's solution).

Other options could also be used:
library(plyr)
 
as.integer(mapvalues(dat$COUNTRY,levels(dat$COUNTRY),seq(length(levels(dat$COUNTRY)))))
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
#or
match(dat$COUNTRY,levels(dat$COUNTRY))
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4


#if `COUNTRY` is not a factor

dat$COUNTRY<- as.character(dat$COUNTRY)
 
as.integer(mapvalues(dat$COUNTRY,unique(dat$COUNTRY),seq(length(unique(dat$COUNTRY)))))
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4

#or (if it is sorted already and every country has the same number of rows)
(seq_along(dat$COUNTRY)-1)%/%as.vector(table(dat$COUNTRY))+1
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
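
For anyone who wants to check the options above side by side, here is a minimal sketch. The data frame `dat` is a made-up stand-in modelled on the panel in the question, and the `rle()` variant is an addition of mine, offered as an alternative that needs the rows grouped by country but does not assume equal group sizes.

```r
# Hypothetical example data modelled on the panel in the question
dat <- data.frame(
  COUNTRY = factor(rep(c("Albania", "Barbados", "Congo", "Denmark"), each = 5)),
  YEAR    = rep(1999:2003, times = 4)
)

id_factor <- as.integer(dat$COUNTRY)                  # factor codes
id_match  <- match(dat$COUNTRY, levels(dat$COUNTRY))  # position among the levels

# Character version: number the countries in order of first appearance
country_chr <- as.character(dat$COUNTRY)
id_chr <- match(country_chr, unique(country_chr))

# Run-length version: rows must be grouped by country, sizes may differ
runs   <- rle(country_chr)
id_rle <- rep(seq_along(runs$lengths), runs$lengths)

# All four approaches agree on this example
stopifnot(identical(id_factor, id_match),
          identical(id_factor, id_chr),
          identical(id_factor, id_rle))
```
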
A.K.


----- Original Message -----
From: Rui Barradas <ruipbarra...@sapo.pt>
To: serenamas...@gmail.com
Cc: 'r-help' <r-help@r-project.org>
Sent: Saturday, July 13, 2013 12:04 PM
Subject: Re: [R] How to set panel data format

Hello,

It's better if you keep this on the list; the odds of getting more and 
better answers are greater.

Inline.

On 13-07-2013 15:38, serenamas...@gmail.com wrote:
> Hi Rui,
> thanks for your reply.
>
> No, my problem isn't one of reshaping. It is just that I want R to know I 
> have a panel and not just cross sections or time series.
>
> In other words, if I had cross-section data:
>
> COUNTRY    YEAR   GDP
> Albania    1999   3
> Barbados   1999   5
> Congo      1999   1
> Denmark    1999   11
> etc.       ..     ..
>
> My ID here is country, but every observation is a new cluster independent of 
> each other, so I don't care to let R know because the ID is a unique 
> identifier.
>
> Whereas if I have a panel
>
> COUNTRY    YEAR   GDP
> Albania    1999   3
> Albania    2000   3.5
> Albania    2001   3.7
> Albania    2002   4
> Albania    2003   4.5
> Barbados   1999   5
> Barbados   2000   5
> Barbados   2001   5.1
> Barbados   2002   4
> Barbados   2003   3
> Congo      1999   1
> Congo      2000   2
> Congo      2001   2
> Congo      2002   3
> Congo      2003   4
> Denmark    1999   11
> Denmark    2000   12
> Denmark    2001   13
> Denmark    2002   10
> Denmark    2003   10
> etc.       ..     ..
>
> How am I going to tell R that Albania is the same ID for all 5 years I 
> have in the panel? In other words, Albania has to be identified by the same 
> number in the "factor" vector which R codes it with. Then Barbados is ID 2 in 
> all its years, Congo has ID 3 and so on.

R already does that; factors are coded as integers:

as.integer(dat$COUNTRY) # Albania is 1, etc
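
A small sketch of what that coding looks like, using a made-up vector `country` (hypothetical, not the poster's data). By default `factor()` sorts the levels, and the integer codes are positions in `levels()`; if the numbering should follow the order of first appearance instead, the levels can be set explicitly.

```r
# Hypothetical vector, deliberately not in alphabetical order
country <- c("Congo", "Albania", "Albania", "Barbados", "Congo")

f <- factor(country)      # levels are sorted by default
levels(f)                 # "Albania" "Barbados" "Congo"
codes <- as.integer(f)    # 3 1 1 2 3

# Number the countries in order of first appearance instead
f2 <- factor(country, levels = unique(country))
codes2 <- as.integer(f2)  # 1 2 2 3 1
```
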


> In STATA, you sort 'by country year' and the program knows it is a panel of 
> entities observed more than once over time.  But I am not sure how to let R 
> know the same.
>
> In practice the reason why it is important to define where a country ends and 
> where a new begins is because
>
> 1) if one creates lags of variables and the program doesn't know where the 
> boundaries between countries are, the lag for the first year of Barbados in 
> my previous example will be calculated using the last year of Albania, that 
> is, the preceding country.

A way of numbering the countries that marks where each one begins, 
equivalent to the previous line of code if the countries are grouped 
consecutively and appear in the order of the factor levels, is

cumsum(c(TRUE, dat$COUNTRY[-nrow(dat)] != dat$COUNTRY[-1L]))
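
For the lag itself, one base-R possibility is `ave()`, which applies a function separately within each country. This is only a sketch: the data frame `dat`, the helper `lag1` and the column name `GDP_lag1` are made up for illustration, and it assumes that the rows are already sorted by year within each country.

```r
# Hypothetical two-country panel using figures from the question
dat <- data.frame(
  COUNTRY = factor(rep(c("Albania", "Barbados"), each = 3)),
  YEAR    = rep(1999:2001, times = 2),
  GDP     = c(3, 3.5, 3.7, 5, 5, 5.1)
)

# Shift a vector down by one position, padding the start with NA
lag1 <- function(x) c(NA, x[-length(x)])

# ave() calls lag1 once per country, so the first year of Barbados
# gets NA instead of the last value of Albania
dat$GDP_lag1 <- ave(dat$GDP, dat$COUNTRY, FUN = lag1)
dat$GDP_lag1   # NA 3.0 3.5 NA 5.0 5.0
```
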
>
> 2) I need to create country dummies that take the value of 1 whenever a 
> country ID is equal to 1, so if Albania has 5 years of observations and each 
> of the year observations appears with a different ID, the country dummies 
> will not be created. Instead if Albania has the same country identifier (1) 
> for all the years in which it is observed, the country dummy will be the same 
> and ==1 whenever Albania is the country observed

I doubt you need to create dummies; R does it for you when you create a 
factor. Internally, factors are coded as integers, so all you need is to 
coerce them to integer, as I've said earlier.
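
If the dummy columns are wanted explicitly, `model.matrix()` shows what R builds from a factor. A minimal sketch with a made-up factor `country` (hypothetical); in a model formula such as `lm(y ~ country)` the expansion happens automatically, with one level dropped as the baseline.

```r
# Hypothetical factor with three countries, two years each
country <- factor(rep(c("Albania", "Barbados", "Congo"), each = 2))

# "- 1" removes the intercept, so every level gets its own 0/1 column
dummies <- model.matrix(~ country - 1)
colnames(dummies)             # "countryAlbania" "countryBarbados" "countryCongo"
dummies[, "countryAlbania"]   # values 1 1 0 0 0 0
```
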

Rui Barradas

>
> Hope this makes it clearer,
> Thanks,
> Serena
>
> _____________________________________
> Sent from http://r.789695.n4.nabble.com
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

