Hi,
as.integer(dat$COUNTRY) # would be the easiest (Rui's solution).

Other options could also be used:

library(plyr)
as.integer(mapvalues(dat$COUNTRY, levels(dat$COUNTRY), seq(length(levels(dat$COUNTRY)))))
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
#or
match(dat$COUNTRY, levels(dat$COUNTRY))
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4

#if `COUNTRY` is not factor
dat$COUNTRY <- as.character(dat$COUNTRY)
as.integer(mapvalues(dat$COUNTRY, unique(dat$COUNTRY), seq(length(unique(dat$COUNTRY)))))
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
#or (if it is sorted already and every country has the same number of rows)
(seq_along(dat$COUNTRY) - 1) %/% as.vector(table(dat$COUNTRY)) + 1
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4

A.K.

----- Original Message -----
From: Rui Barradas <ruipbarra...@sapo.pt>
To: serenamas...@gmail.com
Cc: 'r-help' <r-help@r-project.org>
Sent: Saturday, July 13, 2013 12:04 PM
Subject: Re: [R] How to set panel data format

Hello,

It's better if you keep this on the list; the odds of getting more and better answers are greater.

Inline.

On 13-07-2013 15:38, serenamas...@gmail.com wrote:
> Hi Rui,
> thanks for your reply.
>
> No, my problem isn't one of reshaping. It is just that I want R to know I
> have a panel and not just cross sections or time series.
>
> In other words, if I had cross section data:
>
> COUNTRY   YEAR  GDP
> Albania   1999  3
> Barbados  1999  5
> Congo     1999  1
> Denmark   1999  11
> etc. .. ..
>
> My ID here is country, but every observation is a new cluster independent of
> each other, so I don't care to let R know because the ID is a unique
> identifier.
>
> Whereas if I have a panel
>
> COUNTRY   YEAR  GDP
> Albania   1999  3
> Albania   2000  3.5
> Albania   2001  3.7
> Albania   2002  4
> Albania   2003  4.5
> Barbados  1999  5
> Barbados  2000  5
> Barbados  2001  5.1
> Barbados  2002  4
> Barbados  2003  3
> Congo     1999  1
> Congo     2000  2
> Congo     2001  2
> Congo     2002  3
> Congo     2003  4
> Denmark   1999  11
> Denmark   2000  12
> Denmark   2001  13
> Denmark   2002  10
> Denmark   2003  10
> etc. .. ..
>
> How am I going to tell R that Albania is one same ID for all the 5 years I
> have in the panel, in other words, Albania has to be identified by the same
> number in the "factor" vector which R codes it with. Then Barbados is ID 2 in
> all its years, Congo has ID 3 and so on.

R already does that; factors are coded as integers:

as.integer(dat$COUNTRY) # Albania is 1, etc

> In STATA, you sort 'by country year' and the program knows it is a panel of
> entities observed more than once over time. But I am not sure how to let R
> know the same.
>
> In practice the reason why it is important to define where a country ends and
> where a new begins is because
>
> 1) if one creates lags of variables and the program doesn't know where the
> boundaries between countries are, the lag for the first year of Barbados in
> my previous example will be calculated using the last year of Albania, that
> is, the preceding country.

A way of doing this, equivalent to the previous line of code if the countries are grouped consecutively and appear in the order of the factor levels, is

cumsum(c(TRUE, dat$COUNTRY[-nrow(dat)] != dat$COUNTRY[-1L]))

>
> 2) I need to create countrydummies that take the value of 1 whenever a
> country ID is equal to 1, so if Albania has 5 years of observations and each
> of the year observations appears with a different ID, the country dummies
> will not be created. Instead if Albania has the same country identifier (1)
> for all the years in which it is observed, the country dummy will be the same
> and ==1 whenever Albania is the country observed

I doubt you need to create dummies; R does it for you when you create a factor. Internally, factors are coded as integers, so all you need is to coerce them to integer like I've said earlier.
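
For what it's worth, both points can be sketched in base R. This is only an illustration under stated assumptions: the small `dat` below is made up to mirror the first rows of the panel quoted above, and the names `ID`, `GDP_LAG` and `dummies` are hypothetical, not something from the thread. `ave()` is one of several ways to compute a lag within groups.

```r
# Hypothetical data mirroring the first two countries of the quoted panel
dat <- data.frame(
  COUNTRY = factor(rep(c("Albania", "Barbados"), each = 3)),
  YEAR    = rep(1999:2001, times = 2),
  GDP     = c(3, 3.5, 3.7, 5, 5, 5.1)
)

# Integer ID per country, taken from the factor's internal codes
dat$ID <- as.integer(dat$COUNTRY)

# Sort by country and year so that "previous row" means "previous year"
dat <- dat[order(dat$COUNTRY, dat$YEAR), ]

# Lag GDP within each country; the first year of each country becomes NA
# instead of borrowing the last year of the preceding country
dat$GDP_LAG <- ave(dat$GDP, dat$COUNTRY,
                   FUN = function(x) c(NA, head(x, -1)))

# Explicit country dummies, if they are ever needed: one 0/1 column per level
dummies <- model.matrix(~ COUNTRY - 1, data = dat)

print(dat)
print(dummies)
```

Here the lag for Barbados 1999 is NA rather than Albania's 2001 value, because `ave()` applies the function separately within each level of `COUNTRY`. In a model formula such as `lm(GDP ~ COUNTRY, data = dat)` the factor is expanded into dummies automatically, so the explicit `model.matrix()` call is usually only needed for inspection.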

Rui Barradas

>
> Hope this makes it clearer,
> Thanks,
> Serena
>
> _____________________________________
> Sent from http://r.789695.n4.nabble.com
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.