[R] importing spreadsheet data - linera regression - panel data

Millo Giovanni Mon, 20 Apr 2009 08:34:55 -0700

Dear Cecilia,

just adding some examples to Stefan's post, which says everything
already. I've recently gone mad with reshaping, so I assume it is a
little tricky. Or maybe what I tell you is obvious, then just skip it.


**import**

Your files are spreadsheets, so the best way to import is to save them
in a .csv or maybe in a .txt file (File>Save as>[choose 'Tab delimited'
format]) and then do as Stefan said with read.csv() or, if you saved in
tab delimited, read.table(..., sep="\t"). See help("read.table") or
?read.table which is the same.

Specifically, looks like you already exported them as tab delimited. So
something like 

yourdata <- read.table(file="yourfile.txt", sep="\t", header=TRUE)
should work.

**reshape**

Now 'yourdata' is a data.frame object. Then you reshape() them, as said,
from 'wide' to 'long' format.
You need to do something like

yourreshapeddata <- reshape(yourdata, direction="long",
varying=list(paste("revenue", 2007:1998, sep="")))

i.e., 'varying' must be the list of the vectors (here: only one vector!)
of column names corresponding to the time-varying variables to be
stacked. (Hint: see what 'paste("revenue", 2007:1998, sep=""))' does).

An example on the Grunfeld data just to clarify:
data(Grunfeld, package="Ecdat") # data start in 'long' format, the one
you need
## make them 'wide'
pippo<-reshape(Grunfeld, direction="wide", timevar="year", idvar="firm")

## see what you got
fix(pippo) 
## now make them 'long' again (what you need!)
pluto<-reshape(pippo, direction="long",
varying=list(paste("inv",1935:1954,sep="."),paste("value",1935:1954,sep=
"."),paste("capital",1935:1954,sep=".")))
## see how it worked. Notice the "id" and "time" variables it produced.
fix(pluto)

(Nevermind the names, Pippo and Pluto are Goofy and his dog in Italian.
Stands for "Foo" etc.)

**merge**

You say, you have one variable per spreadsheet/txt-file. Then as soon as
you have as many 'long' dataframes, let's say they are called 'revenue',
'cost', etc., you might want to merge() them two by two to make your
final dataframe, along the lines of

yourfinaldata <- merge("revenue", "cost", by=c("id", "time"))

**modeling**

Now your data are ready to do, say,

yourmodel <- lm(revenue~cost, data=yourfinaldata)

Notice that if you want to use 'plm' for panel data models, you will
want to have "id" and "time" in the first two columns, or alternatively
to specify an 'index' (see ?plm in library(plm)). Most "panel" models
can also be very effectively estimated with the 'nlme' or 'lme4'
packages, although the syntax is slightly more complicated.

Just to be sure, I am adding a small reproducible example on fake data
that should resemble what you have to do:

## begin example ##
## make fake 'wide' data for variable 'revenue'
pippo1<-matrix(rnorm(12),ncol=4)
pippo1<-as.data.frame(pippo1)
pippo1[,1]<-1:3
dimnames(pippo1)[[2]]<-c("firm",paste("revenue",2002:2000, sep=""))
## reshape
pippo1<-reshape(pippo1, direction="long", drop="firm",
varying=list(paste("revenue",2002:2000,sep="")))
## ...and have a look:
print(pippo1)

## make fake 'wide' data for variable 'cost'
pippo2<-matrix(rnorm(12),ncol=4)
pippo2<-as.data.frame(pippo2)
pippo2[,1]<-1:3
dimnames(pippo2)[[2]]<-c("firm",paste("cost",2002:2000, sep=""))
## reshape
pippo2<-reshape(pippo2, direction="long", drop="firm",
varying=list(paste("cost",2002:2000,sep="")))

## merge them (R is smart enough to guess the names of 'by' variables)
alldata<-merge(pippo1, pippo2)
## see what happened:
print(alldata)

## fix var names
dimnames(alldata)[[2]][3:4]<-c("revenue","cost")

## estimate a linear model
yourmod<-lm(cost~revenue, data=alldata)
summary(yourmod)

## to do panel models (with 'plm'):
## e.g., a fixed effects model
yourFEmod<-plm(cost~revenue, data=alldata, index=c("id","time"))
## end example ##



Original message:
------------------------------
Date: Sun, 19 Apr 2009 12:30:10 +0100
From: "Cecilia Carmo" <cecilia.ca...@ua.pt>
Subject: [R] importing spreadsheet data - linera regression - panel
        data
To: r-help@r-project.org
Message-ID: <web-74601...@controller2.cgpmail.ua.pt>
Content-Type: text/plain;charset=iso-8859-1;format="flowed"

Hi everyone and thank you for the help you could give me.

My data is in a spreadsheet. The 1st column identifies the 
firm (with the fiscal number), the columns 2 to 11 have 
the variable value for 11 years. I have many variables 
(files like this). Each file has about 40.000 firms 
(rows). I transformed all the files in txt files. The data 
is a panel data, like this:
firm    revenu2007      revenue2006     revenue2005     revenue2004
revenue2003     revenue2002     revenue2001     revenue2000
revenue1999     revenue1998             
500100144

504394029               4282809 3769159 3520807 3548322 3458122

503264032

502011475               2595780 2417433 2299563 2060552 1804531 1821638
1789533 1463371 947712
500400911

504615947               22801   28656   27067   26182   26356   34060
39147           
502616695               1412354 1209619 1429755 1623496 1955123 2273486
2087406 2076868 2036937
500829993               1383396 1095570 805830  793809  777591  791614
779924  774560  702845
The objective of my work is to do linear regressions with 
the variables in the files or with other variables that I 
can obtain from those by doing some mathematical 
operations.  I?ve already tried to import this information 
to an array in R, but I?ve seen that the linear 
regressions functions need the data into a dataframe. So 
I?m asking: How to import this information to a dataframe 
in R, in a manner that I can easily do the mathematical 
operations between the variables and then use it to do the 
regressions?

Cec?lia Carmo

**********
I hope it helps. Please if you use 'plm' let me know how it fares with
40.000x10 data points. Sure 'nlme' and 'lme4' can handle this.

Best wishes,
Giovanni

Giovanni Millo
Research Dept.,
Assicurazioni Generali SpA
Via Machiavelli 4, 
34132 Trieste (Italy)
tel. +39 040 671184 
fax  +39 040 671160

Ai sensi del D.Lgs. 196/2003 si precisa che le informazi...{{dropped:13}}

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] importing spreadsheet data - linera regression - panel data

Reply via email to