Dear Gavin,

This is really great, thank you!  I created some long loops to get rid of
extra months at the beginning and the end of my data but your code is great
for putting it then together quarterly.
thanks again,
Denise

On Mon, 2008-08-18 at 14:31 +0100, Denise Xifara wrote:
> Thank you very much Stephen, but how will aggregate deal with months that
> fall outside annual quarters? eg, one extra month at the end of the
dataset?

[Without your data I'm kind of guessing at the exact format and problem,
but the example below shows one way to deal with quarters that span
years so should be adaptable to your problem. You also have a different
idea of quarters to my ecological one...]

## first some dummy data
set.seed(12345)
dat <- data.frame(price = cumsum(rnorm(450)),
                  date = seq(from = as.Date("2008-01-01"),
                             length = 450, by = "days"))

## get the months of observations
dat$month <- factor(format(dat$date, format = "%b"), levels = month.abb)

## and then format this for the quarters
dat$quarter <- character(length = NROW(dat))
## I'm sure these 4 steps can be simplified but just
## how escapes me at the moment
dat$quarter[dat$month %in% month.abb[c(12,1,2)]] <- "Winter"
dat$quarter[dat$month %in% month.abb[c(3:5)]] <- "Spring"
dat$quarter[dat$month %in% month.abb[c(6:8)]] <- "Summer"
dat$quarter[dat$month %in% month.abb[c(9:11)]] <- "Autumn"
dat$quarter <- factor(dat$quarter,
                      levels = c("Spring","Summer","Autumn","Winter"))

## look at the fruits of our labour
head(dat)

## create period
runs <- rle(as.numeric(dat$quarter))$lengths
dat$period <- factor(rep(seq_along(runs), times = runs))

## aggregate
with(dat, aggregate(price, list(quarter = quarter, period = period),
                    FUN = mean))

I use the rle() function (run length encoding) calculate the number of
observations where the 'quarter' remains the same:

> rle(as.numeric(dat$quarter))
Run Length Encoding
  lengths: int [1:6] 60 92 92 91 90 25
  values : num [1:6] 4 1 2 3 4 1

The 'values' here are the numeric representation of the quarter factor.
The most interesting for us is the second 4 - this is the winter 2008/9.
I use the lengths to replicate a period number (1,2,...,n) the correct
number of times. Now we have the period correctly calculated, we just
aggregate by quarter and period to give the averages you want.

If you are working on months 1-3 as quarter 1, 4-6 as quarter 2 etc,
then it is much easier, just aggregate by quarter and year:

## copy the data above
dat2 <- dat
## change meaning of quarter
dat2$quarter <- character(length = NROW(dat2))
dat2$quarter[dat2$month %in% month.abb[c(1:3)]] <- "Q1"
dat2$quarter[dat2$month %in% month.abb[c(4:6)]] <- "Q2"
dat2$quarter[dat2$month %in% month.abb[c(7:9)]] <- "Q3"
dat2$quarter[dat2$month %in% month.abb[c(10:12)]] <- "Q4"
dat2$quarter <- factor(dat2$quarter, levels = c("Q1","Q2","Q3","Q4"))
## year variable
dat2$year <- factor(format(dat2$date, format = "%Y"))

## drop the first 40 days to simulate a late starting record
## and aggregate
with(dat2[-(1:40), ], aggregate(price, list(quarter = quarter, year =
year), FUN = mean))

Which gives:
  quarter year        x
1      Q1 2008 13.58644
2      Q2 2008 24.16523
3      Q3 2008 28.56004
4      Q4 2008 32.60900
5      Q1 2009 44.86594

Do these examples help solve your problem?

G
>
> 2008/8/18 stephen sefick <[EMAIL 
> PROTECTED]<http://us.mc339.mail.yahoo.com/mc/[EMAIL PROTECTED]>
>
>
> > ?aggregate
> > may do what you want
> >
> > On Mon, Aug 18, 2008 at 8:19 AM, Denise Xifara
> > <[EMAIL PROTECTED]<http://us.mc339.mail.yahoo.com/mc/[EMAIL PROTECTED]>>
wrote:
> > > Dear R users,
> > >
> > > I have a dataframe where column is has countries, column 2 is dates
> > > (monthly) for each countrly, the next 10 columns are my factors where
I
> > have
> > > measurements for each country and  for each date.  I have attached a
> > sample
> > > of the data in csv format with the data for 3 countries.
> > >
> > > I would like to convert my monthly data into quarterly data, finding
the
> > > mean over 3 month periods for factors a-i, and the sum for factor j.
My
> > > problem is that not all countries have starting date at the beginning
of
> > a
> > > quarter for a particular year, ie some countries start in May or
> > September,
> > > and also some countries have one extra month, some have two extra
months
> > so
> > > there's no way of deleting some rows with a simple command (I want to
get
> > > rid of all extra data that does not fall into the quarters for each
> > > country), since the amount of data to get rid of for each country
varies.
> > >
> > > I tried for example:
> > > i=1
> > > denise<-data[((data$country)==unique(data$country[i]),]
> > > denise[,2]<- as.Date(denise$date, "%Y-%m-%d")
> > > denise2<-denise[order(denise[,2],decreasing=FALSE),]
> > > len<-length(denise[,1])
> > > limit<-floor(len/3)+1
> > > splitter<-rep(1:limit,each=3)
> > > spl.dat<-split(denise2,splitter)
> > > new.data<-as.matrix(lapply(spl.dat,FUN="mean"))
> > >
> > > This finds the mean every 3 rows but this doesnt consider the data
> > quarterly
> > > in a calendar sense.  ie if the data starts in november, it doesnt
> > discard
> > > the data for november, december and start calculating the means from
> > january
> > > onwards, until the month where the last quarter finishes, discarding
any
> > > extra month, or two months at the end.
> > >
> > > I tried converting my data frame/matrix to a time series but the dates
> > are
> > > not kept.  I got:
> > >
> > >>tser<-as.ts(denise)
> > > Warning message:
> > > In data.matrix(data) : class information lost from one or more columns
> > >
> > > and column 2 has become a list of numbers rather than dates.
> > >
> > > I tried:
> > >
> > >> library(fCalendar)
> > >> den.tseries<-as.timeSeries(denise)
> > > Warning messages:
> > > 1: In .whichFormat(charvec, ...) : Could not determine time(date)
format
> > > 2: In .whichFormat(charvec, ...) : Could not determine time(date)
format
> > >> is.timeSeries(den.tseries)
> > > [1] TRUE
> > >> apply.quarterly(den.tseries,FUN="mean")
> > >                   data
> > > 1970-01-01 -2.425000000
> > > 1970-04-01 -0.557961111
> > > 1970-04-28  0.009814815
> > >
> > > Here, it calculates things quarterly but the the as.timeSeries command
> > has
> > > assigned its own daily dates to the data, instead of keeping my
monthly
> > > dates.  Also, I don't understand how it deals with the extra dates.
> > >
> > > Sorry for the long email,
> > > Any help would be very much appreciated,
> > > Kind regards,
> > > Denise
> > >
> > > ______________________________________________
> > > R-help@r-project.org<http://us.mc339.mail.yahoo.com/mc/[EMAIL 
> > > PROTECTED]>mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
<http://www.r-project.org/posting-guide.html>
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> >
> >
> >
> > --
> > Let's not spend our time and resources thinking about things that are
> > so little or so large that all they really do for us is puff us up and
> > make us feel like gods. We are mammals, and have not exhausted the
> > annoying little problems of being mammals.
> >
> >        -K. Mullis
> >
>
>     [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org<http://us.mc339.mail.yahoo.com/mc/[EMAIL 
> PROTECTED]>mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
> and provide commented, minimal, self-contained, reproducible code.

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to