On Thu, Sep 13, 2012 at 7:35 PM, emorway <emor...@usgs.gov> wrote:
> useRs,
>
> Here is some R-ready data for my question to follow. Of course this data is
> a small snippet from a much larger dataset that is about a decade long.
> <snip data>
>
> Q_use<-data.frame(date=as.POSIXct(paste(Q[,1],"-",Q[,2],"-",Q[,3]," ",
>   floor(Q[,4]/60),":",Q[,4]-(floor(Q[,4]/60)*60),":00",sep=''),
>   "%Y-%m-%d %H:%M:%S",tz=""),Q=Q$Q)
> SC_use<-data.frame(date=as.POSIXct(paste(SC[,1],"-",SC[,2],"-",SC[,3]," ",
>   floor(SC[,4]/60),":",SC[,4]-(floor(SC[,4]/60)*60),":00",sep=''),
>   "%Y-%m-%d %H:%M:%S",tz=""),SC=SC$SC)
>
> Using the data provided, I’m trying to calculate each day’s correlation
> between Q_use$Q and SC_use$SC and store the values in a data.frame. An
> example result I’d like to make is
>
> #Day 1
> cor(Q_use$Q[1:95],SC_use$SC[1:95])
> #[1] -0.4916499
>
> #Day 2
> cor(Q_use$Q[96:191],SC_use$SC[96:191])
> #[1] -0.6085098
>
> edm<-data.frame(Correl=t(t(c(cor(Q_use$Q[1:95],SC_use$SC[1:95]),
>   cor(Q_use$Q[96:191],SC_use$SC[96:191])))))
>
> But of course I want R to figure out the appropriate indexes (i.e. 1:95,
> 96:191, and so on in the larger dataset) for me. In other words, I'm seeking
> some help with R code that will ‘pass’ through the two datasets calculating
> each day’s correlation and doesn’t rely on the user supplying the ranges of
> indexes where the daily values reside.
>
> There are, as there always are, a couple of wrinkles. On day 3, for example,
>
> cor(Q_use$Q[192:287],SC_use$SC[192:287])
> [1] NA
>
> This is because SC_use$SC[275] = NA. Is there a way to direct R to continue
> calculating that day's correlation using the data that is available for that
> day? It is also necessary to check and make sure that
> Q_use[i,1]==SC_use[i,1] for each i in that day because in the larger dataset
> the row indices don’t necessarily match up (I have made sure that they do
> for this simple example).
> It would be handy to know how many values were
> missing on incomplete days, perhaps in a column appended to the resulting
> data frame. I appreciate any R code that could help get me started toward
> this end, I’m stuck. I tried looking at ?aggregate, had a look in the
> reshape library, and ‘rollapply’ in the zoo library, but I wasn’t seeing a
> way to do the error checking I just described.
>
> Thanks, Eric
>

Thanks for the reproducible example. This is pretty simple with xts:

library(xts)
xQ <- xts(Q_use["Q"], Q_use$date)
xSC <- xts(SC_use["SC"], SC_use$date)
x <- merge(xQ,xSC)
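To illustrate what merge() does to the two time indexes, here is a minimal sketch on made-up data. The toy values and the names t1, t2, a, b and m are hypothetical and are not taken from the original post. By default, merge() on xts objects does an outer join on the index, so a timestamp that appears in only one series is kept and the other column is filled with NA.

```r
library(xts)

# Hypothetical toy data: four 15-minute timestamps for the first series;
# the second series is missing the 00:15 reading.
t1 <- as.POSIXct("2002-03-28 00:00:00", tz = "UTC") + 900 * 0:3
t2 <- t1[-2]

a <- xts(data.frame(Q = c(10, 11, 12, 13)), order.by = t1)
b <- xts(data.frame(SC = c(5, 6, 7)), order.by = t2)

# Outer join on the time index (the default for xts objects):
# all four timestamps are kept, and SC is NA where b has no reading.
m <- merge(a, b)
print(m)
```

Because the rows are matched on their timestamps rather than on their position, the separate check that Q_use[i,1]==SC_use[i,1] should not be needed after the merge.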
Now all the dates for both data sets are aligned in 'x', so you can use
apply.daily() to run a function over each day:

apply.daily(x, function(y) cor(y[,1],y[,2],use="pairwise.complete.obs"))
                          [,1]
2002-03-28 23:45:00 -0.4916499
2002-03-29 23:45:00 -0.6085098
2002-03-30 23:45:00 -0.1489898
2002-03-31 00:00:00         NA

Note that I had to create a small anonymous wrapper function so I could
pass two objects to the cor() function.

Hope that helps.

>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/calculate-within-day-correlations-tp4643091.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Best,
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com
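On the request for a count of missing values per day: the following is only a sketch of one way the apply.daily() call might be extended, not something from the reply above. It assumes the wrapper function can return a named vector of two numbers (the correlation and the number of incomplete rows). The toy data and the names idx, vals, daily_stats and res are hypothetical. In cor(), use="pairwise.complete.obs" drops the pairs that contain an NA, which is why a day with a missing reading still gets a correlation.

```r
library(xts)

# Hypothetical toy data: two full days of 15-minute readings (96 per day).
set.seed(1)
idx  <- as.POSIXct("2002-03-28 00:00:00", tz = "UTC") + 900 * 0:191
vals <- cbind(Q = rnorm(192), SC = rnorm(192))
vals[10, "SC"] <- NA   # two incomplete rows, both on the first day
vals[20, "Q"]  <- NA
x <- xts(vals, order.by = idx)

# For one day's rows: correlation on the complete pairs, plus the
# number of rows where at least one of the two values is missing.
daily_stats <- function(y) {
  c(Correl    = as.numeric(cor(y[, 1], y[, 2], use = "pairwise.complete.obs")),
    n_missing = sum(!complete.cases(coredata(y))))
}

res <- apply.daily(x, daily_stats)
print(res)
```

If a plain data frame is wanted, as in the original question, something like data.frame(date = index(res), coredata(res)) should convert the result.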