Re: [R] Multiple regressions with changing dependent variable and time span

arun Sat, 30 Nov 2013 20:14:57 -0800

Hi,
No problem.

In that case, each column of the result will be a list.  For example, if I 
take the first element of `lst2`:
dW1 <- rollapply(lst2[[1]],width=32,FUN=function(z) {z1 <- as.data.frame(z); 
if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); 
durbinWatsonTest(l1,max.lag=3) } else rep(NA,4)},by.column=FALSE,align="right")


 tail(dW1[,1],1)
#[[1]]
#[1] -0.3602936  0.1975667 -0.1740797


You can store it by:
resdW1 <- do.call(cbind,lapply(seq_len(ncol(dW1)),function(i) 
do.call(rbind,dW1[,i]))[1:3])
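
For readers without the data at hand, the flattening step can be tried on a mock object. This is only a minimal base-R sketch: it assumes each window returns a list with components r, dw and p (length 3 each, as with max.lag=3) plus a character `alternative`. The names `fake_dw`, `dW` and `flat` are made up for illustration and are not output of durbinWatsonTest().

```r
# Made-up stand-in for one window's result with three lags (assumption:
# a list with components r, dw, p of length 3 plus a character 'alternative')
fake_dw <- function(i) {
  list(r = i + c(0.1, 0.2, 0.3),
       dw = i + c(1.1, 1.2, 1.3),
       p = c(0.01, 0.02, 0.03) * i,
       alternative = "two.sided")
}

# rbind() on lists gives a matrix whose cells are list elements,
# the same shape as the dW1 object described above (here 5 windows x 4 columns)
dW <- do.call(rbind, lapply(1:5, fake_dw))

# Stack each column into a matrix with one row per window,
# keep the first three (r, dw, p) and bind them side by side
flat <- do.call(cbind, lapply(seq_len(ncol(dW)), function(i)
  do.call(rbind, dW[, i]))[1:3])
colnames(flat) <- paste0(rep(c("r", "dw", "p"), each = 3), 1:3)
dim(flat)
```

Each row of `flat` then holds the three autocorrelations, the three D-W statistics and the three p-values for one window.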


Similarly, for more than one element (using a subset of lst2, as it takes time):


lst3 <- lapply(lst2[1:2],function(x) rollapply(x,width=32,FUN=function(z) {
  z1 <- as.data.frame(z)
  if(!sum(!!rowSums(is.na(z1)))) {
    l1 <- lm(r~F.1+F.2+F.3,data=z1)
    durbinWatsonTest(l1,max.lag=3)
  } else rep(NA,4)
},by.column=FALSE,align="right"))

lst3New <- lapply(lst3,function(x) 
do.call(cbind,lapply(seq_len(ncol(x)),function(i) do.call(rbind,x[,i]))[1:3]))

lst3New <- lapply(lst3New, function(x) {colnames(x) <- 
paste0(rep(c("r","dw","p"),each=3),1:3); x})
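
The guard `!sum(!!rowSums(is.na(z1)))` used in the calls above is terse, so here is a small base-R sketch of what it tests. The two toy data frames and the helper name `guard` are made up for illustration; the last line shows a spelling that I believe is equivalent and easier to read.

```r
# Toy windows: one complete, one with a missing value (made-up data)
z_ok  <- data.frame(r = c(1, 2, 3), F.1 = c(4, 5, 6))
z_bad <- data.frame(r = c(1, NA, 3), F.1 = c(4, 5, 6))

# The expression used in the thread, wrapped in a hypothetical helper:
# rowSums(is.na(z1)) counts NAs per row, !! turns the counts into TRUE/FALSE,
# sum() counts the rows with at least one NA, and ! is TRUE only if that is 0
guard <- function(z1) !sum(!!rowSums(is.na(z1)))

guard(z_ok)    # TRUE: no row has an NA, so the regression would be run
guard(z_bad)   # FALSE: at least one row has an NA, so NAs are returned

# A more readable base-R spelling of the same check
all(complete.cases(z_bad))  # FALSE
```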

A.K.

On Saturday, November 30, 2013 5:03 PM, nooldor <nool...@gmail.com> wrote:

Hey!


Yes,
only the D-W test takes so much time, did not check it yet

I checked results (estimates) with manually run regressions (in excel) and they 
are correct.


I only changed the "width" to 31 and "each=123" to 124, because the result 
should have ((154-31)+1) x 334 = 41416 rows.
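
The row counts quoted in this thread follow from the usual window-count formula, n - width + 1. A small sketch of the arithmetic, using only the numbers given in the thread (the helper name `n_windows` is made up):

```r
n_obs    <- 154   # observations per series (from the thread)
n_series <- 334   # dependent variables r.1 ... r.334

# Number of complete rolling windows of a given width over n observations
n_windows <- function(n, width) n - width + 1

n_windows(n_obs, 32) * n_series  # 123 windows per series: 41082 rows
n_windows(n_obs, 31) * n_series  # 124 windows per series: 41416 rows
```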


As for the lag in the D-W test, I was wondering how to get a table when I use 
durbinWatsonTest(l1,3), i.e. with three lags instead of the default 1.

But I can manage it; I just need to learn about the functions you used.
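
For anyone else learning these functions, the following base-R sketch shows roughly what the rollapply(..., by.column=FALSE, align="right") calls in this thread do for one series: fit the regression on each block of consecutive rows and collect the coefficients. It runs on simulated data, and the helper name `roll_coef` is made up; it is an illustration under those assumptions, not the code used for the results above.

```r
set.seed(1)
n <- 40   # small simulated series (the real data has 154 rows)
w <- 32   # window width, as in the thread

dat <- data.frame(F.1 = rnorm(n), F.2 = rnorm(n), F.3 = rnorm(n))
dat$r <- 1 + 0.5 * dat$F.1 - 0.2 * dat$F.2 + rnorm(n)

# Hypothetical helper: one row of coefficients per window,
# where each window ends at row e and covers the 'width' rows up to e
roll_coef <- function(d, width) {
  ends <- seq(width, nrow(d))
  out <- t(sapply(ends, function(e) {
    z <- d[(e - width + 1):e, ]
    # Skip the fit when the window contains any missing value
    if (anyNA(z)) rep(NA_real_, 4) else coef(lm(r ~ F.1 + F.2 + F.3, data = z))
  }))
  rownames(out) <- ends
  out
}

res <- roll_coef(dat, w)
dim(res)   # n - w + 1 rows, 4 coefficient columns
```

The first row of `res` matches coef(lm(r ~ F.1 + F.2 + F.3, data = dat[1:w, ])), which is the same kind of spot check used earlier in the thread.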


Anyway: a BIG THANKS to you!


Best wishes,
T.S.





On 30 November 2013 21:12, arun <smartpink...@yahoo.com> wrote:

Hi,
>
>I was able to read the file after saving it as .csv.  It seems to work without 
>any errors.
>
>dat1<-read.csv("Book2.csv", header=T)
>###same as previous
>
>
>lst1 <- lapply(paste("r",1:334,sep="."),function(x) 
>cbind(dat1[,c(1:3)],dat1[x]))
>lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>library(zoo)
>
>res1 <- do.call(rbind,lapply(lst2,function(x) 
>rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); 
>if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); c(coef(l1), 
>pval=summary(l1)$coef[,4], rsquare=summary(l1)$r.squared) } else 
>rep(NA,9)},by.column=FALSE,align="right")))
>row.names(res1) <- rep(paste("r",1:334,sep="."),each=123)
> dim(res1)
>#[1] 41082     9
>
>#vif
> library(car)
>
>res2 <- do.call(rbind,lapply(lst2,function(x) 
>rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); 
>if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); vif(l1) } else 
>rep(NA,3)},by.column=FALSE,align="right")))
>row.names(res2) <- rep(paste("r",1:334,sep="."),each=123)
>dim(res2)
>#[1] 41082     3
>
>#DW statistic:
> lst3 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {
>  z1 <- as.data.frame(z)
>  if(!sum(!!rowSums(is.na(z1)))) {
>    l1 <- lm(r~F.1+F.2+F.3,data=z1)
>    durbinWatsonTest(l1)
>  } else rep(NA,4)
>},by.column=FALSE,align="right"))
> res3 <- do.call(rbind,lapply(lst3,function(x) x[,-4]))
>row.names(res3) <- rep(paste("r",1:334,sep="."),each=123)
> dim(res3)
>#[1] 41082     3
>##ncvTest()
>f4 <- function(meanmod, dta, varmod) {
>assign(".dta", dta, envir=.GlobalEnv)
>assign(".meanmod", meanmod, envir=.GlobalEnv)
>m1 <- lm(.meanmod, .dta)
>ans <- ncvTest(m1, varmod)
>remove(".dta", envir=.GlobalEnv)
>remove(".meanmod", envir=.GlobalEnv)
>ans
>}
>
> lst4 <- lapply(lst2,function(x) rollapply(x,width=32,FUN=function(z) {z1 <- 
>as.data.frame(z); if(!sum(!!rowSums(is.na(z1)))) {l1 <-f4(r~.,z1) } else 
>NA},by.column=FALSE,align="right"))
>names(lst4) <- paste("r",1:334,sep=".")
>length(lst4)
>#[1] 334
>
>
>###jarque.bera.test
>library(tseries)
>res5 <- do.call(rbind,lapply(lst2,function(x) 
>rollapply(x,width=32,FUN=function(z) {z1 <- as.data.frame(z); 
>if(!sum(!!rowSums(is.na(z1)))) {l1 <-lm(r~F.1+F.2+F.3,data=z1); resid <- 
>residuals(l1); unlist(jarque.bera.test(resid)[1:3]) } else 
>rep(NA,3)},by.column=FALSE,align="right")))
> dim(res5)
>#[1] 41082     3
>
>A.K.
>
>
>
>
>
>
>
>
>On Saturday, November 30, 2013 1:44 PM, nooldor <nool...@gmail.com> wrote:
>
>Here it is as .xlsx. It should be easy to open, and if needed you can 
>find & replace the commas according to your Excel settings (or maybe that 
>will happen automatically).
>
>
>
>
>
>
>On 30 November 2013 19:15, arun <smartpink...@yahoo.com> wrote:
>
>I tried that, but:
>>
>>
>>
>>dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
>>> str(dat1)
>>'data.frame':    154 obs. of  1 variable:
>>
>>Then I changed to:
>>dat1<-read.table("Book2.csv", head=T, sep="\t", dec=",")
>>> str(dat1)
>>'data.frame':    154 obs. of  661 variables:
>>Both of them are wrong as the number of variables should be 337.
>>A.K.
>>
>>
>>
>>
>>
>>
>>
>>On Saturday, November 30, 2013 12:53 PM, nooldor <nool...@gmail.com> wrote:
>>
>>Thank you,
>>
>>I got your reply. I am just testing your script. I will let you know how 
>>it goes soon.
>>
>>.csv could be problematic as commas are used as dec separator (Eastern Europe 
>>excel settings) ... I read it in R with this:
>>dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
>>
>>Thank you very much !!!
>>
>>T.S.
>>
>>
>>
>>
>>On 30 November 2013 18:39, arun <smartpink...@yahoo.com> wrote:
>>
>>I couldn't read the "Book.csv" as the format is completely messed up.  
>>Anyway, I hope the solution works on your dataset.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On Saturday, November 30, 2013 10:34 AM, nooldor <nool...@gmail.com> wrote:
>>>
>>>
>>>ok.
>>>
>>>
>>>> dat1<-read.table("Book2.csv", head=T, sep=";", dec=",")
>>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep="."))
>>>> lst1 <- lapply(paste("r",1:2,sep="."),function(x) cbind(dat1[,c(1:3)],dat1[x]))
>>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>>> sum(!!rowSums(is.na(lst2[[1]])))
>>>[1] 57
>>>> #[1] 40
>>>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>>>[1] 57  0
>>>> #[1] 40 46
>>>The data file is attached.
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 30 November 2013 16:22, arun <smartpink...@yahoo.com> wrote:
>>>
>>>Hi,
>>>>The first point is not that clear.
>>>>
>>>>Could you show the expected results in this case?
>>>>
>>>>set.seed(432)
>>>>dat1 <- as.data.frame(matrix(sample(c(1:10,NA),154*5,replace=TRUE),ncol=5))
>>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:2,sep="."))
>>>>lst1 <- lapply(paste("r",1:2,sep="."),function(x) 
>>>>cbind(dat1[,c(1:3)],dat1[x]))
>>>>
>>>>
>>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>>> sum(!!rowSums(is.na(lst2[[1]])))
>>>>#[1] 40
>>>> sapply(lst2,function(x) sum(!!rowSums(is.na(x))))
>>>>#[1] 40 46
>>>>
>>>>
>>>>A.K.
>>>>
>>>>
>>>>
>>>>On Saturday, November 30, 2013 10:09 AM, nooldor <nool...@gmail.com> wrote:
>>>>
>>>>Hi,
>>>>
>>>>Thanks for reply!
>>>>
>>>>
>>>>Three things:
>>>>1.
>>>>I did not mention that some of the columns have more than 31 NAs, and 
>>>>then it is not possible to run lm():
>>>>
>>>>Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
>>>>  0 (non-NA) cases
>>>>
>>>>In this case the program should return NA and carry on. When a window 
>>>>has fewer than 31 observations, the program should likewise return NA 
>>>>and carry on.
>>>>
>>>>
>>>>
>>>>2. in your result matrix there are only 4 columns (for estimates of the 
>>>>coefficients), is it possible to put there 4 more columns with p-values and 
>>>>one column with R squared
>>>>
>>>>
>>>>3. basic statistical test for the regressions:
>>>>
>>>>inflation factors can be captured by:
>>>>res2 <- do.call(rbind,lapply(lst2,function(x) 
>>>>rollapply(x,width=32,FUN=function(z)
>>>>  vif(lm(r~ 
>>>>F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>>
>>>>and DW statistic:
>>>>res3 <- do.call(rbind,lapply(lst2,function(x) 
>>>>rollapply(x,width=32,FUN=function(z)
>>>>  durbinWatsonTest(lm(r~ 
>>>>F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>>
>>>>
>>>>3a)is that right?
>>>>
>>>>3b) how can I run durbinWatsonTest for more than 1 lag and get the 
>>>>result in a user-friendly form?
>>>>
>>>>3c) how to apply: jarque.bera.test from library(tseries) and ncvTest from 
>>>>library(car) ???
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>Regards,
>>>>
>>>>Tomasz Schabek
>>>>
>>>>
>>>>On 30 November 2013 07:42, arun <smartpink...@yahoo.com> wrote:
>>>>
>>>>Hi,
>>>>>The link seems to be not working.  From the description, it looks like:
>>>>>set.seed(432)
>>>>>dat1 <- as.data.frame(matrix(sample(200,154*337,replace=TRUE),ncol=337))
>>>>> colnames(dat1) <- c(paste("F",1:3,sep="."),paste("r",1:334,sep="."))
>>>>>lst1 <- lapply(paste("r",1:334,sep="."),function(x) 
>>>>>cbind(dat1[,c(1:3)],dat1[x]))
>>>>>
>>>>> lst2 <- lapply(lst1,function(x) {colnames(x)[4] <-"r";x} )
>>>>>library(zoo)
>>>>>
>>>>>res <- do.call(rbind,lapply(lst2,function(x) 
>>>>>rollapply(x,width=32,FUN=function(z) coef(lm(r~ 
>>>>>F.1+F.2+F.3,data=as.data.frame(z))),by.column=FALSE,align="right")))
>>>>>
>>>>>row.names(res) <- rep(paste("r",1:334,sep="."),each=123)
>>>>> dim(res)
>>>>>#[1] 41082     4
>>>>>
>>>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[1:32,]) )
>>>>>#(Intercept)         F.1         F.2         F.3
>>>>>#109.9168150  -0.1705361  -0.1028231   0.2027911
>>>>>coef(lm(r.1~F.1+F.2+F.3,data=dat1[2:33,]) )
>>>>>#(Intercept)         F.1         F.2         F.3
>>>>>#119.3718949  -0.1660709  -0.2059830   0.1338608
>>>>>res[1:2,]
>>>>>#    (Intercept)        F.1        F.2       F.3
>>>>>#r.1    109.9168 -0.1705361 -0.1028231 0.2027911
>>>>>#r.1    119.3719 -0.1660709 -0.2059830 0.1338608
>>>>>
>>>>>A.K.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>On Friday, November 29, 2013 6:43 PM, nooldor <nool...@gmail.com> wrote:
>>>>>Hi all!
>>>>>
>>>>>
>>>>>I am just starting my adventure with R, so excuse my naive questions.
>>>>>
>>>>>My data look like that:
>>>>>
>>>>><http://r.789695.n4.nabble.com/file/n4681391/data_descr_img.jpg>
>>>>>
>>>>>I have 3 independent variables (F.1, F.2 and F.3) and 334 other variables
>>>>>(r.1, r.2, ... r.334) - each one of these will be dependent variable in my
>>>>>regression.
>>>>>
>>>>>Total span of the time is 154 observations. But I would like to have 
>>>>>rolling
>>>>>window regression with length of 31 observations.
>>>>>
>>>>>I would like to run script like that:
>>>>>
>>>>>summary(lm(r.1~F.1+F.2+F.3, data=data))
>>>>>vif(lm(r.1~F.1+F.2+F.3, data=data))
>>>>>
>>>>>But separately for each of the 334 dependent variables (r.1 to r.334), and
>>>>>with a rolling window of length 31 obs.
>>>>>
>>>>>Id est:
>>>>>summary(lm(r.1~F.1+F.2+F.3, data=data)) would be run 123 (154 total obs -
>>>>>31. for the first regression) times for rolling-fixed period of 31 obs.
>>>>>
>>>>>The next regression would be:
>>>>>summary(lm(r.2~F.1+F.2+F.3, data=data)) also 123 times ... and so on till
>>>>>summary(lm(r.334~F.1+F.2+F.3, data=data))
>>>>>
>>>>>It means it would be 123 x 334 regressions (=41082 regressions)
>>>>>
>>>>>I would like to save the results (summary + VIF test) of all those 41082
>>>>>regressions in one user-friendly, readable file, such as the one produced
>>>>>by e.g. capture.output()
>>>>>
>>>>>Could you help with it?
>>>>>
>>>>>Regards,
>>>>>
>>>>>T.S.
>>>>>
>>>>>    [[alternative HTML version deleted]]
>>>>>
>>>>>______________________________________________
>>>>>R-help@r-project.org mailing list
>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>PLEASE do read the posting guide 
>>>>>http://www.R-project.org/posting-guide.html
>>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>>
>>>>
>>>
>>
>

