Stock and Yogo (2005). dm. Daniel Malter wrote > > You find code to compute the Cragg-Donald Wald F test statistic to test > for weak instruments (e.g., Stock and Yogo, 2005) below with a working > example. The code reproduces the results from Stata 12 using the same data

It's not quite clear what you want to do. Is it just this? a<-c(1,2,3,4) dim(a)<-c(2,2) a/a This gives the element by element ratio. HTH, Daniel shukor wrote > > Hi, > > I have a matrix as below: > > mat= > [,1] [,2] [,3] > [1,]147 > [2,]258 > [3,]36

This is a matrix algebra problem, not an R problem. ej fails because betaligne and ZIcolone are not conformable matrices. dim(betaligne) 1 2 dim(ZIcolone) 76 1 Let a matrix be m x n, then you can only matrix multiply it with a matrix that is n x k (where m and k are arbitrary positive integers,
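Conformability can be checked before multiplying. A minimal sketch: the two matrices only mimic the dimensions quoted above (1 x 2 and 76 x 1). Their values, and the helper conformable(), are made up for illustration.

```r
# Made-up matrices with the dimensions reported in the post
betaligne <- matrix(c(1, 2), nrow = 1)   # 1 x 2
ZIcolone  <- matrix(1:76, ncol = 1)      # 76 x 1

# A %*% B is defined only when ncol(A) == nrow(B)
conformable <- function(A, B) ncol(A) == nrow(B)
conformable(betaligne, ZIcolone)         # FALSE: 2 columns vs 76 rows

# R refuses the product and signals an error
res <- tryCatch(betaligne %*% ZIcolone, error = function(e) e)
inherits(res, "error")                   # TRUE

# A 2 x 1 matrix conforms; the product is 1 x 1
ok <- betaligne %*% matrix(c(3, 4), ncol = 1)
ok                                       # 1*3 + 2*4 = 11
```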

The Cragg-Donald Wald F statistic is the smallest of the eigenvalues of G (see the output of "eigen(G)" below). Critical values are found in Stock and Yogo (2005). HTH, Daniel Malter

As df2 has only one column and is thus effectively a variable in this case, you don't even need to merge. df1[df1$gene_name%in%df2$gene_name , ] should do. HTH, Daniel wong, honkit (Stephen) wrote > > Dear Experienced R users, > > I have a looks-like simple but complicated problem urgently
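A small self-contained illustration of that %in% subsetting. The gene names and the expr column are hypothetical.

```r
# Hypothetical example data
df1 <- data.frame(gene_name = c("BRCA1", "TP53", "EGFR", "MYC"),
                  expr      = c(2.1, 3.4, 1.7, 5.0))
df2 <- data.frame(gene_name = c("TP53", "MYC", "KRAS"))

# Keep the rows of df1 whose gene_name also appears in df2
sub <- df1[df1$gene_name %in% df2$gene_name, ]
sub
```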

Hi, You can use rbind(). E.g., id<-rep(1:10,each=10) x<-rnorm(100) y<-rnorm(100) d<-data.frame(id,x,y) rm(id,x,y) newdata<-data.frame() for(i in 1:10){ newdata<-rbind(newdata,split(d,d$id)[[i]]) print(newdata) } The resulting data.frame "newdata" has rbind-ed all split elements of d (in this ca
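Growing a data frame with rbind() inside a loop copies it on every pass, and the loop above also re-runs split() on each iteration. A common alternative is to split once and bind all the pieces in a single do.call() call. Here is a sketch on the same kind of simulated data.

```r
set.seed(1)
id <- rep(1:10, each = 10)
d  <- data.frame(id = id, x = rnorm(100), y = rnorm(100))

pieces  <- split(d, d$id)          # list of 10 data frames, one per id
newdata <- do.call(rbind, pieces)  # bind all pieces in one call
dim(newdata)                       # 100 rows, 3 columns
```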

The easiest way may be to use lmList() in the nlme package: #simulate data d<-rep(1:10,each=10) x<-rnorm(100) e<-rnorm(100) y<-2*x+e require(nlme) #or install and load the package lmList(y~x|d) #predicted values are obtained with: predict(lmList(y~x|d)) HTH, Daniel jeff6868 wrote > > Hi everyb

The function you programmed expects you to provide 7 arguments. In the first case, you explicitly specify each of the seven arguments, i.e., you tell the function: this is yu, this is yf, and so forth. In the second case, you only specify 2 arguments, t and par[1,1:16]. So the function thinks that p

Hi, In the example below, I am modeling the dependent variable Y as a function of X when the errors are autoregressive at the first lag. There are a number of functions with which to model this. arima (stats), gls (nlme), and gamm (mgcv) should produce similar results in the simulated example belo
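As a minimal sketch of one of these options, arima() from the stats package fits a regression with AR(1) errors when X is passed through xreg. The simulated slope of 2 and AR coefficient of 0.6 are assumptions chosen for illustration.

```r
set.seed(123)
n <- 500
x <- rnorm(n)
# AR(1) errors with coefficient 0.6 (an assumed value for the simulation)
e <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y <- 2 * x + e

# Regression of y on x with AR(1) errors: order c(1, 0, 0), x passed via xreg
fit <- arima(y, order = c(1, 0, 0), xreg = x)
coef(fit)                            # ar1, intercept, and the slope on x

ar1   <- unname(coef(fit)[1])        # AR coefficients come first
slope <- unname(tail(coef(fit), 1))  # xreg coefficients come last
```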

Hi, I have a data frame whose first row (not the header) contains the true column names. The same column name can occur multiple times in the dataset. Columns with equal names are not adjacent, and for each observation only one of the equally named columns contains the actual data (see the exampl

This just saved me a lot of time. Thank you! Daniel

Assume your year value is x<-"007/A". You want to remove all non-numeric characters (i.e., letters and punctuation) and all zeros. gsub('[[:alpha:]]|[[:punct:]]|0','',x) Let's say you have a vector with both month and year values (you can separate them). Now we need to identify the c
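A quick check of that pattern. Note that it removes every 0, not only leading zeros, so a value such as "010/B" (a made-up example) would become "1". Stripping only the leading zeros needs an anchored pattern such as ^0+.

```r
x <- "007/A"
clean <- gsub("[[:alpha:]]|[[:punct:]]|0", "", x)
clean                                            # "7"

# Removing all zeros also removes inner ones
gsub("[[:alpha:]]|[[:punct:]]|0", "", "010/B")   # "1"

# Drop letters/punctuation first, then only the leading zeros
sub("^0+", "", gsub("[[:alpha:]]|[[:punct:]]", "", "010/B"))   # "10"
```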

#The code of rank1 in the previous post should have read #rank1<-apply(iterator1,1,function(x) x+base1) #corrected code below siegel.tukey=function(x,y,id.col=TRUE,adjust.median=F,rnd=-1,alternative="two.sided",mu=0,paired=FALSE,exact=FALSE,correct=TRUE,conf.level=0.95){ if(id.col=

The previously posted code contains bugs. The code below should work: x: a vector of data y: Group indicator (if id.col=TRUE); data of the second group (if id.col=FALSE). If y is the group indicator it MUST take 0 or 1 to indicate the groups, and x must contain the data for both groups. id.col:

Hi, I am running a MacBook Pro with OS X 10.6.8. When I try to load library(cem) and run any function, R crashes. Any suggestions as to why this is? Thanks in advance. I think the problem is that I don't have all the "infrastructure" I need to run CEM, but I have no clue what to do either...

Your post is unlikely to elicit a helpful response. As you do not adhere to the posting guide, it is quite impossible for us to figure out from your description what is going wrong. Please provide a self-contained example that reproduces the problem (i.e., code with simulated or actual data that w

The likelihood function is a product. Thus, the log-likelihood function is a sum. Your log.lik statement, however, fails to compute that sum; since optim minimizes by default, it should return the negative of the summed log-likelihood as a single number. Hence your optim statement does not know what to optimize, because log.lik is a vector of the length of the number of observations in
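A minimal sketch of the difference, using a normal likelihood on simulated data. The model, the log-scale parameterization of sigma (which keeps it positive), and the starting values are assumptions for illustration.

```r
set.seed(42)
x <- rnorm(200, mean = 5, sd = 2)

# Wrong: one log-density per observation, i.e. a vector of length 200
loglik_vec <- function(par) dnorm(x, par[1], exp(par[2]), log = TRUE)
bad <- tryCatch(optim(c(1, 1), loglik_vec), error = function(e) e)
inherits(bad, "error")   # TRUE: optim needs a single number

# Right: sum the log-densities and negate, because optim() minimizes
negloglik <- function(par) -sum(dnorm(x, par[1], exp(par[2]), log = TRUE))
fit <- optim(c(1, 1), negloglik, method = "BFGS")
c(mu = fit$par[1], sigma = exp(fit$par[2]))
```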

;-factor(column) contrasts(row)<-contr.treatment(levels(row)) contrasts(column)<-contr.treatment(levels(column)) # Works for Terps fit.terp<-glm(counts ~ row + column + row*column, family=poisson(link="log")) summary(fit.terp) HTH, Daniel Malter University of Maryland, College Park

There is a friedman.test() function. Any reason you want to do it by hand? If so, you can do: #Simulated data matrix x<-matrix(rnorm(9),3,3,byrow=T) x #Rank matrix: the Friedman test ranks within each block (row) r<-t(apply(x,1,rank)) HTH, Daniel JohnnyJames wrote: > > My data looks like this: > (treatmen
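For reference, the Friedman statistic ranks the observations within each block (row) and then combines the column rank sums as Q = 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1). A minimal by-hand sketch on simulated data, checked against friedman.test(), which treats matrix rows as blocks and columns as groups:

```r
set.seed(7)
# rows = blocks, columns = treatments (simulated)
x <- matrix(rnorm(9), 3, 3, byrow = TRUE)

# Friedman ranks within each block (row), not across the whole matrix
r <- t(apply(x, 1, rank))

n  <- nrow(x)                # number of blocks
k  <- ncol(x)                # number of treatments
Rj <- colSums(r)             # rank sum per treatment
Q  <- 12 / (n * k * (k + 1)) * sum(Rj^2) - 3 * n * (k + 1)

Q
friedman.test(x)$statistic   # should agree when there are no ties
```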

I find your question impossible to answer as you do not provide any description of what the matrix columns actually capture and how the variables in your proposed function relate to the columns of this matrix. Best, Daniel alaios wrote: > > Dear all > I have the following time stamps (in the fol

You access columns of a data.frame by column indices as in: X[ ,1], X[ ,2], etc. The index before the comma would stand for the row if you wanted to restrict those. The index after the comma captures the column. That said, you typically would not "extract" rows from the data frame but draw directl

I am not an expert on this, but there is a way to check this. You can predict from a gam using predict(ozonea, newdata=...). In the "newdata" argument you can specify the X-values of interest to you. Thus, you can check whether your predictions are the same when predicted directly from the gam or when

I think there must be an easier solution, but this works: y <- c(0,1,1,3,3,3,5,5,6) x<-matrix(0:6,ncol=1) apply(x,1,function(x){length(y[y==x])}) HTH, Daniel Kathie wrote: > > Dear R users, > > I'd like to count the number of integers in a vector y. > > Here is an example. > > y <- c(0,1,
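A shorter route, assuming the values of interest are the integers 0 to 6: declare them as factor levels so that table() also reports zero counts. tabulate() on the shifted values gives the same counts.

```r
y <- c(0, 1, 1, 3, 3, 3, 5, 5, 6)

# Declaring 0:6 as levels makes table() report zero counts too
counts <- table(factor(y, levels = 0:6))
counts

# tabulate() counts 1, 2, ...; shifting by 1 maps 0:6 onto 1:7
tabulate(y + 1, nbins = 7)
```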

What a pleasant post to respond to - with self-contained code. :) heat<-matrix(0,nrow=dim(xa)[1],ncol=dim(xa)[2]) heat[lower.tri(heat)]<-xa[lower.tri(xa)] heat[upper.tri(heat)]<-xb[upper.tri(xb)] diag(heat)<-1 heat HTH, Daniel 1Rnwb wrote: > > Hello Gurus > I have two correlation matrices 'x
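A runnable version with two made-up correlation matrices. Because a correlation matrix already has ones on its diagonal, a shorter variant starts from xa and overwrites only the upper triangle; both give the same result.

```r
set.seed(3)
# Two made-up 5 x 5 correlation matrices
xa <- cor(matrix(rnorm(50), nrow = 10, ncol = 5))
xb <- cor(matrix(rnorm(50), nrow = 10, ncol = 5))

# Approach from the post: fill each triangle separately
heat <- matrix(0, nrow = nrow(xa), ncol = ncol(xa))
heat[lower.tri(heat)] <- xa[lower.tri(xa)]   # below the diagonal: xa
heat[upper.tri(heat)] <- xb[upper.tri(xb)]   # above the diagonal: xb
diag(heat) <- 1

# Shorter variant: start from xa and overwrite only the upper triangle
heat2 <- xa
heat2[upper.tri(heat2)] <- xb[upper.tri(xb)]
```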

You can get an estimate for the omitted baseline category by not estimating the intercept. To do that, type "-1" on the right-hand side of the regression statement. If that was your actual question, the simple question, "How can I omit the baseline in a regression?," would have sufficed. Moreover,
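A small simulated illustration (the group labels and means are made up). With the intercept, the coefficients are differences from the baseline group; with -1, each coefficient is the mean of its group.

```r
set.seed(11)
# Made-up groups and group means
g <- factor(rep(c("ctrl", "trtA", "trtB"), each = 20))
y <- rnorm(60, mean = rep(c(10, 12, 15), each = 20))

coef(lm(y ~ g))       # intercept = ctrl mean; other terms = differences from ctrl
coef(lm(y ~ g - 1))   # no intercept: one coefficient per group, the group means
tapply(y, g, mean)
```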

I need to quote David Winsemius on this one again: "The advancement of science would be safer if you knew what you were doing." Note that the whole model screams at you that it is wrongly modeled. You are running a fully interacted model with factor variables. Thus, you have 19 regressors plus the

You are modeling Condition * Stimulus * Group as fully interacted fixed effects. I do not actually think that you would need a complex random effect structure for this. A simple random effect for the individual might suffice. You could try to model this with lmer (in the lme4 library) and inspect w

You probably should not do panel data analysis but multiple time series analysis as your T is much larger than N. You only have seven units of observation but some 150 observations on each unit. Also, if the values on each unit of observation are very close to each other between t and t-1, then yo

I have not read the manual, but I drew 1 random normal vectors and 1 random Poisson vectors of length 1 and was unable to reproduce this behavior. Can you provide an example (self-contained code) that reproduces this problem? Thanks, Daniel Jeanne M. Spicer wrote: > > The summary fu

?plot ?points You will probably need to get some R basics down as to how to index certain subsets of your data. This you find in any introductory R manual. HTH, Daniel

And there I caught myself with the next blooper: it wasn't Ben Bolker, it was Bert Gunter who pointed that out. :) Daniel Malter wrote: > > Ben Bolker sent me a private email rightfully correcting me that was > factually wrong when I wrote that ML /is/ a numerical method (I

. Further, only numerical optimization will show the behavior discussed in this post for the given reasons. (I hope this post isn't yet another blooper of mine at 5 a.m. in the morning). Best, Daniel Daniel Malter wrote: > > With respect, your statement that R's optim does not giv

With respect, your statement that R's optim does not give you a reliable estimator is bogus. As pointed out before, this would depend on when optim believes it's good enough and stops optimizing. In particular if you stretch out x, then it is plausible that the likelihood function will become flat

df2<-melt(df1) df3<-cast(df2,Index~Name) df3 HTH, Daniel Dana Sevak wrote: > > I realize that this is terribly basic, but I just don't seem to see it at > this moment, so I would very much appreciate your help. > > > How shall I transform this dataframe: > >> df1 >   Name Index Value > 1   
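melt() and cast() come from the reshape package. If you prefer base R, reshape() with direction = "wide" does the same job. A sketch with made-up values in the Name/Index/Value layout of the question:

```r
# Made-up data in long format
df1 <- data.frame(Name  = rep(c("A", "B"), each = 3),
                  Index = rep(1:3, times = 2),
                  Value = c(1.5, 2.0, 2.5, 10, 20, 30))

# One row per Index, one Value column per Name
wide <- reshape(df1, idvar = "Index", timevar = "Name", direction = "wide")
wide   # columns: Index, Value.A, Value.B
```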

On a methodological level, if the choices do not correspond on a cardinal or at least ordinal scale, you don't want to use correlations. Instead you should probably use Cramer's V, in particular if the choices are multinomial. Whether the wide format is necessary will depend on the format the funct
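Cramér's V can be computed from the chi-squared statistic as V = sqrt(chi2 / (n * (min(r, c) - 1))), where r and c are the numbers of categories of the two variables. A minimal sketch; the helper cramers_v() and the choice data are hypothetical.

```r
cramers_v <- function(x, y) {
  tab  <- table(x, y)
  # suppress the small-expected-count warning for this toy example
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  n    <- sum(tab)
  as.numeric(sqrt(chi2 / (n * (min(dim(tab)) - 1))))
}

# Perfect association between two made-up multinomial choices
a <- rep(c("red", "green", "blue"), each = 10)
b <- rep(c("car", "bus", "bike"), each = 10)
cramers_v(a, b)   # 1

set.seed(5)
v_shuffled <- cramers_v(sample(a), b)   # typically much smaller
v_shuffled
```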

Your question is pretty opaque. Please adhere to the posting guide. Provide a self-contained (!) example (i.e., code) that reproduces your problem. Generally, you would predict like this: library(mgcv) x<-rnorm(100) e<-rnorm(100) y<-x+x^2+e reg<-gam(y~s(x)) plot(reg) predict(reg,newdata=data.frame(x=2)) wher

Please adhere to the posting guide (i.e., provide a sample of self contained code and provide the error message). And what does "but it is not working" mean? Is there an error code? rueu wrote: > > hello: > my data looks like: > time1  time2   event  catagoria > > 2004    2006        1         

Why would that be preferable to dropping the variables after importing the whole dataset? Daniel sassorauk wrote: > > Is it possible to import only certain variables from a SPSS file. > > I know that read.spss in the foreign library will bring the data into R > but can I choose to important on

x<-rnorm(25,1.5,1) dim(x)=c(5,5) x y<-ifelse(c(x)>1.5,1,0) dim(y)<-dim(x) HTH, Daniel Spartina wrote: > > Hello all, > > I'm having trouble creating an adjacency matrix. > > Basically, I need to turn the following distance matrix into an adjacency > matrix based on whether values are >1.5 o

Not satisfactory in which sense? The survreg(Surv(Value,Censoring)~indvars+strata(id)) should/may work. For a discussion of Tobit fixed effects, see also Greene's website: under "Fixed Effects and Bias Due to the Incidental Parameters Problem in

This suggests that this is a dangerous office to be in because this is a basic question. I am sure somebody in your office knows this. Anyway, the baseline gives you the average value of the group that constitutes the baseline when all other covariates are zero. Let's say you measure whether men or

You can do this using ifelse(). See example below. x<-rpois(100,100) NA.x<-sample(1:100,40) x[NA.x]=NA y<-rpois(100,100) NA.y<-sample(1:100,40) y[NA.y]=NA z<-ifelse(!is.na(y),y,ifelse(!is.na(x),x,NA)) HTH, Daniel holly shakya wrote: > > I have 2 columns for weight. There are NAs in each col
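A runnable version of the NA fallback on small made-up vectors, together with an equivalent shorter form:

```r
x <- c(60, NA, 72, NA, 80)
y <- c(NA, 65, 70, NA, 82)

# Take y where it is observed, otherwise x (which may itself be NA)
z <- ifelse(!is.na(y), y, ifelse(!is.na(x), x, NA))
z    # 60 65 70 NA 82

# Equivalent shorter form: fall back to x only where y is missing
z2 <- ifelse(is.na(y), x, y)
```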

I have read it three times and still no concrete idea what you are actually trying to do, mainly because there is no information as to which level/variable you are aggregating on. It'd help if you provided the aggregated data (or sample rows thereof) so that we know what you want the result to be.

I am not a Bayesian. In the non-Bayesian case you would use SUR to model both equations simultaneously. If both use the exact same matrix of data, X (i.e., the values are numerically identical), then SUR collapses to equation-by-equation OLS. In that sense you get a "combined" estimate using SUR that resp

It never hurts your chances of getting an answer to explain what the code in the other language actually does. Is it running t-tests for all of the variables, with the group indicator in a separate column? If so, imagine you have the following data: id<-rep(c(1,2),each=100) x<-c(rnorm(100,0,1),rn
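In R, a t-test with the group indicator in a separate column can use the formula interface. A sketch with simulated data (the second group's mean of 0.5 and the extra variable z are made up):

```r
set.seed(2)
id <- rep(c(1, 2), each = 100)
x  <- c(rnorm(100, 0, 1), rnorm(100, 0.5, 1))   # second-group mean is made up

t.test(x ~ id)                  # groups taken from the id column (Welch test)
t.test(x[id == 1], x[id == 2])  # the same test, written out by hand

# Several variables at once: one p-value per column (z is a made-up extra variable)
d <- data.frame(x = x, z = rnorm(200))
pvals <- sapply(d, function(v) t.test(v ~ id)$p.value)
pvals
```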

Would: tapply(mpg,list(year,manufacturer), function(x) length(unique(x))) do what you are trying to accomplish? Or do you need very specific subsets? If so, you can do: year<-c(rep(1980,4),rep(1981,4)) manufacturer<-rep(c('A','B'),4) mpg<-sample(c(1:2),8,replace=T) guc<-function(x,y,z){length(u

fit1, the unrestricted model, includes one more regressor than fit2, the restricted model. fit2 is simply fit1 with the coefficient on that additional regressor restricted to zero, so testing the models against each other tests that restriction. Your F-test is thus whether t
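With a single restriction, the F statistic from comparing the nested models equals the square of the t statistic on the extra coefficient. A sketch with simulated data:

```r
set.seed(4)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)

fit1 <- lm(y ~ x1 + x2)   # unrestricted
fit2 <- lm(y ~ x1)        # restricted: coefficient on x2 fixed at 0

F_stat <- anova(fit2, fit1)$F[2]
t_stat <- summary(fit1)$coefficients["x2", "t value"]
c(F = F_stat, t_squared = t_stat^2)   # equal with a single restriction
```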

install.packages("bootstrap") library(bootstrap) ?

Please provide the data as self-contained code, as requested in the posting guide, so that helpers can directly paste it into R. Alternatively, you can provide the dilomaaethiops.txt. Best, Daniel Watching19 wrote: > > hello... I am trying to get this code to work, but as I get to the predict >

y is the dependent variable, not a predictor or independent variable. Since this is a binomial model, y should be 0/1 or, atypically, a proportion. HTH, Daniel Samuel Okoye wrote: > > Dear all, > > I am using glm with quasibinomial. What does the following error message > mean: > > Error in e

Hi, I answered this question before in this post: , specifically in my message from May 11, 2010; 4:30pm. However, I believe the newer version of plm shows an R-squared, which should be the within R-squared. Why the pr

Take a look here: HTH, Da. andy1234 wrote: > > Dear everyone, > > I am new to R, and I am looking at doing text classification on a huge > collection of documents (>500,000) which are distributed among 300 classes > (so basically, this is my training dat

#Do apply(y,1,print) #Note the space that is inserted before the "1." If you insert this space in your function apply(y,1,function(x){x<-unlist(x); if (!is.na(x[2]) & x[2]=='2k' & !is.na(x[1]) & x[1]==' 1') 1 else 0} ) #you get the result you expect. #Also, note that your ! conditions are

In your first line, you write "ARMA(2,2)." However, what you fit in R is ARIMA(2,1,2). What you fit in EViews, I can't tell. Could that explain the difference? HTH, Daniel Young Gyu Park wrote: > > When I do ARMA(2,2) using one lag of LCPIH data > > > > This is eview result > >> >> *Dependen

This seems to work, as well:

They seem to have a workaround. I don't know whether anything better is available by now. HTH, Daniel

It will greatly help if you provide a self-contained example, as requested in the posting guide. Most of the time, this will in fact lead you to figure out the problem yourself; if not, it will greatly increase the chances that we can help you in a meaningful, unambiguous way. Best, Daniel

x<-letters[1:3] y<-1:3 d<-expand.grid(x,y) g<-apply(d,1,function(x) paste(x[1],x[2],sep="")) HTH, Daniel Campbell, Desmond-2 wrote: > > Dear R-help readers, > > I'm sure this problem has been answered but I can't find the solution. > > I have two vectors > v1 <- c("a","b") > v2 <- c(1,2,3)
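An alternative without apply(): outer() with paste0() builds all combinations directly, in the same order as expand.grid().

```r
v1 <- c("a", "b")
v2 <- c(1, 2, 3)

combos <- as.vector(outer(v1, v2, paste0))
combos   # "a1" "b1" "a2" "b2" "a3" "b3"

# Same order as the expand.grid() approach
d <- expand.grid(v1, v2)
paste0(d$Var1, d$Var2)
```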

#Here is, I think, an easier way of coding all the components you need. #The within-group variances you get with this function: apply(iris[,1:4],2,function(x) tapply(x,iris$Species,var)) #You can get what you computed by taking the column means. apply(apply(iris[,1:4],2,function(x) tapply(x,iri

Let's say your fixed vector is x, and y is the list of vectors that you want to create the squared distance to x with, then: x<-c(1:5) y<-list() y[[1]]<-sample(c(1:5),5) y[[2]]<-sample(c(1:5),5) y[[3]]<-sample(c(1:5),5) y distances<-lapply(y,function(a,b) crossprod(a-b), b=x) #lapply goes over
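If you want a plain numeric vector rather than a list of 1 x 1 matrices, vapply() with an explicit sum of squared differences is one option. The list y below is fixed rather than sampled so that the result is reproducible.

```r
x <- 1:5
y <- list(c(5, 4, 3, 2, 1), c(1, 2, 3, 4, 5), c(2, 1, 3, 5, 4))

# Squared Euclidean distance of each list element to x
distances <- vapply(y, function(a) sum((a - x)^2), numeric(1))
distances   # 40 0 4

# Same values as the crossprod() version, after unlisting
unlist(lapply(y, function(a, b) crossprod(a - b), b = x))
```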

ame(id,obs,d,happy) Daniel Malter wrote: > > Hi all, > > I am statistically confused tonight. When the assumptions to a random > effects estimator are warranted, random effects should be the more > efficient estimator than the fixed effects estimator because it uses fewer >

Hi all, I am statistically confused tonight. When the assumptions to a random effects estimator are warranted, random effects should be the more efficient estimator than the fixed effects estimator because it uses fewer degrees of freedom (estimating just the variance parameter of the normal rathe


B is the specification for time-varying covariates. Otherwise, your model will think that each row is one independent observation that either had an event or was censored at "time" or "total_time." HTH, Daniel javier palacios wrote: > > Dear R-community, > > which of the following two format

The "problem" with your first solution is that it relies on each 'year x group' combination being present in both data frames. To avoid this, I would recommend using merge(): df3<-merge(df1,df2,by.x=c("Year","Group"),by.y=c("Year","Group")) df3$ratio<-with(df3,Value.x/Value.y) df3 HTH, Dani
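A small illustration with made-up values in which the (2000, "B") combination exists only in df1. merge() keeps only the matching Year/Group pairs before the ratio is taken; since both key columns share their names across the two data frames, by= suffices.

```r
df1 <- data.frame(Year  = c(2000, 2000, 2001, 2001),
                  Group = c("A", "B", "A", "B"),
                  Value = c(10, 20, 30, 40))
df2 <- data.frame(Year  = c(2000, 2001, 2001),   # no (2000, "B") row here
                  Group = c("A", "A", "B"),
                  Value = c(5, 10, 8))

df3 <- merge(df1, df2, by = c("Year", "Group"))  # matching pairs only
df3$ratio <- with(df3, Value.x / Value.y)
df3
```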

It looks like it. However, you provide very little information. Do you have measurements before and after the intervention and did the intervention occur at the same point in time for all treated? If so, you could do a simple difference in differences estimation. HTH, Daniel Troy S wrote: > > D
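Under those conditions (measurements before and after, common timing for all treated units), the difference-in-differences estimate is the interaction coefficient in lm(y ~ treat * post). A sketch with simulated data, where the true effect of 2 is an assumption:

```r
set.seed(8)
d <- expand.grid(unit = 1:50, post = c(0, 1))
d$treat <- as.numeric(d$unit <= 25)   # first 25 units are treated
d$y <- 1 + 0.5 * d$treat + 0.3 * d$post + 2 * d$treat * d$post + rnorm(nrow(d))

fit <- lm(y ~ treat * post, data = d)
coef(fit)["treat:post"]               # difference-in-differences estimate

# Same number from the four cell means
m   <- with(d, tapply(y, list(treat, post), mean))
did <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])
did
```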

I should have written "the standard errors of the coefficients are the SQUARE ROOT of the diagonal entries of the variance-covariance matrix," as I programmed it in the code. Daniel Malter wrote: > > Pick up a book or the like on ordinary least squares regression, which is &g

Pooling nominal with numeric variables and running PCA on them sounds like conceptual nonsense to me. You use PCA to reduce the dimensionality of the data if the data are numeric. For categorical data analysis, you should use latent class analysis or something along those lines. The fact that your

Pick up a book or the like on ordinary least squares regression, which is what lm() in its plain vanilla application does. The t-value is the estimated coefficient divided by the standard error. The standard errors of the coefficients are the square roots of the diagonal entries of the variance-covariance matrix. x<
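A minimal sketch of those relationships on simulated data, computing the variance-covariance matrix by hand as sigma^2 (X'X)^-1 and comparing with vcov() and summary():

```r
set.seed(9)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

# Variance-covariance matrix by hand: sigma^2 (X'X)^-1
X      <- model.matrix(fit)
sigma2 <- sum(residuals(fit)^2) / df.residual(fit)
V      <- sigma2 * solve(crossprod(X))

se       <- sqrt(diag(V))      # standard errors: square roots of the diagonal
t_values <- coef(fit) / se     # t value = estimate / standard error

cbind(se, t_values)
summary(fit)$coefficients[, c("Std. Error", "t value")]
```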

Look into the *apply series of functions. In your case apply(,2,min) or apply(,2,max) will do. You can also use any other summary function you like instead of min/max. Best, Daniel Lao Meng wrote: > > Hi all: > If I have a dataframe of N column

If your vector of data is x, use x[x!=-]. Subsetting entire data frames works analogously. HTH, Daniel

General suggestions: avoid cbind() and avoid accessing data frames. Convert data frames to matrices before accessing them. Also, why do you print? You don't really want to print(t) for every iteration of the loop, do you? Also avoid defining elements within the loop that need to be defined only onc

Q1 is very opaque because you are not even saying what kind of plot you want. For a regular scatterplot, you have multiple options. a.) select only the data in the given intervals and plot the data b.) plot the entire data, but restrict the graph region to the intervals you are interested in, or

I am afraid this is one of these posts where I have to quote David Winsemius: "The advancement of science would be safer if you knew what you were doing." Moreover, these are questions best addressed to your local statistician rather than the R-help list. With exceptions, the R-help list helps to s

I am not sure what purpose the while loop has. However, the main problem seems to be that you need to put: i<-sample(1:(n-40),1) #This samples from 1 to n-40 rather than i<-sample(1:n-40,1) #This is parsed as (1:n)-40, i.e., it samples from -39 to n-40 Otherwise, you may get negative (or zero) index values. Best, Daniel
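The operator precedence can be checked directly; n = 100 is an arbitrary example size. sample.int() is another way to draw one index from 1 to n-40, since it samples from 1 up to its first argument.

```r
n <- 100   # arbitrary example size

range(1:n - 40)     # parsed as (1:n) - 40: from -39 to 60
range(1:(n - 40))   # from 1 to 60, as intended

# sample.int(m, 1) draws one integer from 1 to m
i <- sample.int(n - 40, 1)
i
```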

The kappa2() function in the irr library takes an n x 2 matrix as input, where the two columns are the ratings by two raters. Let x and y below be the ratings of the two raters: x<-sample(c(0,1,2),100,replace=T) x o<-sample(c(0,0,0,1),100,replace=T) y<-x+o y #Then kappa is computed as: kappa2(
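Unweighted Cohen's kappa is (po - pe) / (1 - pe), where po is the observed agreement and pe the agreement expected by chance. A by-hand sketch, which should match the unweighted kappa that kappa2() reports by default; the helper cohen_kappa() and the ratings are hypothetical.

```r
cohen_kappa <- function(r1, r2) {
  lev <- sort(union(r1, r2))
  tab <- table(factor(r1, levels = lev), factor(r2, levels = lev))
  n   <- sum(tab)
  po  <- sum(diag(tab)) / n                       # observed agreement
  pe  <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
  (po - pe) / (1 - pe)
}

# Made-up ratings: 20 yes/yes, 5 yes/no, 10 no/yes, 15 no/no
r1 <- c(rep("yes", 25), rep("no", 25))
r2 <- c(rep("yes", 20), rep("no", 5), rep("yes", 10), rep("no", 15))
cohen_kappa(r1, r2)   # po = 0.7, pe = 0.5, kappa = 0.4
cohen_kappa(r1, r1)   # perfect agreement: 1
```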

o do that? May you look at my look. It is calculating > the estimates for one row. How do I incorporate the other data that has > all other columns? > > Thanks > > Ed > Sent from BlackBerry® wireless device > > -Original Message- > From: "Daniel Malte

If A has more columns than in your example, you could always try to only merge those columns of A with B that are relevant for the merging. You could then cbind the result of the merging back together with the rest of A as long as the merged data preserved the same order as in A. Alternatively, yo

This is much clearer. So here is what I think you want to do. In theory and practice: Theory: Check if AA[i] is in BB If AA[i] is in BB, then take the row where BB[j] == AA[i] and check whether A1 and A2 are in B1 to B3. Is that right? Only if both are, you want the indicator to take 1. Here i

This is not very confusing. It is essentially the same error. In your first example, the values of x1 lay within [0,1] but not strictly inside (0,1), i.e., some were exactly 0 or 1; this time they also fall outside [0,1]. The reason is that you did not divide x1 by sum(x1) this time. In other w

This is a theoretical issue. It is impossible for beta-distributed values to take the value of 0 or 1. Hence, an attempt to fit a beta distribution to a vector containing these values fails. HTH, Daniel baxy77 wrote: > > Hi, > > Well, i need some help, practical and theoretical. I am wonderi
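A quick way to see this: depending on the shape parameters, the beta density at 0 or 1 is either 0 or infinite, so the log-likelihood of data containing these values is not finite. A sketch with made-up values and shape parameters:

```r
x_ok  <- c(0.2, 0.5, 0.7)
x_bad <- c(0, 0.5, 1)    # contains the boundary values 0 and 1

ll_ok  <- sum(dbeta(x_ok,  shape1 = 2, shape2 = 3, log = TRUE))   # finite
ll_bad <- sum(dbeta(x_bad, shape1 = 2, shape2 = 3, log = TRUE))   # -Inf
c(ll_ok, ll_bad)

# With shape1 < 1 the density at 0 is infinite instead
dbeta(0, shape1 = 0.5, shape2 = 2)   # Inf
```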

Steven's solution is great, but it will only work if the rows are really duplicates. If the data frame contains another variable whose values vary, it will not work because then the rows are obviously unique. df<-data.frame(df,value=rnorm(11)) unique(df) You would then have to make a decision, w
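One possible decision is to keep the first row per combination of the identifying columns, which duplicated() applied to a subset of columns does. A sketch with made-up columns a, b, and value:

```r
df <- data.frame(a     = c(1, 1, 2, 2, 3),
                 b     = c("x", "x", "y", "y", "z"),
                 value = c(0.1, 0.9, 0.3, 0.3, 0.5))

unique(df)                             # rows 1 and 2 differ in value, so both stay
kept <- df[!duplicated(df[, c("a", "b")]), ]
kept                                   # first row for each (a, b) pair
```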

For question (a), do: which(AA%in%BB) Question (b) is very ambiguous to me. It makes little sense for your example because all values of BB are in AA. Therefore I am wondering whether you meant in question (a) that you want to find all values in BB that are in AA. That's not the same thing. I am

Hi, I hope this is for private purposes. Otherwise, I may cite David Winsemius: "The advancement of science would be safer if you knew what you were doing." First, your regression command is inverted. You ought to regress SoloKills on range, not vice versa. abline(lm(graph~range)) #does the tric

b<-c("1","2","3","4","<1") grep('<',b) HTH, Daniel Ryan Utz-2 wrote: > > Hi all, > > I'm trying to identify a particular digit or value within a vector of > factors. Specifically, this is environmental data where in some cases the > minimum value reported is "<" a particular number (and I wan

My recommendation would be to not "subset out" the data, because you are introducing a potential source of error when binding the new data back together with the old data. Preferably, I would work on selecting subsets of the dataset using indices (as suggested in the previous post) and just do the

If you just want to apply the function over successive columns of a data frame use apply(, 2 , llik) Daniel EdBo wrote: > > Hi > > I have a code that calculate maximisation using optimx and it is working > just fine. I want to extend the code to run several colomns of R_j wh

Hi, The blunt answer is: by learning R. In particular, you will need pattern matching techniques as in ?grep and (somewhat advanced, some would call it basic) knowledge of R. So if you aren't familiar with either, I would suggest an introductory manual or one of the many websites you find online a

Check the *apply() series of functions. tapply() will do what you want. attach(mtcars) tapply(hp,list(cyl,gear),mean) HTH, Daniel Mark Alen wrote: > > I know commands like xtabs and table allow a user to do cross-tabulation > For example the following command generates a pivot table that sh

Typically not, unless the device has not closed properly. Does the second plot show up in the pdf? Daniel cherron wrote: > > Hello, > >>I am currently working on a script and I output plots to pdf using > > pdf(...) > plot(...) > > >>then later I was trying to plot something and w
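For reference, the usual pattern is to open the device, draw, and close it with dev.off(), which finishes writing the file. A sketch that writes two pages to a temporary file:

```r
f <- tempfile(fileext = ".pdf")

pdf(f)            # open the pdf device
plot(1:10)        # page 1
plot(rnorm(10))   # page 2
dev.off()         # close the device; this finishes writing the file

file.exists(f)
```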

For the first part, use the col and pch arguments: library(lattice) id<-rep(c(0,2),each=50) e<-rnorm(100) x<-rnorm(100) y<-id+x+e xyplot(y~x,groups=id,col=c(3,4),pch=c(12,13)) For the second part, I do not know what exactly you mean by "superimpose the mean level." Should the mean for each group be displayed as a horiz

aniel From: David Winsemius [] Sent: Wednesday, July 20, 2011 9:01 AM To: Daniel Malter Cc: Subject: Re: [R] loops and simulation On Jul 20, 2011, at 1:34 AM, Daniel Malter wrote: snipped > requests, except that

Yes, there are in Europe. And there are summer classes in the US, as well. And no, this list is not so much about helping beginners to learn R. For that, there is a myriad of online sources. Rather, this list is for people who have exhausted their ability to (elegantly) solve a problem. Also, it s

First, it would have helped if you had posted the actual results for us to see how far they are off (and, more specifically, by which factor). Second, given your epiphany, you will find that that's exactly what David (and others before him) said or suggested. It is not about standardizing a nomina

I venture the conjecture that if you had written the code, you would know how to do this. This suggests that you are asking us to do your homework, which is not the purpose of this list. A simple inclusion of the code in a for or while loop and storing the estimated parameters with the index of the it

Please read the posting guide (requires a self-contained example of code) and consult the help pages before posting. If you type ?predict.lm the help page clearly states that the argument 'newdata' takes "[a]n optional data frame in which to look for variables with which to predict..." x1<-rnorm(1
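A minimal sketch with simulated data: the column names of newdata must match the variable names used in the model formula.

```r
set.seed(10)
x1  <- rnorm(100)
y   <- 3 + 2 * x1 + rnorm(100)
fit <- lm(y ~ x1)

# newdata must be a data frame whose column names match the model's variables
new  <- data.frame(x1 = c(-1, 0, 1))
pred <- predict(fit, newdata = new)
pred
```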

P1-tapply(P1,Experiment,mean)[Experiment] HTH, Daniel ronny wrote: > > Hi, > > I would like to center P1 and P2 of the following data frame by the factor > "Experiment", i.e. substruct from each value the average of its > experiment, and keep the original data structure, i.e. the experiment a
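ave() does the same group-wise centering and returns a vector aligned with the original rows. A sketch with a made-up data frame:

```r
d <- data.frame(Experiment = factor(c("E1", "E1", "E2", "E2", "E2")),
                P1         = c(1, 3, 10, 20, 30))

# Subtract each experiment's mean from its own rows, keeping the original order
d$P1c <- d$P1 - ave(d$P1, d$Experiment)
d$P1c   # -1 1 -10 0 10

# Same result as the tapply() version
centered_tapply <- with(d, P1 - tapply(P1, Experiment, mean)[Experiment])
```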

As long as you just want to display it, use print(): GG<- c(1,2,3) print(summary(GG),str(GG)) Output: num [1:3] 1 2 3 Min. 1st Qu. Median Mean 3rd Qu. Max. 1.0 1.5 2.0 2.0 2.5 3.0 HTH, Daniel andrewH wrote: > > Using str() in a function. > > I am in th

Try TukeyHSD(anova). TukeyHSD() is in the stats package, i.e., ready to use with the base installation of R. Also, the TukeyHSD for the interaction term should have a colon ":" not a "*". The "which" argument in TukeyHSD() must be identical to the name of the coefficient in the summary tab
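A sketch using the built-in warpbreaks data (not your data) to show the interaction label in action:

```r
fit <- aov(breaks ~ wool * tension, data = warpbreaks)
summary(fit)   # the interaction row is labelled "wool:tension"

# 'which' must use the same label, with ":" rather than "*"
hsd <- TukeyHSD(fit, which = "wool:tension")
names(hsd)
```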

Probably not the most elegant, but a workable solution. Assume you have a matrix x of dimensions 10 x 10. Assume further you want to calculate the mean for each successive block of two columns. One way to do this is to create a matrix that indicates the column numbers from/to which to apply the fun
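A more compact sketch of the same idea: compute the starting column of each two-column block and apply mean() to each block. The 10 x 10 matrix is made up; use rowMeans() instead if you want row-wise means per block.

```r
x <- matrix(1:100, nrow = 10, ncol = 10)

starts <- seq(1, ncol(x), by = 2)   # first column of each block: 1, 3, 5, 7, 9
block_means <- sapply(starts, function(j) mean(x[, j:(j + 1)]))
block_means                         # 10.5 30.5 50.5 70.5 90.5

# Row-wise means per block: a 10 x 5 matrix
row_block_means <- sapply(starts, function(j) rowMeans(x[, j:(j + 1)]))
```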

Not that I know of, but the paper says that they are easy to compute. If you did, you could contribute the code. Best, Daniel David Hugh-Jones-3 wrote: > > Hi all, > > Is there any code to run fixed effects Tobit models in the style of Honore > (1992) in R? > (The original Honore article is he
