Hi, I am afraid you misunderstood it. I do not have repeated records; rather, for every record I have two, possibly different, simultaneously present instantiations of an explanatory variable.
My data are as follows:

yield haplo1 haplo2
100   A      B
151   B      A
212   A      A

So I have one effect (haplo), but two copies of it affect "yield" in every record. If I use lm() I get:

> a=data.frame(yield=c(100,151,212),haplo1=c("A","B","A"),haplo2=c("B","A","A"))
> lm(yield ~ -1 + haplo1 + haplo2, data = a)

Call:
lm(formula = yield ~ -1 + haplo1 + haplo2, data = a)

Coefficients:
haplo1A  haplo1B  haplo2B
    212      151     -112

But I get different coefficients for the two "A"s (in fact one was set to 0) and the two "B"s. That is, the model has four unknowns, but in my example I have just two!

A least-squares solution is simple to do by hand:

> X=matrix(c(1,1,1,1,2,0),ncol=2) #the incidence matrix
> X
     [,1] [,2]
[1,]    1    1
[2,]    1    2
[3,]    1    0
> solve(crossprod(X,X),crossprod(X,a$yield))
         [,1]
[1,] 184.8333
[2,] -30.5000

where [1,] is the solution for A and [2,] is the solution for B.

This is not difficult to do by hand, but it is for a simple case and I miss all the machinery in lm().

Thank you,
Andres

On Wed, Mar 19, 2008 at 6:57 PM, Michael Dewey <[EMAIL PROTECTED]> wrote:
> At 09:11 18/03/2008, Andres Legarra wrote:
> >Dear all,
> >I have a data set (QTL detection) where I have two cols of factors in
> >the data frame that correspond logically (in my model) to the same
> >factor. In fact these are haplotype classes.
> >Another real-life example would be family gas consumption as a
> >function of car company (e.g. Ford, GM, and Honda) (assuming 2 cars by
> >family).
>
> Unless I completely misunderstand this it looks like you have the
> dataset in wide format when you really wanted it in long format (to
> use the terminology of ?reshape). Then you would fit a model allowing
> for the clustering by family.
> >
> >An artificial example follows:
> >set.seed(1234)
> >L3 <- LETTERS[1:3]
> >(d <- data.frame( y=rnorm(10), fac=sample(L3, 10,
> >repl=TRUE),fac1=sample(L3,10,repl=T)))
> >
> > lm(y ~ fac+fac1,data=d)
> >
> >and I get:
> >
> >Coefficients:
> >(Intercept)         facB         facC        fac1B        fac1C
> >     0.3612      -0.9359      -0.2004      -2.1376      -0.5438
> >
> >However, to respect my model, I need to constrain effects in fac and
> >fac1 to be the same, i.e. facB=fac1B and facC=fac1C. There are
> >logically just 4 unknowns (average,A,B,C).
> >With continuous covariates one might do y ~ I(cov1+cov2), but this is
> >not the case.
> >
> >Is there any trick to do that?
> >Thanks,
> >
> >Andres Legarra
> >INRA-SAGA
> >Toulouse, France
>
> Michael Dewey
> http://www.aghmed.fsnet.co.uk
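
One possible way to keep the lm() machinery while forcing both columns to share the same effects is to hand lm() a matrix that counts, per record, how many copies of each level are present. The following is only a sketch on the three-record toy data from the top of this message; the names lev, h1, h2 and Z are introduced here purely for illustration and are not part of the original code.

```r
# Toy data from the top of this message
a <- data.frame(yield  = c(100, 151, 212),
                haplo1 = c("A", "B", "A"),
                haplo2 = c("B", "A", "A"))

# One shared set of levels, so both columns are coded identically
# (as.character() guards against the columns having been read as factors)
lev <- sort(unique(c(as.character(a$haplo1), as.character(a$haplo2))))
h1  <- factor(a$haplo1, levels = lev)
h2  <- factor(a$haplo2, levels = lev)

# Indicator matrix of each column (one column per level, no intercept);
# their sum counts how many copies of each haplotype a record carries
Z <- model.matrix(~ h1 - 1) + model.matrix(~ h2 - 1)
colnames(Z) <- lev

# A single coefficient per haplotype, shared by both columns
fit <- lm(yield ~ Z - 1, data = a)
coef(fit)
#    ZA    ZB
# 106.0  19.5
```

For these three records Z has rows (1,1), (1,1) and (2,0), which is not the X typed by hand earlier in this message, so the estimates differ from 184.83 and -30.5. If an intercept is wanted, as in the quoted fac/fac1 example, note that every row of Z sums to 2, so one column of Z has to be dropped (e.g. y ~ Z[, -1]); otherwise lm() reports NA for the aliased coefficient.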