Re: [Bioc-devel] Making hypothesis testing easier with design matrices?

Gordon K Smyth Mon, 10 Dec 2012 17:56:35 -0800

Dear Ryan,

Thanks for your suggestion. I think though that the attribute that youare thinking of implementing is not actually something that exists ingeneral.


On Mon, 10 Dec 2012, Ryan C. Thompson wrote:

Hi Gordon and list,
I've been thinking about how to make it easier to specify what hypotheses onewants to test in microarray or RNA-seq differential expression data sets, andI think one of the major stumbling blocks that confuses people is the way inwhich design matrices must have one coefficient "missing" for each term. Soif you have several experimental factors and blocking factors, you can't havea column of the design matrix corresponding to every level of every factor inthe formula.


I hope you understand why this has to be so.

But I believe every level of every factor could be represented as acontrast of one or more coefficients in the design matrix.


This is so only for one-way designs, i.e., for single factor experiments.

For example, if you had a variable "cond" with 3 levels "A", "B", and"C", and you did "model.matrix(~1+condition)",


I think you mean model.matrix(~cond)

you get a design matrix with an intercept, a B-A term, and a C-A term,with names "Intercept", "condB", and "condC".


Actually the name of the first coefficient would be "(Intercept)".

this, you could solve for the contrast representing each of A, B and C.For example, I believe A is "Intercept - (condB + condC)/3".

No. I'm not sure what you have in mind by "A" but, if you mean theaverage log-expression when the condition is at level A, then it would bejust "(Intercept)".


The contrast you give doesn't represent anything of interest.

Expressed as a contrast vector in R, this would be "c(1, -1/3, -1/3)".(Of course, for this trivial example one can just do"model.matrix(~0+cond)", but that doesn't work for all the factors in amulti-factor design.)
So, in the same step as the design matrix is created, the function could alsoreturn, regardless of how the model formula was parametrized, a matrix whereeach column is the contrast corresponding to one level of one of the factorsin the model formula.

No, there isn't any such thing as a contrast corresponding to one level ofone of the factors. For example, consider the additive model ~a+b where"a" and "b" are factors with levels 1,2,... Then it just doesn't makesense to talk about the effect for b1 as an isolated concept.

The model has fitted values, which require the levels of both factors tobe specified simultaneously, and it is possible to make tests forcombinations of coefficients. There isn't anything else that ismeaningful.

Suppose that "b" has three levels. You might not be appreciating the factthat the two coefficients b2 and b3 do contain all the information aboutcomparisons between levels of b. The intercept term is not involved.You can test for b2, for b3, or for b2-b3. In terms of pairwisecomparisons, there's nothing else.

(This could be added as an attribute on the design matrix, for example.)The user could then add and subtract these columns (perhaps with ahelper function similar to makeContrasts that allows it to be donesymbolically) to get the contrasts that they want without having toworry about exactly how the contrasts are coded into the design matrix.Obviously, for multi-factor designs, this matrix of factor levels codedas contrasts would have more columns than the design matrix itself. Forexample, if an experiment has a 2-level factor and a 3-level factor,then the design matrix would have 4 columns, but the "available factorlevel matrix" would have 5 columns.

Another thing to consider is interactions. For models includinginteractions, then making comparisons between levels of one factor inisolation is not scientifically sensible.

In the edgeR and limma User's Guide, I recommend that most users may findit simplest to think of interaction models in terms of one-way layouts.

For additive models however, I think there is no shortcut to a user tryingto understand what the fitted coefficients represent.


Best wishes
Gordon

The advantage of such a scheme would be that the computer can tell theuser in addition to the coefficients in the design matrix, "here are theavailable factor levels that you can perform comparisons on", and theuser could pick the ones they are interested in and and add/subtractthem to get the test they want.
What do you think of this idea? Could it work in practice for limma andedgeR? I would be interested in writing code to make it a reality if youthought it was worthwhile.
Sincerely,
-Ryan Thompson



______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Making hypothesis testing easier with design matrices?

Reply via email to