Hello all,

here's a real-world example: I'm measuring a quantity (d) at five sites (site1 through site5) on a silicon wafer. The measured value clearly depends on the site. To find out whether this is a measurement artifact, I measured the wafer four times: twice in the normal position (posN) and twice rotated by 180 degrees (posR). My data looks like this (full, self-contained code at the bottom). Note that sites with the same number correspond to the same physical location on the wafer (the rotation has already been taken into account here).
> head(x)
     d site pos
1 1383    1   N
2 1377    1   R
3 1388    1   R
4 1373    1   N
5 1386    2   N
6 1394    2   R

> boxplot(d ~ pos + site)

This boxplot (see code) already hints at a true site-dependence of the measured value (no artifact). OK, so let's do an ANOVA to make this more quantitative:

> summary(lm(d ~ site*pos))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1378.000      3.078 447.672  < 2e-16 ***
site2         11.500      4.353   2.642  0.02466 *
site3         12.000      4.353   2.757  0.02025 *
site4         17.000      4.353   3.905  0.00294 **
site5          1.000      4.353   0.230  0.82294
posR           4.500      4.353   1.034  0.32561
site2:posR    -4.000      6.156  -0.650  0.53050
site3:posR   -10.500      6.156  -1.706  0.11890
site4:posR    -5.500      6.156  -0.893  0.39264
site5:posR    -3.000      6.156  -0.487  0.63655

Now I think that I see the following:

- The average of d at site1 in pos. N (first in the alphabet) is 1378.
- The average values for site2, 3, 4 (especially 4) in pos. N deviate significantly from site1. For instance, values at site4 are on average 17 greater than at site1.
- The average value at site5 does not differ significantly from site1.

OK, that was the top part of the result table. Now the bottom part:

- In the reverse position (posR) the average of d at site1 is 4.5 bigger, but that's not significant.
- The average of d at site3:posR is 10.5 smaller than something, but smaller than what? And why does this deviation of -10.5 have a p-value of .1 (not significant), while the deviation of 11.5 (site2, top part) has a p-value of .02 (significant)?

Let's see if I can figure that out. The difference between posN and posR at site3 is not so big:

> mean(d[site==3&pos=="R"])-mean(d[site==3&pos=="N"])
[1] -6

Is this what makes it not significant? Shuffling the numbers around until I get to -10.5:

> mean(d[site==3&pos=="R"])-mean(d[site==3&pos=="N"])-(mean(d[site==1&pos=="R"])-mean(d[site==1&pos=="N"]))
[1] -10.5

OK, so the interaction term is a difference of differences: the R-minus-N difference at site3 minus the R-minus-N difference at site1. One has to keep track of all these differences. So I think I have understood about 80% of this simple example.
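For what it's worth, here is a minimal sketch of how the coefficients and their standard errors seem to follow from the cell means. It assumes R's default treatment contrasts and relies on the design being balanced with two measurements per cell; the names m, diff_RN, int_by_hand, s and n are just my own helpers. If the reasoning is right, the larger standard error of the interaction terms (6.156 vs. 4.353) would come from the fact that they combine four cell means instead of two, which would also explain why a deviation of 10.5 there is less significant than 11.5 in the top part.

```r
# Rebuild the data from the post (same values as the dput() output at the bottom).
x <- data.frame(
  d    = c(1383, 1377, 1388, 1373, 1386, 1394, 1386, 1393, 1390, 1382,
           1386, 1390, 1395, 1396, 1392, 1395, 1378, 1382, 1379, 1380),
  site = factor(rep(1:5, each = 4)),
  pos  = factor(rep(c("N", "R", "R", "N"), times = 5))
)

# 5 x 2 table of cell means (two measurements per cell).
m <- with(x, tapply(d, list(site = site, pos = pos), mean))
print(m)

# R-minus-N difference at each site.
diff_RN <- m[, "R"] - m[, "N"]

# Interaction terms by hand: each site's R-minus-N difference
# minus the R-minus-N difference at the reference level, site1.
int_by_hand <- diff_RN[-1] - diff_RN[1]
print(int_by_hand)

# Compare with the fitted model.
fit <- lm(d ~ site * pos, data = x)
b <- coef(fit)
print(b[grep(":", names(b))])

# Standard errors by hand, from the residual standard error s
# and n = 2 measurements per cell (balanced design assumed).
s <- summary(fit)$sigma
n <- 2
se_cell <- s * sqrt(1 / n)  # a single cell mean (the intercept)
se_diff <- s * sqrt(2 / n)  # difference of two cell means (site and pos terms)
se_int  <- s * sqrt(4 / n)  # difference of differences (interaction terms)
print(round(c(cell = se_cell, diff = se_diff, interaction = se_int), 3))
```

To judge the site:pos interaction as a whole rather than term by term, anova(fit) gives one F test per model term.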
The reason I'm going after this so stubbornly is that I'm at the beginning of a DOE which will take several weeks of measuring and will end up being analyzed with a big ANOVA (two response variables and about six explanatory variables, some continuous, some categorical). Already in the DOE phase I want to understand what I will be doing with the data later (this is for a Six Sigma project in an industrial production environment, in case anybody wants to know).

Thanks,
robert

Here's the full dataset:

x <- structure(list(
    d = c(1383L, 1377L, 1388L, 1373L, 1386L, 1394L, 1386L, 1393L,
          1390L, 1382L, 1386L, 1390L, 1395L, 1396L, 1392L, 1395L,
          1378L, 1382L, 1379L, 1380L),
    site = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
                       4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L),
                     .Label = c("1", "2", "3", "4", "5"), class = "factor"),
    pos = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
                      1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L),
                    .Label = c("N", "R"), class = "factor")),
    .Names = c("d", "site", "pos"), row.names = c(NA, -20L),
    class = "data.frame")
attach(x)
head(x)
boxplot(d ~ pos + site)

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.