Li and others: Michael McCarthy raises a relevant point on this question at the beginning of his book "Bayesian Methods for Ecology" (2007), where he discusses the differences between frequentist and Bayesian methodologies. Briefly, depending on which null hypothesis you choose and the power of your data, it is possible to be unable to reject *both* types of null hypotheses with a given data set. Whether or not you subscribe to the Bayesian philosophy, McCarthy provides a good example of the limits of this type of hypothesis testing. The full passage is quoted below:
"Consider an ecologist who surveys ponds in a city for frogs. On her first visit to a pond, she searches the edge and listens for frog calls over a 20-minute period. The southern brown tree frog (Litoria ewingii) is the most common species in her study area, but it is not found on this particular visit (Fig. 1.2). However, the researcher would not be particularly surprised that the species was not detected because she knows from experience that when surveying ponds, southern brown tree frogs are detected on only 80% of visits when they are in fact present. Given this information, what can she conclude about whether the southern brown tree frog is present at the site or not?"

And his frequentist argument:

"Null hypothesis significance testing. The first statistical approach to answering the question is null hypothesis significance testing. The null hypothesis for this first case might be that the southern brown tree frog is absent from the site. The researcher then seeks to disprove the null hypothesis with the collection of data. The single piece of data in this case is that the frog was not detected. The researcher then asks: What is the probability of obtaining this result if the null hypothesis were true? This probability is the p-value of the significance test. If the p-value is sufficiently small (conventionally if less than 0.05), it means that the data (or more extreme data) would be unlikely to occur if the null hypothesis is true. If the p-value is small, then we assume that the data are inconsistent with the null hypothesis, which is then rejected in favour of the alternative. In the case of the frog survey, the p-value is equal to 1.0. This is calculated as the probability that we would fail to record the frog (i.e. obtain the observed data) if it is absent (i.e. if the null hypothesis is true). The high p-value means that the researcher fails to reject the null hypothesis that the frog is absent.
The other possible null hypothesis is that the frog is present at the site. In this case, the probability of obtaining the data is equal to 0.2 (one minus the probability of detecting the species if present) given that the null hypothesis is true. Thus, the p-value is 0.2, and using a conventional cut-off of 0.05, the researcher would have a non-significant result. The researcher would fail to reject the null hypothesis that the southern brown tree frog was present. It is surprising (to some people) that the two different null hypotheses can produce different results. The conclusion about whether the species is present or absent simply depends on which null hypothesis we choose. The source of this surprise is our failure to consider statistical power, which I will return to in Chapter 2.

Another possible source of surprise is that the p-value does not necessarily provide a reliable indicator of the support for the null hypotheses. For example, the p-value is equal to 1.0 for the null hypothesis that the frog is absent. This is the largest possible p-value, but it is still not proof that the null hypothesis is true. If we continued to return to the same pond and failed to find the frog, the p-value would remain equal to 1.0, insensitive to the accumulation of evidence that the frog is absent. This apparent discrepancy occurs because frequentist methods in general and p-values in particular do not provide direct statements about the reliability of hypotheses (Berger and Sellke, 1987; Berger and Berry, 1988). They provide direct information about the frequency of occurrence of data, which only gives indirect support for or against the hypotheses. In this way, frequentist methods are only partially consistent with mathematical logic, being confined to statements about data but not directly about hypotheses (Berger and Sellke, 1987; Jaynes, 2003)."
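McCarthy's two p-values, and the way repeated non-detections leave the "absent" null untouched, can be checked with a few lines of Python. This is just a sketch: the 0.8 per-visit detection probability comes from the quoted example, and visits are assumed independent.

```python
# The two p-values from McCarthy's frog example, and how they change
# (or fail to change) as non-detection visits accumulate.

DETECTION_PROB = 0.8  # per-visit detection probability when present (from the quote)

def p_absent_null(n_visits):
    # Under H0 "frog absent", non-detection is certain, so the
    # probability of the observed data is 1.0 no matter how many
    # visits fail to find the frog.
    return 1.0

def p_present_null(n_visits):
    # Under H0 "frog present", each independent visit fails to
    # detect with probability 1 - 0.8 = 0.2, so n failures have
    # probability 0.2**n.
    return (1.0 - DETECTION_PROB) ** n_visits

for n in (1, 2, 3):
    print(n, p_absent_null(n), round(p_present_null(n), 4))
```

After one visit this reproduces the quoted values (p = 1.0 vs p = 0.2); by the third non-detection visit the "present" null is rejected at the 0.05 level, while the "absent" null's p-value never moves, which is exactly the insensitivity McCarthy complains about.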
On 2/6/10 9:44 PM, "Li An" <[email protected]> wrote:
> Dear Ecologers,
>
> In testing ecological models, we often use a t-test as a way to compare
> our model results with observed data. If they are close enough, we
> gain more confidence in our model. However, in most traditional
> situations, we put "no difference" as the null and regard it as the
> default. This means that unless we find substantial evidence, we
> retain the null hypothesis. For instance, we can use this type of test
> to examine whether a drug has a noticeable effect.
>
> In our model performance situation (testing observed data = predicted
> numbers from a model, assuming data independence), I argue that we
> should keep the alternative hypothesis as the default, making every
> effort to find substantial evidence to support the null hypothesis (if
> unable, we retain the alternative hypothesis related to inequality
> between the model predictions and the data). In this case, we can still
> use traditional test statistics such as z or p values, but interpret
> the results differently. Rather than using the criterion of p > 0.05
> (or z < 1.96, or t < a large number) to retain the null hypothesis, we
> should use a stricter standard -- e.g., p > a much larger number (e.g.,
> 0.9) or z < a much smaller number (e.g., 0.125) -- to retain the null
> hypothesis about equality between the model predictions and the data.
> This seems more a philosophical issue. Does this make sense?
>
> Li

Kevin Burls
Ecology, Evolution, and Conservation Biology Program
University of Nevada, Reno
[email protected]
"Give me space and motion and I will give you a world." - Rene Descartes
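Incidentally, Li's proposal of treating "difference" as the default and demanding strong evidence of equality is close to what statisticians call equivalence testing, for example the two one-sided tests (TOST) procedure. A minimal sketch follows, with the caveats that TOST is my suggestion rather than Li's exact scheme, the equivalence tolerance delta is a modelling choice that must be set in advance, the example numbers are hypothetical, and a normal (z) approximation is used to keep the code dependency-free.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tost_equivalence(observed, predicted, delta):
    """Two one-sided tests (TOST) for a paired model-vs-data comparison.

    Null hypothesis: |mean(observed - predicted)| >= delta, i.e. the
    model and the data differ by at least the tolerance. A small
    returned p-value supports equivalence within +/- delta, which is
    the "default is difference" logic Li describes. Uses a z
    approximation for simplicity.
    """
    diffs = [o - p for o, p in zip(observed, predicted)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    se = math.sqrt(var / n)
    z_lower = (mean + delta) / se   # tests H0: mean <= -delta
    z_upper = (mean - delta) / se   # tests H0: mean >= +delta
    p_lower = 1.0 - normal_cdf(z_lower)
    p_upper = normal_cdf(z_upper)
    # TOST rejects "difference" only if BOTH one-sided tests reject,
    # so the overall p-value is the larger of the two.
    return max(p_lower, p_upper)

# Hypothetical example: a model predicting 10.0 against ten observations.
obs = [10.1, 9.9, 10.0, 10.2, 9.8, 10.05, 9.95, 10.1, 9.9, 10.0]
pred = [10.0] * 10
print(tost_equivalence(obs, pred, delta=0.5))   # small p: equivalent within 0.5
print(tost_equivalence(obs, pred, delta=0.01))  # large p: cannot claim equivalence
```

This formalises the asymmetry Li wants: failing the test leaves you with "the model may differ from the data" as the default conclusion, and equivalence is only declared when the evidence is strong, rather than by stretching the conventional p > 0.05 criterion.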
