On Fri, Feb 23, 2018 at 1:48 PM, Robert Goldman <rpgold...@sift.info> wrote:
> I'm looking for some advice and particularly literature pointers for a
> question about the Bayesian stance. I'm interested in what approaches are
> suggested for handling the case where one's prior is qualitatively wrong.
>
> For example, imagine that I have chosen a normal distribution for a random
> variable, and when the observations come back, they are bimodal. What does
> the Bayesian philosophy say about cases like this?

Why did you "choose a normal distribution"? Did you really believe that it
was *impossible* that your variables X_1, X_2, ... followed any non-normal
distribution? If so, then you should attribute the apparent bimodality of
the observations to a coincidence in the sample, because no other
explanation is possible.

It seems that you are not willing to do this -- you are willing to be
convinced by a finite sample that the distribution is *not* normal. So
apparently, your true mental model must place some probability mass on
non-normal distributions. In other words, for engineering purposes, you were
apparently applying a prior that didn't actually match your true prior state
of belief.

(Of course, this is a common approximation, which is why applied Bayesians
make use of *posterior predictive checks (PPC)*. Applied Bayesians also tend
to try multiple models and compare them, e.g., by *cross-validation*, which
can detect that your original prior over unimodal distributions was
misspecified, because using it gets you poor predictions on held-out data
compared to a prior that allowed bimodal distributions.)

If you want to be fully Bayesian, it would be better to use your true prior
state of belief from the start. Here are some priors you might actually have:

- A mixture of K Gaussians where K is unknown. (You suspect that K=1 but
  you're not sure, so you place a prior over K as well as the K means and
  variances.)
- Generalizing that, a prior over compositional model structures that
  prefers simpler structures (as in the Automatic Statistician
  <https://www.automaticstatistician.com/> project).
- A nonparametric family of densities (Adams et al. 2009
  <https://arxiv.org/pdf/0912.4896.pdf>).
- A universal (Kolmogorov) prior based on computable functions, as suggested
  by Solomonoff (see e.g. Hutter 2007 <https://arxiv.org/pdf/0709.1516.pdf>).

In other words, your uncertainty about the prior is really part of the prior
(hierarchical Bayes), so you should integrate it out. A convenient
simplification is *empirical Bayes*, which instead maximizes over that
additional uncertainty. In the mixture of K Gaussians case, empirical Bayes
would select K to maximize

   p(K) ∫_θ p(θ | K) p(x_1, x_2, ... | K, θ) dθ,

where θ specifies the K means and variances, p(θ | K) is the prior for a
given K, and p(K) is a hyper-prior that is sometimes omitted. So choosing
K=1 gives your original unimodal model, but the data might drive you to
select the bimodal model K=2 (even if p(K=2) << p(K=1)).

> ... as I understand it, is critical that my prior be independent of the
> observations, so revising my prior before I compute the posterior isn't
> kosher.

Notice that empirical Bayes is in fact using the observations to help select
the prior. You are not the only one to have misgivings about that. See for
example this handout
<https://www2.isye.gatech.edu/~brani/isyebayes/bank/handout8.pdf> from Brani
Vidakovic:

   Empirical Bayes is an approach to inference in which the observations
   are used to select the prior, usually via the marginal distribution.
   Once the prior is specified, the inference proceeds in a standard
   Bayesian fashion. The use of data to estimate the prior in addition to
   subsequent use for the inference in empirical Bayes is criticized by
   subjectivists who consider the prior information exogenous to
   observations.
   The repeated use of data is also loaded with perils since it can
   underestimate modeling errors. Any data is going to be complacent with
   a model which used the same data to specify some of its features.

Empirical Bayes is closely related to *Bayesian model selection* or
*Bayesian model comparison*.

> I'm sure that there must be a literature on this in statistics and
> philosophy, but I don't know how to find it. Maybe there's a jargon term
> that I just don't know.

The boldface terms above may be helpful?

-cheers, jason
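[Editor's sketch, not part of the original exchange.] The posterior
predictive checks mentioned above can be illustrated in a few lines. This is
a minimal version under simplifying assumptions: the bimodal "observed" data
is invented, and a plug-in MLE fit of the normal model stands in for a full
posterior. The idea is to simulate replicate datasets from the fitted model
and ask whether a bimodality-sensitive statistic (here, sample kurtosis,
which is near 3 for normal data and well below 3 for a symmetric two-mode
mixture) of the observed data looks typical under the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented "observed" data that is secretly bimodal.
obs = np.concatenate([rng.normal(-2.0, 1.0, 200),
                      rng.normal(+2.0, 1.0, 200)])

def kurtosis(x):
    """Sample kurtosis: ~3 for normal data, < 3 for a wide bimodal mixture."""
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean()

# Plug-in "posterior" predictive check: fit the normal model by MLE,
# then draw replicate datasets of the same size from the fitted model.
mu, sigma = obs.mean(), obs.std()
T_obs = kurtosis(obs)
T_rep = np.array([kurtosis(rng.normal(mu, sigma, obs.size))
                  for _ in range(1000)])

# Fraction of replicates at least as extreme (as low) as the observed
# statistic. A value near 0 says the normal model cannot reproduce the data.
ppc_pvalue = (T_rep <= T_obs).mean()
```

An extreme p-value here flags the misspecification that motivated the
question: the fitted normal almost never generates data as flat-topped as
the bimodal sample.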
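[Editor's sketch, not part of the original exchange.] The empirical-Bayes
selection of K described above can likewise be sketched. Instead of the
exact marginal likelihood ∫_θ p(θ | K) p(x_1, x_2, ... | K, θ) dθ, this uses
the BIC, a standard large-sample approximation to it (and omits the
hyper-prior p(K), as the message notes is common). The EM fitter, data, and
initialization choices below are my own invented illustration, not anyone's
reference implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented bimodal data: two well-separated clusters.
x = np.concatenate([rng.normal(-3.0, 1.0, 400),
                    rng.normal(+3.0, 1.0, 400)])

def fit_gmm_loglik(x, K, iters=100):
    """Fit a K-component 1-D Gaussian mixture by EM; return the max log-likelihood."""
    n = x.size
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread initial means
    var = np.full(K, x.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities r[i, k], computed stably in log space.
        logp = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                - (x[:, None] - mu) ** 2 / (2 * var))
        logp_max = logp.max(axis=1, keepdims=True)
        p = np.exp(logp - logp_max)
        r = p / p.sum(axis=1, keepdims=True)
        # M-step, with small floors to avoid degenerate components.
        nk = np.maximum(r.sum(axis=0), 1e-12)
        pi = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = np.maximum((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-3)
    return (logp_max.ravel() + np.log(p.sum(axis=1))).sum()

def bic(x, K):
    # Free parameters: (K-1) weights + K means + K variances = 3K - 1.
    # BIC approximates -2 * log marginal likelihood; lower is better.
    return (3 * K - 1) * np.log(x.size) - 2 * fit_gmm_loglik(x, K)

best_K = min((1, 2, 3), key=lambda K: bic(x, K))
```

On this data the K=1 (unimodal) model loses badly, and the BIC penalty keeps
K=3 from winning on noise, so the selection lands on the bimodal K=2 model,
matching the message's point that the data can drive you away from K=1.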
_______________________________________________
uai mailing list
uai@ENGR.ORST.EDU
https://secure.engr.oregonstate.edu/mailman/listinfo/uai