On Fri, Feb 23, 2018 at 1:48 PM, Robert Goldman <rpgold...@sift.info> wrote:

> I'm looking for some advice and particularly literature pointers for a
> question about the Bayesian stance. I'm interested in what approaches are
> suggested for handling the case where one's prior is qualitatively wrong.
>
> For example, imagine that I have chosen a normal distribution for a random
> variable, and when the observations come back, they are bimodal. What does
> the Bayesian philosophy say about cases like this?
>

Why did you "choose a normal distribution"?
Did you really believe that it was *impossible* that your variables X_1,
X_2, ... followed any non-normal distribution?
If so, then you should attribute the apparent bimodality of the
observations to a coincidence in the sample, because no other explanation
is possible.

It seems that you are not willing to do this -- you are willing to be
convinced by a finite sample that the distribution is *not* normal.
So apparently, your true mental model must place some probability mass on
non-normal distributions.

In other words, for engineering purposes, you were apparently applying a
prior that didn't actually match your true prior state of belief.
(Of course, this is a common approximation, which is why applied Bayesians
make use of *posterior predictive checks (PPCs)*.
Applied Bayesians also tend to try multiple models and compare them, e.g.,
by *cross-validation*, which can detect that your original prior over
unimodal distributions was misspecified because using it gets you poor
predictions on held-out data, compared to a prior that allowed bimodal
distributions.)
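
To make the PPC idea concrete, here is a minimal sketch in Python.  The
conjugate Normal-Inverse-Gamma prior, its hyperparameters, and the synthetic
bimodal data are all illustrative assumptions of mine, not anything from your
setup; the point is just the mechanics of the check.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative "observed" data that is secretly bimodal.
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
n, xbar, s2 = len(x), x.mean(), x.var(ddof=1)

# Posterior under an assumed vague Normal-Inverse-Gamma prior on (mu, sigma^2).
mu0, kappa0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3
kappa_n = kappa0 + n
mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
a_n = a0 + n / 2
b_n = b0 + 0.5 * (n - 1) * s2 + 0.5 * kappa0 * n * (xbar - mu0) ** 2 / kappa_n

# Simulate replicated datasets from the posterior predictive and compare a
# bimodality-sensitive statistic (excess kurtosis) to its observed value.
t_obs = stats.kurtosis(x)
t_rep = []
for _ in range(1000):
    sigma2 = stats.invgamma(a_n, scale=b_n).rvs(random_state=rng)
    mu = rng.normal(mu_n, np.sqrt(sigma2 / kappa_n))
    t_rep.append(stats.kurtosis(rng.normal(mu, np.sqrt(sigma2), n)))

p_ppc = np.mean(np.array(t_rep) <= t_obs)  # posterior predictive p-value
print(f"observed kurtosis {t_obs:.2f}, PPC p-value {p_ppc:.3f}")

A p-value near 0 or 1 says the fitted model cannot reproduce this feature of
the data (here it is near 0, since bimodal data has much lower kurtosis than
anything the normal model generates), which is exactly the qualitative misfit
you describe.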

If you want to be fully Bayesian, it would be better to use your true prior
state of belief from the start.
Here are some priors you might actually have:

   - A mixture of K Gaussians where K is unknown.  (You suspect that K=1
   but you're not sure.  So you place a prior over K as well as over the K
   means and variances; a minimal generative sketch follows this list.)
   - Generalizing that, a prior over compositional model structures that
   prefers simpler structures (as in the Automatic Statistician
   <https://www.automaticstatistician.com/> project).
   - A nonparametric family of densities (Adams et al. 2009
   <https://arxiv.org/pdf/0912.4896.pdf>).
   - A universal (Kolmogorov) prior based on computable functions, as
   suggested by Solomonoff (see e.g. Hutter 2007
   <https://arxiv.org/pdf/0709.1516.pdf>).
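
For the first of these, here is the promised minimal generative sketch in
Python; the particular p(K), the mean/scale priors, and the Dirichlet prior
on the mixing weights are all illustrative assumptions of mine:

import numpy as np

rng = np.random.default_rng(0)

def sample_dataset(n=200, k_max=5):
    # p(K): an assumed prior that prefers simpler (smaller-K) structures.
    p_K = 2.0 ** -np.arange(1, k_max + 1)
    K = rng.choice(np.arange(1, k_max + 1), p=p_K / p_K.sum())
    # p(theta | K): independent priors on the K means, scales, and weights.
    means = rng.normal(0.0, 5.0, K)
    sds = rng.gamma(2.0, 1.0, K)
    weights = rng.dirichlet(np.ones(K))
    # p(x | K, theta): each point picks a component, then draws from it.
    z = rng.choice(K, size=n, p=weights)
    return K, rng.normal(means[z], sds[z])

K, x = sample_dataset()
print(f"sampled K = {K}; first observations: {np.round(x[:5], 2)}")

Inference then conditions this joint distribution on the observed x's, so the
K=1 and K=2 explanations compete on their posterior probabilities.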

In other words, your uncertainty about the prior is really part of the
prior (hierarchical Bayes), so you should integrate it out.
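
In the mixture example, that means the fully Bayesian prediction averages over
all values of K, weighted by their posterior probabilities:
p(x_new | x_1, x_2, ...) = Σ_K p(K | x_1, x_2, ...) p(x_new | x_1, x_2, ..., K).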

A convenient simplification is *empirical Bayes*, which instead maximizes
over that additional uncertainty.
In the mixture of K Gaussians case, empirical Bayes would select K to
maximize p(K) ∫ p(θ | K) p(x_1, x_2, ... | K, θ) dθ, where θ specifies the
K means and variances, p(θ | K) is the prior for a given K, and p(K) is a
hyper-prior that is sometimes omitted.
So choosing K=1 gives your original unimodal model, but the data might
drive you to select the bimodal model K=2 (even if p(K=2) << p(K=1)).
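
The integral over θ has no closed form for mixtures, so here is a minimal
sketch of that selection in Python using scikit-learn, with BIC standing in
as the usual crude approximation to -2 log marginal likelihood and the
hyper-prior p(K) taken to be flat (the data are my own illustration again):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Secretly bimodal data, as in the PPC sketch above.
x = np.concatenate([rng.normal(-2, 1, 100),
                    rng.normal(2, 1, 100)]).reshape(-1, 1)

# For each K, fit the K-Gaussian mixture and score it; lower BIC is better,
# since BIC approximates -2 log p(x_1, ..., x_n | K).
for K in range(1, 5):
    gm = GaussianMixture(n_components=K, random_state=0).fit(x)
    print(f"K={K}: BIC = {gm.bic(x):.1f}")

On data like this, K=2 wins decisively even though you "suspected" K=1.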

> ... as I understand it, is critical that my prior be independent of the
> observations, so revising my prior before I compute the posterior isn't
> kosher.


Notice that empirical Bayes is in fact using the observations to help
select the prior.
You are not the only one to have misgivings about that.  See for example this
handout <https://www2.isye.gatech.edu/~brani/isyebayes/bank/handout8.pdf> from
Brani Vidakovic:

Empirical Bayes is an approach to inference in which the observations are
used to select the prior, usually via the marginal distribution. Once the
prior is specified, the inference proceeds in a standard Bayesian fashion.
The use of data to estimate the prior in addition to subsequent use for the
inference in empirical Bayes is criticized by subjectivists who consider
the prior information exogenous to observations. The repeated use of data
is also loaded with perils since it can underestimate modeling errors. Any
data is going to be complacent with a model which used the same data to
specify some of its features.


Empirical Bayes is closely related to *Bayesian model selection* or *Bayesian
model comparison*.

> I'm sure that there must be a literature on this in statistics and
> philosophy, but I don't know how to find it.  Maybe there's a jargon term
> that I just don't know.


The boldface terms above may be helpful?

-cheers, jason