There are various information criteria -- quoting from Wikipedia's article
on Model Selection <https://en.wikipedia.org/wiki/Model_selection#Criteria>:


   - Akaike information criterion
   <https://en.wikipedia.org/wiki/Akaike_information_criterion> (AIC), a
   measure of the goodness of fit of an estimated statistical model
   - Bayes factor <https://en.wikipedia.org/wiki/Bayes_factor>
   - Bayesian information criterion
   <https://en.wikipedia.org/wiki/Bayesian_information_criterion> (BIC),
   also known as the Schwarz information criterion, a statistical criterion
   for model selection
   - Deviance information criterion
   <https://en.wikipedia.org/wiki/Deviance_information_criterion> (DIC),
   another Bayesian-oriented model selection criterion
   - Focused information criterion
   <https://en.wikipedia.org/wiki/Focused_information_criterion> (FIC), a
   selection criterion sorting statistical models by their effectiveness for a
   given focus parameter
   - Hannan–Quinn information criterion
   <https://en.wikipedia.org/wiki/Hannan%E2%80%93Quinn_information_criterion>,
   an alternative to the Akaike and Bayesian criteria
   - Kashyap information criterion
   <https://en.wikipedia.org/w/index.php?title=Kashyap_information_criterion&action=edit&redlink=1> (KIC),
   a powerful alternative to AIC and BIC, because KIC uses the Fisher
   information matrix
   - Minimum description length
   <https://en.wikipedia.org/wiki/Minimum_description_length>
   - Minimum message length
   <https://en.wikipedia.org/wiki/Minimum_message_length> (MML)
   - Watanabe–Akaike information criterion
   <https://en.wikipedia.org/wiki/Watanabe%E2%80%93Akaike_information_criterion> (WAIC),
   also called the widely applicable information criterion


They all purport to formalize Ockham's Razor and, therefore, to prevent
overfitting, by quantifying the information that goes into the model as
well as the information that goes into the model's error.  The sum of the
two is a measure of information (e.g. "bits") used for model selection:
the model selection criterion.  The two, taken together, are also adequate
to reproduce the original data -- if not without loss, then, at least,
without loss of whatever the particular criterion deems "noise".
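To make the "fit plus complexity" sum concrete, here is a minimal sketch of the two most common criteria.  These are the standard AIC and BIC formulas; the log-likelihood and the parameter count k would come from whatever models are being compared, and lower scores win:

```python
import math

def aic(log_likelihood: float, k: int) -> float:
    # AIC = 2k - 2 ln(L): complexity term plus (negated) fit term
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood: float, k: int, n: int) -> float:
    # BIC = k ln(n) - 2 ln(L): the complexity penalty grows with
    # the sample size n, so BIC punishes parameters harder than AIC
    # once n > e^2 (about 8 samples)
    return k * math.log(n) - 2 * log_likelihood
```

For the same fit (same log-likelihood) and the same k, BIC will exceed AIC for any reasonably sized dataset, which is why it tends to pick the smaller model.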

So, what do I mean by "Information MetaCriterion"?

I mean to quantify the information that goes into the information criterion
so as to select the best information criterion.

My conjecture is that the information criterion based on the length of the
executable archive of the dataset is the one requiring the least
information.

The only open parameter of any "executable archive" is the underlying UTM.


What about the others?  How can I make such a bold claim?

Well, they all have at least one other parameter:  what they deem to be
"noise".

Why is this damning to them all?

Consider RSA cyphertext of the sequence:

1111111111111111111111111111111111111111111111111111111111....

This cyphertext will _appear_ to be noise to all of them _except_ to the
one based on "the smallest executable archive of the dataset", whose
archive will consist of just the RSA algorithm, the private key, the count
of 1s, and a loop of that count generating the 1s.
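A rough illustration of this asymmetry, with random bytes standing in for the RSA cyphertext (a hypothetical stand-in: real cyphertext is likewise incompressible to a generic compressor, just as it is "noise" to the other criteria), against the tiny program that actually generates the plaintext:

```python
import os
import zlib

# Stand-in for RSA cyphertext: to a generic compressor -- as to the
# AIC/BIC-style criteria -- it is indistinguishable from noise.
ciphertext_standin = os.urandom(10_000)

def plaintext(count: int) -> bytes:
    # The sequence's true "executable archive" is essentially this:
    # a count and a loop.  Decrypting first would add the RSA routine
    # and the private key -- still constant-size, independent of count.
    return b"1" * count
```

The compressor gains almost nothing on the cyphertext, while the generating program stays the same size however long the sequence grows.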

QED

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T0fc0d7591fcf61c5-M9e330eb814a66ad721c95461
Delivery options: https://agi.topicbox.com/groups/agi/subscription
