The connection between my final answer and sampling between 2^(X-1) and 2^X is... mostly my intuition, but I think it can be made rigorous. I noticed that the mean number of bits always seemed to be near 1/p, which is the mean of a geometric distribution with parameter p. So I defined a random variable whose mean number of bits (i.e. the mean of its log) would be close to the same. It appears to have the same characteristics: it looks like an "anything can happen" sort of distribution, and computing the mean never seems to converge, no matter how many samples I feed it.

Sampling between 2^(X-1) and 2^X is just to make sure the sampling algorithm will return any natural number. (Well, any that will fit in your computer.) I can prove this distribution has no expected value.

The expected value of a discrete random variable is

  x_0 * p(x_0) + x_1 * p(x_1) + x_2 * p(x_2) + ...

where the x_i are the possible outcomes, and the p(x_i) are their probabilities. It's a generalization of the mean of finitely many equally likely outcomes, in which every p(x_i) = 1/n:

  (x_0 + x_1 + ... + x_{n-1}) / n
  = x_0 * 1/n + x_1 * 1/n + ... + x_{n-1} * 1/n
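In code (Python here, just for illustration), the general formula and the equal-probability special case agree:

```python
def expected_value(outcomes, probs):
    # Sum of x_i * p(x_i) over all outcomes.
    return sum(x * p for x, p in zip(outcomes, probs))

xs = [3, 7, 8, 2]
# With uniform probabilities 1/n, this is just the ordinary mean.
print(expected_value(xs, [1 / len(xs)] * len(xs)))  # 5.0
print(sum(xs) / len(xs))                            # 5.0
```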

If X ~ Geometric(1/2), its expected value is

  0 * 1/2 + 1 * 1/4 + 2 * 1/8 + 3 * 1/16 + ...
  = 1/4 + 2/8 + 3/16 + ...
  = 1
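You can sanity-check that sum numerically (Python, for illustration); the partial sums settle at 1:

```python
def geometric_ev_partial(p, terms):
    # Partial sum of sum_{x>=0} x * (1-p)^x * p, the mean of Geometric(p)
    # with support 0, 1, 2, ...
    return sum(x * (1 - p) ** x * p for x in range(terms))

print(geometric_ev_partial(0.5, 100))  # very close to 1.0
```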

Decision theory and economics are dominated by the idea that the best way to make choices is to maximize expected value. But expected values aren't stable, in the sense that applying an ordinary function to a random variable that has an expected value can yield a random variable that doesn't have one. If you exponentiate the geometric outcomes above, you get this expected value:

  1 * 1/2 + 2 * 1/4 + 4 * 1/8 + 8 * 1/16 + ...
  = 1/2 + 1/2 + 1/2 + 1/2 + ...
  = +inf

If you use X ~ Geometric(1/3) and Y = 2^X, you get

  1 * 1/3 + 2 * 2/9 + 4 * 4/27 + ...
  = 1/3 + 4/9 + 16/27 + ...
  = +inf

for the expected value of Y. More generally, if X ~ Geometric(p), the formula for the expected value of Y is

  sum_{x=0}^inf 2^x * (1-p)^x * p
  = sum_{x=0}^inf (2*(1-p))^x * p

By the geometric series convergence test, this fails to converge when |2*(1-p)| >= 1, which happens when p <= 1/2. So if p <= 1/2, Y = 2^X has no expected value, and we can use this fact to baffle economics students and impress the ladies.
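A quick numeric check of both regimes (Python, for illustration): for p > 1/2 the partial sums settle at the geometric-series limit p/(2p - 1), while at p = 1/2 every term is 1/2, so the partial sums grow without bound:

```python
def ev_partial(p, terms):
    # Partial sum of sum_{x>=0} (2*(1-p))^x * p, the expected value of Y = 2^X.
    return sum((2 * (1 - p)) ** x * p for x in range(terms))

print(ev_partial(0.75, 200))  # ~1.5, i.e. 0.75 / (2*0.75 - 1)
print(ev_partial(0.5, 200))   # 100.0: grows linearly with the number of terms
```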

The expected value of the random variable I gave (roughly, sampling uniformly between 2^(X-1) and 2^X) is bounded below by the series you get by substituting the lowest possible value for each outcome (i.e. 2^(x-1) if x > 0, otherwise 0):

  0 * p(x_0) + 1 * p(x_1) + 2 * p(x_2) + 4 * p(x_3) + ...

From x = 1 on, each term of this series is exactly half the corresponding term of the series for the expected value of Y, which diverges when p <= 1/2, so this series diverges too.
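Here's that lower bound numerically (Python, for illustration), with p = 1/2: each term from x = 1 on is 1/4, so the partial sums grow linearly:

```python
def lower_bound_partial(p, terms):
    # Partial sum of sum_{x>=1} 2^(x-1) * (1-p)^x * p, a lower bound on
    # the mean of "sample uniformly between 2^(X-1) and 2^X".
    return sum(2 ** (x - 1) * (1 - p) ** x * p for x in range(1, terms + 1))

print(lower_bound_partial(0.5, 100))  # 25.0: each of the 100 terms is 1/4
```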

A proof of divergence for the expected value of the final function I gave won't be as easy, because the uniform numbers it chooses aren't bounded below by a function that grows without bound the way 2^(X-1) does. It'll probably have something to do with the linearity of expectation.

Linearity, BTW, is what makes expected value popular. It often yields closed-form solutions, and makes expected values scale and add together easily. Still, medians might be a better choice because they always exist. In the Y = 2^X game, if X ~ Geometric(1/2), the median of Y's distribution is $1. If X ~ Geometric(1/4), it's $4, and if X ~ Geometric(1/16), it's $1024.
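Those medians are easy to verify (Python, for illustration, assuming as above that X has support 0, 1, 2, ...): take the smallest x where the geometric CDF 1 - (1-p)^(x+1) reaches 1/2, and the median of Y is 2^x:

```python
def median_of_Y(p):
    # Smallest x with P(X <= x) = 1 - (1-p)^(x+1) >= 1/2;
    # the median of Y = 2^X is then 2^x.
    x = 0
    while 1 - (1 - p) ** (x + 1) < 0.5:
        x += 1
    return 2 ** x

print(median_of_Y(1/2))   # 1
print(median_of_Y(1/4))   # 4
print(median_of_Y(1/16))  # 1024
```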

Neil ⊥

On 02/19/2014 08:07 PM, Robby Findler wrote:
Thanks Neil! (I think probably Max and Burke are also interested.)

What is the connection between the economic philosophers argument
against maximizing expected value, the idea that we should sample
between 2^(X-1) and 2^X, and then the answer that you gave?

I think that I know that the geometric distribution is something you get
by repeatedly playing a game much like that first one, but I'm not
really sure of the details and unable to piece together the rest....

Robby



On Wed, Feb 19, 2014 at 4:02 PM, Neil Toronto <[email protected]> wrote:
 >
 > I've CC'd the Racket users mailing list because I thought more
 > people (e.g. Robby) would be interested in the answer.
 >
 > You can't sample uniformly from the naturals, but you can sample
 > according to an "anything can happen" sort of distribution. You want
 > a distribution with no expected value (aka mean, average).
 >
 > Economic philosophers use such distributions to disabuse people of
 > the idea that maximizing expected value is always the best way to
 > make decisions. Here's a simple example.
 >
 >  1. Flip a coin until you get heads. Call the number of flips X.
 >  2. You receive Y = 2^X dollars.
 >
 > You can flip a fair coin, or a coin biased toward tails. Which coin
 > do you choose? Obviously the biased coin. But in either case, the
 > expected value of Y is infinite. Voila! Paradox.
 >
 > (The problem isn't that there's no upper bound. For Y = X, the
 > award is still unbounded, but for any coin with nonzero heads
 > probability, the expected value is finite.)
 >
 > You want *every* natural to be possible, though, so we'd have to do
 > something like sample uniformly between 2^(X-1) and 2^X. Here's a
 > function that does it:
 >
 >   (: random-natural/no-mean (-> Real Natural))
 >   (define (random-natural/no-mean prob-zero)
 >     (define n (exact-floor (sample (geometric-dist prob-zero))))
 >     (define m1 (assert (expt 2 n) integer?))
 >     (define m0 (quotient m1 2))
 >     (max 0 (random-integer m0 m1)))
 >
 > The "max 0" keeps TR from complaining that `random-integer' returns
 > an Integer. The `prob-zero' argument is the probability the coin is
 > heads, which is also the probability this function returns 0.
 >
 > This looks sort of like what you want:
 >
 > > (random-natural/no-mean 0.001)
 > - : Integer [more precisely: Nonnegative-Integer]
 > 56136474695225172011728291802626216994256833545894766283873703181
 > 10364986394536406497817120521834403457182656624358136577815679690
 > 73469994282060833573766848756431669747238563269112899118707963866
 > 08154252648824183995287333693058951314331721341986222320438359883
 > 50861513517737507150144340359987088543453799423969409721165923104
 > 82128386772489312894482659745630141444108439622157113717027784284
 > 7612786588731040573293479397701924913229558559022675650838885440
 >
 > > (random-natural/no-mean 0.001)
 > - : Integer [more precisely: Nonnegative-Integer]
 > 57
 >
 > If it returns small numbers too often, you can mitigate that by
 > taking the max of a few of them.
 >
 > If you want a feel for the "average" output, you can put it in
 > terms of average number of bits:
 >
 > > (mean (build-list 1000 (λ _ (integer-length (random-natural/no-mean
 > 0.01)))))
 > - : Real
 > 99 407/1000
 >
 > In fact, now that I think about it, this is probably nearly
 > equivalent:
 >
 > (: random-natural/no-mean (-> Real Natural))
 > (define (random-natural/no-mean p)
 >   (random-bits (add1 (exact-floor (sample (geometric-dist p))))))
 >
 > The average number of bits should be near (/ 1 p).
 >
 > Neil ⊥
 >
 > ____________________
 >  Racket Users list:
 > http://lists.racket-lang.org/users
