Crack-proofing the Scratch Lottery Code [long]

Ken Wesson Wed, 09 Feb 2011 03:21:32 -0800

It's likely that many of you have read, or at least encountered links
to, this Wired article recently:


http://www.wired.com/magazine/2011/01/ff_lottery/all/1

In it, a particular tic-tac-toe themed scratch lottery is examined in
detail. A mathematician, Mohan Srivastava, discovered weaknesses in it
and many similar lotteries. The article claims:

> Fundamentally, he believes that creating impregnable tickets is
> extremely difficult, if not impossible. “There is nothing random about
> the lottery,” he says. “In reality, everything about the game has been
> carefully designed to control payouts and entice the consumer.” Of
> course, these elaborate design elements mean that the ticket can be
> undesigned, that the algorithm can be reverse-engineered. The veneer of
> chance can be peeled away.

I had my doubts whether it was impossible to make it secure while
still controlling payouts so I took a stab at writing my own algorithm
to generate tic-tac-toe cards that follow the same rules as the ones
depicted in the Wired article.

Particularly, the player gets 24 hidden numbers, selected from the
numbers between 1 and 39 inclusive, and eight tic-tac-toe boards with
exposed numbers. No board has two of the same number; no board has
more than one winning row, column, or diagonal; and all of the rows,
columns, and diagonals on the board have the same prize values.

My thought was to use a two stage process.

First, generate the part of the ticket visible without scratching anything.

Then, generate the 24 hidden numbers, in such a way that the same
visible-part can be made a winner or a loser.

Of course it's a bit more complex than just that. The 8 visible
tic-tac-toe grids cannot be truly random without it being impossible
to get a desired statistical distribution of prizes. So the grid
generation has to be constrained somewhat. Eventually I came up with a
program that produces visible-parts that the user can, with some
difficulty, assort into five distinguishable species with three
subspecies each, but I tuned the hidden-number algorithm so that the
average payout for all combinations is roughly a dollar (with the
ticket price being three, and also the minimum non-zero prize, so the
house should make about two bucks a ticket on average, coming out in
the black). Therefore the user cannot game the distinguishable species
and subspecies to cherry-pick any subset of tickets with a
statistically significantly higher expected payout. In particular they
can expect to lose a couple of bucks on each ticket no matter what
numbers they see in the uncovered part of that ticket.

Here is a sample ticket generated with this program by calling (create-ticket):



[(3 4 6 11 13 14 15 16 17 18 19 20       ; hidden numbers
  21 24 27 28 29 30 32 33 34 35 37 39)

 (((37 33 10)
   (35 25 17)               ; upper left tic-tac-toe grid
   (26 15 3))

  ((10 3 37)
   (17 25 33)                          ; upper right grid
   (34 15 26))

  ((23 18 39)
   (14 24 5)                         ; second-to-top left
   (22 31 27))

  ((37 3 10)
   (33 25 17)
   (26 15 27))

  ((6 13 7)
   (20 1 2)
   (8 11 30))

  ((37 33 10)
   (29 25 17)
   (26 15 35))

  ((28 34 38)
   (36 19 30)                                ; lower left
   (13 9 12))

  ((6 18 7)
   (4 1 2)                                  ; lower right
   (8 11 34)))]



There are a few patterns that might leap out at more mathematically
inclined readers; certain of the number positions, such as the center
and upper corners, have only four possible values with two of them
more common than the other two. This is unavoidable but it is also the
case that these patterns cannot be used to guess the prize amount
without knowing the hidden numbers. In fact the most that can be
determined from them is whether it has about a 1 in 3 chance of being
a $3 winner and a 2 in 3 chance of losing, an almost certain loser
with a 1% chance of winning $100, or able to generate a variety of
prizes other than $3 and averaging $1 per ticket. The most serious
weakness here is that a thrill-seeking player can, in theory, zero in
on roughly half of the tickets as having no chance to generate the
big-money prizes ($250 and up) and avoid those. They can't, however,
game these tickets to alter substantially their expected loss of $2
per ticket bought.

The above has three discernible groups of grids: four with 25 in the
center, two with mostly smaller numbers, and two more. This indicates
that it's a species likely to produce 0, 1, or 2 winning grids and
very unlikely to produce more; the bottom left corner is one of the
odd ones and the top right isn't so it probably belongs to the
varied-prizes subspecies capable of generating the big $50,000 maximum
prize. The odds of a ticket like this one winning $50,000 are
minuscule, however.

Virtually "scratching" this ticket and matching up the numbers with
the hit-grids function reveals:



([[true  true  false]  ; X X .
  [true  false true ]  ; X . X
  [false true  true ]] ; . X X

 [[false true  true ]  ; . X X
  [true  false true ]  ; X . X
  [true  true  false]] ; X X .

 [[false true  true ]  ; . X X
  [true  true  false]  ; X X .
  [false false true ]] ; . . X

 [[true  true  false]  ; X X .
  [true  false true ]  ; X . X
  [false true  true ]] ; . X X

 [[true  true  false]  ; X X .
  [true  false false]  ; X . .
  [false true  true ]] ; . X X

 [[true  true  false]  ; X X .
  [true  false true ]  ; X . X
  [false true  true ]] ; . X X

 [[true  true  false]  ; X X .
  [false true  true ]  ; . X X
  [true  false false]] ; X . .

 [[true  true  false]  ; X X .
  [true  false false]  ; X . .
  [false true  true ]]); . X X



Like about 8 out of 9 of the tickets this code generates, ours is not a winner.



Below is the code that generated this, plus a small suite of simple
tests, and a detailed discussion of it all. To understand the code and
its discussion, it is recommended that you visit the link above and
familiarize yourself with the appearance of the tic-tac-toe tickets,
particularly the geometry of the potential winning lines and
corresponding prize amounts.

This section is fairly technical so you may want to skip to the
concluding remarks at the bottom of this post with control-end and
page-up now. You can also control-F for

sample test result

to get there quickly. That phrase occurs at the start of the
conclusion and nowhere between here and there. In Firefox and
Thunderbird, triple-clicking the isolated line above will select it
for quick copy/paste.



The first part of the code is a set of utility functions centered
around randomizing various things: making shuffled vectors from seqs,
uniformly picking a random element from a coll, picking randomly from
a set of alternatives with specified probability weightings, and
generating a boolean that has a 1 in N chance of being true. That last
is mainly used in ifs: (if (one-in N) (do-rare-thing)
(do-common-thing)); (one-in 2) generates a coin flip boolean and
(one-in nil) is constantly false, so nil can be used as "infinity"
here, e.g.

(if (one-in
      (cond
        (some-test)    1   ; always
        (another-test) 2   ; half the time
        (test-no-3)    100 ; rarely
        :otherwise))   nil ; never
  (do-something))

All of these use the rand-int function and none of the rest of the
code directly uses the RNG. These functions are thus the one area that
needs modifying to drop in another RNG implementation, which could be
done with a replacement for rand-int that calls a custom
implementation that, say, stores its state in an atom held in a global
var. This allows replacement of the default RNG with something like,
say, a Mersenne Twister or an entropy pool or even, hardware
permitting, a quantum source of genuine randomness.



(defn shuffle [ss]
  (let [s (count ss)]
    (loop [out (transient (vec ss)) n (dec s)]
      (if (zero? n)
        (persistent! out)
        (let [i (rand-int (inc n))
              t (out n)]
          (recur
            (assoc!
              (assoc! out n (out i))
              i t)
            (dec n)))))))

(defn rand-elt [s]
  (nth s (rand-int (count s))))

(defn rand-elt-fmap [fmap]
  (let [ftot (reduce + (vals fmap))
        n (rand-int ftot)]
    (loop [i 0 k nil fmap (seq fmap)]
      (if (< n i)
        k
        (let [f (first fmap)]
          (recur (+ i (val f)) (key f) (next fmap)))))))

(defn one-in [n]
  (and n (= (rand-int n) 0)))



The next part of the code generates "templates", which are
three-element vectors whose final element is the visible eight
tic-tac-toe grids. That's a list of eight lists of three lists of
three integers. The other two parts are used by the hidden number
generator. Importantly, the latter can make a winner or a loser out of
any ticket. The only thing different across the different ticket
"species" is the number of possibly-winning grids: one species will
fairly often have one possibly-winning grid and rarely have 7, another
sometimes 2 and rarely 6, another occasionally 3 and rarely 5, another
infrequently 4, and another rarely 8. These are successively rarer, as
well as having increasing probabilities of losing. Each has three
subspecies which are subtler to spot but produce more distinct prize
distribution differences -- one produces rare $100 wins, one common $3
wins, and one wins of various amounts except $3 at various
frequencies. All species and subspecies have about the same expected
total prize amount, though -- roughly $1.



(def t1
  [[:za :ca :xc]
   [:cb :xx :ys]
   [:yc :zb :cc]])

(def t2
  [[:xc :ca :za]
   [:ys :xx :cb]
   [:cc :zb :yc]])

(def rarities [t1 t2 t1 t1 t1 t1 t2 t1])

(defn create-template []
  (let [nums (shuffle (range 1 40))
        cs (subvec nums 0 15)
        xs (subvec nums 15 23)
        ys (subvec nums 23 31)
        zs (subvec nums 31)
        xx1s (subvec xs 0 2)
        xx2s (subvec xs 2 4)
        xc1s (subvec xs 4 6)
        xc2s (subvec xs 6)
        yc1s (subvec ys 0 2)
        yc2s (subvec ys 2 4)
        ys1s (subvec ys 4 6)
        ys2s (subvec ys 6)
        za1s (subvec zs 0 2)
        zb1s (subvec zs 2 4)
        za2s (subvec zs 4 6)
        zb2s (subvec zs 6)
        grid-nums (map #(+ (* 2 %1) %2)
                    (shuffle
                      (rand-elt-fmap
                        {[0 0 0 0 0 0 0 0] 1
                         [0 0 0 0 0 0 0 1] 100
                         [0 0 0 0 0 0 1 1] 50
                         [0 0 0 0 0 1 1 1] 20
                         [0 0 0 0 1 1 1 1] 5
                         [0 0 0 1 1 1 1 1] 20
                         [0 0 1 1 1 1 1 1] 50
                         [0 1 1 1 1 1 1 1] 100
                         [1 1 1 1 1 1 1 1] 1}))
                    (repeatedly 8 #(rand-int 2)))
        card (map
               (fn [grid num]
                 (let [twos (> num 1)
                       m (mod num 2)
                       xxs (if twos xx2s xx1s)
                       xcs (if twos xc2s xc1s)
                       ycs (if twos yc2s yc1s)
                       yss (if twos ys2s ys1s)
                       zas (if twos za2s za1s)
                       zbs (if twos zb2s zb1s)
                       cs (shuffle cs)
                       ca (cs 0)
                       cb (cs 1)
                       cc (cs 2)]
                   (map
                     (fn [row]
                       (map
                         (fn [cell]
                           (condp = cell
                             :ca ca
                             :cb cb
                             :cc cc
                             :za (zas m)
                             :zb (zbs m)
                             :xx (xxs m)
                             :xc (xcs m)
                             :yc (ycs m)
                             :ys (yss m)))
                         row))
                     grid)))
               rarities grid-nums)]
    [nums grid-nums card]))



The template generator takes a random permutation of 1 to 39 and
partitions it into 15 common numbers, which will be among the hidden
numbers on all tickets generated from that template, and 24 controlled
numbers, only 9 of which will appear on each ticket and which are
subdivided into three sets of 8, called xs, ys, and zs, which are each
further categorized in three ways: "ones" versus "twos" and "parity",
about which see below, and two possible positions within grids: for
the xs, center or corner, for the ys, side or corner, and for the zs,
a or b.

The t1 and t2 templates are grid templates with the numbers replaced
by categories: za and zb for the zs, xx (center) and xc (corner) for
the xs, and yc (corner) and ys (side) for the ys, as well as three
cells marked ca, cb, and cc which can get arbitrary common numbers.

These templates both have the property that every winning line must
cross either an x and a y, an x and a z, or a y and a z. Further, one
of the diagonals (which generate the most valuable prizes) requires
both xs and a y. This feature is used to control the prize
distribution at the coarsest level and to make it possible to make
both losing and winning tickets based on any template this function
can emit.

The rarities template is mostly t1s with two t2s. The t2 template is
simply a mirror image of the t1 template and the rarities template
arranges things so the four biggest prizes, the two $50,000s and the
two $1000s, are x-x-y diagonals. These diagonals will be made very
rare.

The template generator first shuffles 1 to 39 and partitions the
result; these numbers, in this sequence, will be passed to the
hidden-number generator as the first element of the template vector.
Then it generates the second part: a seq of 8 numbers between 0 and 3.
Each grid is assigned one of these numbers. It's actually a bitfield
of sorts: the high bit divides the grids into "ones" and "twos" and
the low bit gives some of them "even" parity and some "odd". Note that
the various types of xs, ys, and zs come in pairs; the parity is used
sometimes to prevent certain winning lines from occurring in
hidden-number generation. Depending on a grid's parity, only the
zeroth or only the first elements of those vectors make it into that
grid. Similarly, there are xs that only occur in "ones" and xs that
only occur in "twos". So, the 0 1 2 3 number of a grid determines
which of four sets of controlled numbers occur on that grid.

Further, the distribution of "ones" and "twos" is skewed so most cards
will have one of one type and seven of the other, fewer will have two
of the rarer type, fewer still will have three, fewer still will have
four of each, and only rarely will a card have all of the same type.
(These produce the five distinguishable species). As you might have
guessed, the hidden-number generator, by choosing from among the xs
and ys and zs appropriately, will make only the "ones" or only the
"twos" eligible to win, and will almost always choose the rarer of the
two. This means most winning cards have a single win, fewer have two,
and so on. The user might try to game that by picking a card with a
less skewed number distribution (or, if he's lucky enough, one with a
lot of non-occurring numbers) hoping for four or even eight winning
grids, but the hidden-number generator compensates by making more of
those cards pure losers than of cards that have the more skewed
distributions.

The card itself just uses random common numbers along with the
grid-number (0 to 3) determined xs, ys, and zs to fill in each of the
t1 or t2 templates.

The output vector is the shuffling of the range 1 to 39, the
grid-nums, and the exposed part of the card.



(defn ifilt [coll idx]
  (if idx [(nth coll idx)] coll))



This is a utility function used when generating the hidden numbers. It
takes a coll and an index and if index is nil, returns the coll,
otherwise returns a one-element vector of the coll's element at that
index. So, it can be used to selectively narrow a coll down to one
element or not. In practice, it gets used on two-element collections
(specific pairs of xs, ys, or zs) with either 0, 1 or nil (so, a
parity or nil) as idx to force parity 0, force parity 1, or allow both
parities. It plays a role in dealing with a complex set of possible
winning cards.



(defn pick-player-nums [[nums grid-nums _]]
  (let [cs (subvec nums 0 15)
        xs (subvec nums 15 23)
        ys (subvec nums 23 31)
        zs (subvec nums 31)
        xx1s (subvec xs 0 2)
        xx2s (subvec xs 2 4)
        xc1s (subvec xs 4 6)
        xc2s (subvec xs 6)
        yc1s (subvec ys 0 2)
        yc2s (subvec ys 2 4)
        ys1s (subvec ys 4 6)
        ys2s (subvec ys 6)
        x1s (concat xx1s xc1s)
        x2s (concat xx2s xc2s)
        y1s (concat yc1s ys1s)
        y2s (concat yc2s ys2s)
        z1s (subvec zs 0 4)
        za1s (subvec zs 0 2)
        zb1s (subvec zs 2 4)
        z2s (subvec zs 4)
        za2s (subvec zs 4 6)
        zb2s (subvec zs 6)
        num-freqs (frequencies (map #(quot % 2) grid-nums))
        rare-num (key (first (sort-by val num-freqs)))
        twos (= rare-num 1)
        cgood (num-freqs rare-num)
        lose-mod ([nil 4 10 14 18 nil nil nil 3] cgood)
        flip-prob ([nil 1000 100 5 1 nil nil nil nil] cgood)
        twos (if (one-in flip-prob) (not twos) twos)
        ; Corners that could win big prizes
        ct #(= (> % 1) twos)
        cf (constantly false)
        pr-corners (map #(%1 %2) [ct ct cf cf cf cf ct ct] grid-nums)
        t2-corners (set
                     (map #(quot % 2)
                       [(second grid-nums) (nth grid-nums 6)]))
        prcc (count (filter identity pr-corners))
        ; $50,000 possible?
        biggie (or (first pr-corners) (second pr-corners))
        ; Chance we're an xxy ticket
        xxy-one-in (condp = prcc
                     0 1000
                     1 (if biggie 10000 2000)
                     2 nil
                     3 nil
                     4 nil)
        xxy (one-in xxy-one-in)
        [xyfreq xzfreq yzfreq lose-freq] (if xxy
                                           [nil nil nil nil] ; N/A
                                           (condp = t2-corners
                                             #{(if twos 0 1)} [0 0 1 1]
                                             #{(if twos 1 0)} [0 0 1 130]
                                             #{0 1} [30 59 1 2]))
        ; What kind of ticket are we?
        ticket-type (if xxy
                      :xxy
                      (if (one-in (* lose-freq lose-mod))
                        :xyz
                        :loser))]
    (condp = ticket-type
      :loser (let [[xs ys zs] (rand-elt
                                [[x1s ys2s z2s]
                                 [x2s ys1s z1s]])
                   ws (take 9 (shuffle (concat xs ys zs)))]
               (sort (concat cs ws)))
      :xyz (let [[xxs xcs ycs yss zas zbs type]
                 (rand-elt-fmap
                   (if twos
                     {[xx2s xc2s yc2s ys2s za1s zb1s :xy] xyfreq
                      [xx2s xc2s yc1s ys1s za2s zb2s :xz] xzfreq
                      [xx1s xc1s yc2s ys2s za2s zb2s :yz] yzfreq}
                     {[xx1s xc1s yc1s ys1s za2s zb2s :xy] xyfreq
                      [xx1s xc1s yc2s ys2s za1s zb1s :xz] xzfreq
                      [xx2s xc2s yc1s ys1s za1s zb1s :yz] yzfreq}))
                 xs (concat xcs xxs)
                 ys (concat ycs yss)
                 zs (concat zas zbs)
                 xes [(first xcs) (second xxs)]
                 xos [(first xxs) (second xcs)]
                 xeos (rand-elt [xes xos])
                 zes [(first zas) (second zbs)]
                 zos [(first zbs) (second zas)]
                 zeos (rand-elt [zes zos])
                 [axs ays azs exs]
                 (condp = type
                   :xy (if (one-in 100)
                         [xeos yss zs ycs]
                         [xxs yss zs ycs])
                   :xz (let [[zas xcs xxs zbs]
                             (map ifilt
                               [zas xcs xxs zbs]
                               (rand-elt-fmap
                                 {[0 nil 1 0] 368
                                  [0 1 1 nil] 50
                                  [nil 0 1 0] 1
                                  [0 0 1 nil] 40
                                  [1 0 nil 0] 1
                                  [1 nil 0 0] 40
                                  [1 1 0 nil] 40
                                  [1 0 0 nil] 50
                                  [1 nil 0 1] 368
                                  [nil 1 0 1] 1
                                  [0 1 nil 1] 1
                                  [0 nil 1 1] 40}))]
                             [xcs ys zas (concat zbs xxs)])
                   :yz (if (one-in 100)
                         [xs ycs zeos yss]
                         [xs ycs zas yss]))
                 ws (take 9 (shuffle (concat axs ays azs exs)))]
             (sort (concat cs ws)))
      :xxy (let [[ycs xxs xcs azs] (if twos
                                     [yc2s xx2s xc2s z1s]
                                     [yc1s xx1s xc1s z2s])
                 ws (take 9 (shuffle (concat ycs xxs xcs azs)))]
             (sort (concat cs ws))))))



This behemoth actually picks the hidden 24 numbers of a ticket and is
the beating heart of the whole program.

Given the *same* template it can generate numbers that will combine
with the template to make a variety of winning tickets and losing
ones. Depending on which of the five "species" the template is, it
will make somewhat different distributions: every prize will be
possible, but the distribution of the *number* of prizes (up to 8)
will differ by species, clustering around 1, 2, 3, 4, and 8 with
lesser peaks at 7, 6, and 5 in the first three cases. The more
potential prizes, the more likely the ticket will be generated a pure
loser, to compensate.

There are also three "subspecies" of ticket, which do have discernibly
different prize distributions: one produces mainly losers and $3
winners, one losers and infrequent $100 winners, and one losers mixed
with all of the prize amounts except $3. Their average payouts are the
same, however, via tuned proportions of losers to winners for each
supspecies. Each species can come in all three of the subspecies.

The algorithm is very complex and took quite a bit of fine-tuning to
give good results, but it exploits the template structures t1 and t2
above. Recall that a winning line must pass through an x and a y, an x
and a z, a y and a z, or a particular two xs and y. Furthermore, each
combination produces its own prize value. There are two goals here: 1.
avoid generating any grid that has more than one tic-tac-toe; 2. make
the higher-value prizes correspondingly rarer.

The first choice made is whether to generate hidden numbers to make
the ticket a loser, an "xxy" ticket with the possibility of winning
the two biggest prizes, or an "xyz" ticket. The loser probability is
modified in several places to counterweight various factors, starting
with which species the ticket is (loser-mod near the top of the big
let).

Let's assume, for now, that it picks "ones" to be eligible for winning
rather than "twos". Deciding the probability of a potentially big
winning xxy ticket is done by counting how many of the four corner
grids (capable of yielding $1000 or more) are "ones". If more than one
is, then it won't generate an xxy ticket at all; we only give out at
most one $1000+ prize per entire ticket. Otherwise, the odds are
highest if no corner is a "one" (maximum xxy diagonal prize: $250) and
lowest if an upper corner is (maximum xxy diagonal prize: $50,000).

It also decides some things that will only matter if it generates an
xyz ticket. The trickiness here comes from the fact that the $3, $20,
and $100 prizes for the verticals are not mirrored on the right half
of the ticket but are in the same left-to-right order. Otherwise we
would just use t2 for all right-hand grids and t1 for all left-hand
grids at template generation and then forget about it. Instead, t2
differs in a key way from t1: a t2 "yz" winner will be a $100 winner,
whereas a t1 "yz" winner can be a $3 or $100 winner. So we make the
t2s as rare as possible (only the top right and bottom left corner
grids) and special-case them. We count how many are "ones" (again
assuming we picked "ones" to possibly win) and it's 0, 1, or both of
them. In the case where exactly one can win an xyz ticket will
probably be an xy ticket or an xz ticket, with a low likelihood of
being a yz ticket. That's half of all tickets. In the other cases we
generate only yz tickets; where neither t2 can win we do it frequently
and where both t2s can win we make it 300 times more likely the ticket
as a whole is a loser. This adds a potentially user-distinguishable
set of ticket subspecies, one of which will produce zero or more
prizes of $3, one of which will produce $100s, and one of which will
produce potentially any prize other than $3. This proved necessary to
get a nice distribution of prizes from random tickets, with $3 winners
commonest, but the rarity weighting is such that cherry-picking the
potential $100 winner subset won't change your expected prize amount,
nor will picking either of the other subspecies -- the $3 dominated
subspecies wins somewhat more frequently but with a correspondingly
lower average payout from the winners, since $3 is the minimum prize
value.

After that, the actual number picking occurs. After accounting for the
15 common numbers, exactly 9 of the 24 controlled numbers must be
doled out.

For a "loser" ticket, it takes the x2s, y1s, and z1s, or the x1s, y2s,
and z2s, but omits the corner ys and includes only the side ys. Every
grid gets either only the xs, or the zs and a side y; the xs case can
produce a five-X losing board and the zs-side-y case can produce a
six-X losing board with a hexagonal pattern. Thus, choosing the 9
hidden numbers from these two sets of ten will produce no-win tickets.

The "xxy" ticket is also easy to generate: assuming we have picked
"ones" to possibly win, we pick the corner y1s, center x1s, and corner
x1s as six of our set of potential hidden numbers. The z2s give is
four more. The "twos" will get only zs and lose while the "ones" can
win on the potentially major prize-winning diagonal.

That leaves the "xyz" ticket, or rather, three sub-cases: xy, xz, and yz.

The xy case is capable of producing $10 and $100 prizes on the six t1
grids and $3 and $10 prizes on the two t2 grids. The code first
decides if it's going to possibly win $100. If not (the usual case) it
combines the center x1s, all y1s, and all z2s. The "twos" again get
only zs and lose; the center xs and corner ys don't produce winning
lines without the corner xs, the corner ys and size ys don't produce
winning lines, and the center xs and side ys produce ten dollar
prizes. There are ten numbers if this set -- two xs, four ys, and four
zs -- of which one is randomly omitted. If it decides to allow $100
prizes, it takes two x1s with the property that the center xs and
corner xs are of opposite parity (there are two possible such subsets
of the x1s, of which it chooses one at random) instead of the center
x1s. Because of the parity selection, no single grid will get both the
center x and corner x Xed, and this prevents double-winner grids with
both the $10 and the $100 lines Xed. Instead, the side y may combine
with one OR the other to produce a prize on any given grid, but not
both. Also, the corner Y can never combine with both Xs in one grid to
potentially win the big xxy prizes of $250 or more.

The yz case is capable of producing $3 and $100 winners on the six t1
grids and only $100s on the two t2 grids. The code makes the vertical
line through yc and za common and the horizontal line through yc and
zb rare. It's handled similar to the xy case: the x2s are four of our
candidates, which make the "two" grids losers, and the y1s are another
four candidates. The side ys cannot form any winning lines. The
winning patterns are determined by the choice of zs. The two za1s make
the final two candidates if we're avoiding the bottom row (most of the
time); this wins $3 except on t2s. See above for how controlling the
distribution of yzs depending on whether neither, one, or both t2s can
win is used to prevent too many $100 prizes resulting from this. In
the case we're allowing the bottom row, we pick the za of one parity
and the zb of the other instead of both zas. The t1 grids with one
parity can win $3 and the others can win $100, but none can win both.
Again we have ten candidates so we discard one randomly.

The xz case is the most complex, and this despite the fact that the
same prize amounts are possible for t1s and t2s. The thing is, they
can be $5, $20, or $250 winners, and there's no simple way to pick
nine of the numbers to prevent a double win.

Obviously we inclide the y2s for all those loser grids out there. The
trick is picking five of the eight combined x1s and z1s that can never
generate a double win, and whose prize distribution is as desired: $5
much commoner and $250 passing rare. That is what the magic box of [0
nil 1 0] type stuff is for. We have corner and center xs and zas and
zbs and pick five of them: two of one of those and one each of the
others, with parities chosen such that at most one prize is possible
for a grid of each parity. The various vectors of 0, 1, and nil
exhaust the possible choices that don't allow double winners. The two
with relative weighted frequencies of 368 give $5 winners on one
parity and losers on the other. The two weighted 50 give $20 on one
and lose on the other; the four weighted 40 give $20 on one and $5 on
the other; and the four weighted 1 give $250 on one and either $5 or
$20 on the other.

Since there are five combined x1s and y1s picked, this case generates
an exact set of nine hidden numbers, instead of picking ten and
discarding one.

All three cases have a chance to generate losers; the other two from
the choice of random number to omit and this one from the fact that
only one parity is sometimes capable of winning, halving the number of
winning grids on the average ticket.



(defn create-ticket []
  (let [template (create-template)
        player-nums (pick-player-nums template)]
    [player-nums (template 2)]))



This actually creates a ticket, by first creating a template and then
generating the 24 hidden numbers and emitting a vector with the full
seq of 24 hidden numbers followed by the grids generated as part of
the template.

This is the end of the ticket generator. The rest of the code below is
for testing it.



(def win-3 [[true false false]
            [true false false]
            [true false false]])

(def win-5 [[true true true]
            [false false false]
            [false false false]])

(def win-10 [[false false false]
             [true true true]
             [false false false]])

(def win-20 [[false true false]
             [false true false]
             [false true false]])

(def win-100a [[false false true]
               [false false true]
               [false false true]])

(def win-100b [[false false false]
               [false false false]
               [true true true]])

(def win-biga [[false false true]
               [false true false]
               [true false false]])

(def win-bigb [[true false false]
               [false true false]
               [false false true]])

(def winners [win-3 win-5 win-10 win-20 win-100a win-100b win-biga win-bigb])



The ten winning combinations, as grids of booleans. Next come helper
functions for testing a "hit grid" of booleans (true for your Xs)
against these patterns.



(defn combine
  ([hit-grid-seq]
    (reduce combine [[false false false][false false false][false false false]]
      hit-grid-seq))
  ([hit-grid-a hit-grid-b]
    (map
      (fn [row-a row-b]
        (map #(or %1 %2) row-a row-b))
      hit-grid-a hit-grid-b)))

(def errors (for [a winners b winners :when (not= a b)] (combine a b)))

(defn has-all? [test-grid hit-grid]
  (=
    (count (filter identity (apply concat test-grid)))
    (count
      (filter identity
        (mapcat
          (fn [row-a row-b]
            (map #(and %1 %2) row-a row-b))
          hit-grid test-grid)))))



That last determines if a test grid (e.g. win-3) matches a hit-grid by
seeing if all of the trues in the test grid are true in the hit grid.
The combine function works almost the same way, except it uses or
instead of and to combine booleans and is arity-overloaded. The errors
seq is constructed with combine from every distinct pair of winners.
(The combine function's overload actually ended up unused as a result
of some minor refactoring.)

It is thus the case that:

If a hit grid matches any of the errors it represents a
multiple-winner, which shouldn't happen.

If a hit grid matches none of the errors but one of the winners it
represents a win of that prize. Most of the prizes are numerical
amounts, but the two diagonals depend on the grid's position on the
card.

If a hit grid matches none of the winners, then it's a loser. Obviously.



(defn hit-grids [[player-nums card]]
  (let [player-nums (set player-nums)]
    (map
      (fn [grid]
        (vec
          (map
            (fn [row]
              (vec
                (map #(contains? player-nums %) row)))
            grid)))
      card)))



This "scratches" a ticket and generates a seq of 8 hit-grids with
trues corresponding to the numbers the player got.



(defn prizes [ticket]
  (let [player-nums (set (first ticket))
        hits (hit-grids ticket)]
    (sort
      (filter #(not= 0 %)
        (map
          (fn [hit-grid biga bigb]
            (if (some #(has-all? % hit-grid) errors)
              :error
              (condp has-all? hit-grid
                win-3 3
                win-5 5
                win-10 10
                win-20 20
                win-100a 100
                win-100b 100
                win-biga biga
                win-bigb bigb
                0)))
          hits
          [50000 250 250 250 250 250 250 1000]
          [250 50000 250 250 250 250 1000 250])))))



This computes a seq of the prizes won on a card, from zero to eight of
them. The condp tests each hit-grid generated from the card with
has-all? (has-all? takes the test-grid first specifically to work
neatly with condp). First it's checked for errors though, which
produce a :error keyword in the output if encountered. (The #(not= 0
%) instead of (complement zero?) in the filter is not for brevity but
to avoid a ClassCastException if this happens.) After that, the condp
checks it against each winner and emits a prize amount, defaulting to
0. This is mapped over the eight hit-grids, along with two vectors
that specify the biga and bigb prize amounts (which you'll recall are
position-dependent).

The following functions use prizes to compute other interesting facts
about a ticket.



(defn largest-prize [ticket]
  (if-let [p (last (prizes ticket))]
    p
    0))

(defn num-prizes [ticket]
  (count (prizes ticket)))

(defn prize [ticket]
  (reduce + (prizes ticket)))



These are used to implement these statistical checks, which compute
various distributions over the total ticket population.



(defn prizes-distribution [n-tickets]
  (sort-by val
    (frequencies (map prizes (repeatedly n-tickets #(create-ticket))))))

(defn prize-distribution [n-tickets]
  (sort-by key
    (frequencies (map prize (repeatedly n-tickets #(create-ticket))))))

(defn largest-prize-distribution [n-tickets]
  (sort-by key
    (frequencies (map largest-prize (repeatedly n-tickets #(create-ticket))))))

(defn num-prizes-distribution [n-tickets]
  (sort-by key
    (frequencies (map num-prizes (repeatedly n-tickets #(create-ticket))))))



Next comes a function that computes the house's take, assuming the
real-world ticket price of $3 visible in the photos in the Wired
article:



(defn house-take-avg [n-tickets]
  (double
    (- 3
      (/
        (reduce + (map prize (repeatedly n-tickets #(create-ticket))))
        n-tickets))))



The final few functions implement tests that the various
user-distinguishable species and subspecies of cards generate
comparable expected winnings, so that the user cannot game the lottery
by preferring one over another. Most crucial is for none to actually
offer an expected prize amount exceeding the ticket cost, which would
open the lottery to being outright robbed; the closer the averages are
to one another (and also if rarer ticket types are actually less
desirable) the less useful it becomes for money laundering.

My tests generate fairly variable results, but they seem centered near
$1 average winnings per ticket (or a net loss of two dollars for the
player) for all species and subspecies. Whether specific combinations
of species and subspecies produce stronger correlations is not tested
here, but that seems unlikely.



(defn ttype [grid-nums]
  (let [num-freqs (frequencies (map #(quot % 2) grid-nums))
        rare-num (key (first (sort-by val num-freqs)))
        cgood (num-freqs rare-num)]
    ([nil 1 2 3 4 nil nil nil 5] cgood)))

(defn ttype2 [grid-nums]
  (let [qnums (map #(quot % 2) grid-nums)
        num-freqs (frequencies qnums)
        rare-num (key (first (sort-by val num-freqs)))]
    (+
      (if (= (second qnums) rare-num) 1 0)
      (if (= (nth qnums 6) rare-num) 1 0))))

(defn average-prize-test [n-tickets]
  (let [templates (repeatedly n-tickets #(create-template))
        pnum-seqs (map pick-player-nums templates)
        grid-num-groups (map second templates)
        tickets (map #(vector %1 (nth %2 2)) pnum-seqs templates)
        prizes (map prize tickets)
        ; remove outliers
        prizes (map #(if (> % 999) 0 %) prizes)
        ttypes (map ttype grid-num-groups)
        ttps (map hash-map ttypes prizes)
        ttns (map #(hash-map % 1.0) ttypes)
        ttpmap (apply merge-with + ttps)
        ttnmap (apply merge-with + ttns)]
    (merge-with / ttpmap ttnmap)))

(defn average-prize-test-2 [n-tickets]
  (let [templates (map
                    (fn [tt]
                      (first
                        (drop-while
                          #(not= (ttype (second %)) tt)
                          (repeatedly #(create-template)))))
                    [1 2 3 4 5])]
    (map
      (fn [t]
        (let [pnum-seqs (repeatedly n-tickets #(pick-player-nums t))
              card (nth t 2)
              tickets (map #(vector % card) pnum-seqs)
              prizes (map prize tickets)
              ; remove outliers
              prizes (map #(if (> % 999) 0 %) prizes)]
          (double (/ (reduce + prizes) n-tickets))))
      templates)))

(defn average-prize-test-3 [n-tickets]
  (let [templates (map
                    (fn [tt]
                      (first
                        (drop-while
                          #(not= (ttype2 (second %)) tt)
                          (repeatedly #(create-template)))))
                    [0 1 2])]
    (map
      (fn [t]
        (let [pnum-seqs (repeatedly n-tickets #(pick-player-nums t))
              card (nth t 2)
              tickets (map #(vector % card) pnum-seqs)
              prizes (map prize tickets)
              ; remove outliers
              prizes (map #(if (> % 999) 0 %) prizes)]
          (double (/ (reduce + prizes) n-tickets))))
      templates)))



Here's a sample test result, showing a typical prize distribution on
1000 tickets:



user=> (prize-distribution 1000)
([0 883]
 [3 83]
 [5 10]
 [6 9]
 [9 2]
 [10 8]
 [20 2]
 [30 1]
 [100 1]
 [250 1])



Roughly 1/9 of tickets win. Most of those win $3, a few win $5, and
only a few in every thousand win more than $10. The 6s and 9s are from
tickets with two $3-winning grids and three $3-winning grids,
respectively, and the 30 was a single ticket with three $10-winning
grids.



The only alternative way to do all this that would be wildly different
from something like the above that has occurred to me is a rejection
algorithm that generates random tic-tac-toe grids and random sets of
24 hidden numbers -- so, fully random tickets -- and then rejects
winners with probabilities depending on the total prize amount. This
has the advantage of giving exact control over the distribution of
prize amounts and not just probabilistic -- you can make one million
tickets with exactly specified numbers of each possible kind of winner
-- and two giant disadvantages that stopped me even attempting it: 1.
it would be ludicrously slow, and probably slow specifically
generating the quotas of losers and *small* winners; and 2. the cards
that survived the rejection process would probably have obvious
patterns that would let the user make inferences about the likely
prize amounts! In particular, losing grids would have most of their
numbers drawn from the 15 possibilities that aren't in the hidden set
and winning grids would contain rows and columns with rare numbers.
Losing tickets would mostly lack grids with concentrations of rare
numbers. The losing ticket at the top of this post, by contrast,
contains several grids with groups of rare numbers and nonetheless has
no prize. Most tickets my code generates will look statistically
similar, from losers to $50,000 winners and in between.

My code also does NOT necessarily have the disadvantage of imprecise
prize control (though the law of large numbers should protect any
lotto company that decides to use it as is). One could generate one
million templates with (create-template) and then apply modified
versions of pick-player-nums to subsets of these templates to generate
specific numbers of tickets constrained to lose or to generate various
probable prize amounts. Or one could process (repeatedly
#(create-template)) through functions that pick various target prize
amounts, try to generate hidden numbers to produce that prize amount,
and emit a ticket or nil after the rejection step. Then you could make
a set of tickets with a specific distribution of prizes using several
variations on (take N (map try-to-make-certain-ticket-type (repeatedly
#(create-template)))). Or just tweak the distribution of the existing
(create-ticket) with a rejection algorithm that fine tunes the
results. The only bias in the surviving *non-hidden ticket parts* will
be to maybe shift the exact statistical distributions of the
discernible species and subspecies. If those are still a concern, you
could also use code based on average-prize-test-2 and
average-prize-test-3 to generate streams of tickets of each
combination of species and subspecies and, separately, apply rejection
algorithms to each to tweak their distributions in order to force the
expectation value of every combo to exactly equal $1, or any other
amount. This should make it impossible for the user to game the
lottery and also provide the lottery company with an exactly
controllable profit margin.



This code is somewhat rough, as it's been slapped together over a few
hours. It is not going to set any speed records. It's free of overt
bugs, but the statistics haven't been checked especially rigorously.
It does seem to be cryptographically much stronger than any of the
lottery algorithms that Srivastava broke, while generating a drop-in
replacement for one of the lotteries he cracked.

Feel free to hack on this, modify it, tweak it, add more tests, or
completely rewrite it. I think it is an interesting problem, as well
as strong evidence that it IS possible to generate scratch-and-win
tickets with no correlations between pre-scratching-visible features
and expectation value of total prize amount, since it's *demonstrably*
possible to make the correlations quite small and, with suitable
tweaking of the various magic numbers in the algorithm above,
arbitrarily small.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Crack-proofing the Scratch Lottery Code [long]

Reply via email to