It's likely that many of you have read, or at least encountered links to, this Wired article recently:
http://www.wired.com/magazine/2011/01/ff_lottery/all/1 In it, a particular tic-tac-toe themed scratch lottery is examined in detail. A mathematician, Mohan Srivastava, discovered weaknesses in it and many similar lotteries. The article claims: > Fundamentally, he believes that creating impregnable tickets is > extremely difficult, if not impossible. “There is nothing random about > the lottery,” he says. “In reality, everything about the game has been > carefully designed to control payouts and entice the consumer.” Of > course, these elaborate design elements mean that the ticket can be > undesigned, that the algorithm can be reverse-engineered. The veneer of > chance can be peeled away. I had my doubts whether it was impossible to make it secure while still controlling payouts so I took a stab at writing my own algorithm to generate tic-tac-toe cards that follow the same rules as the ones depicted in the Wired article. Particularly, the player gets 24 hidden numbers, selected from the numbers between 1 and 39 inclusive, and eight tic-tac-toe boards with exposed numbers. No board has two of the same number; no board has more than one winning row, column, or diagonal; and all of the rows, columns, and diagonals on the board have the same prize values. My thought was to use a two stage process. First, generate the part of the ticket visible without scratching anything. Then, generate the 24 hidden numbers, in such a way that the same visible-part can be made a winner or a loser. Of course it's a bit more complex than just that. The 8 visible tic-tac-toe grids cannot be truly random without it being impossible to get a desired statistical distribution of prizes. So the grid generation has to be constrained somewhat. Eventually I came up with a program that produces visible-parts that the user can, with some difficulty, assort into five distinguishable species with three subspecies each, but I tuned the hidden-number algorithm so that the average payout for all combinations is roughly a dollar (with the ticket price being three, and also the minimum non-zero prize, so the house should make about two bucks a ticket on average, coming out in the black). Therefore the user cannot game the distinguishable species and subspecies to cherry-pick any subset of tickets with a statistically significantly higher expected payout. In particular they can expect to lose a couple of bucks on each ticket no matter what numbers they see in the uncovered part of that ticket. Here is a sample ticket generated with this program by calling (create-ticket): [(3 4 6 11 13 14 15 16 17 18 19 20 ; hidden numbers 21 24 27 28 29 30 32 33 34 35 37 39) (((37 33 10) (35 25 17) ; upper left tic-tac-toe grid (26 15 3)) ((10 3 37) (17 25 33) ; upper right grid (34 15 26)) ((23 18 39) (14 24 5) ; second-to-top left (22 31 27)) ((37 3 10) (33 25 17) (26 15 27)) ((6 13 7) (20 1 2) (8 11 30)) ((37 33 10) (29 25 17) (26 15 35)) ((28 34 38) (36 19 30) ; lower left (13 9 12)) ((6 18 7) (4 1 2) ; lower right (8 11 34)))] There are a few patterns that might leap out at more mathematically inclined readers; certain of the number positions, such as the center and upper corners, have only four possible values with two of them more common than the other two. This is unavoidable but it is also the case that these patterns cannot be used to guess the prize amount without knowing the hidden numbers. In fact the most that can be determined from them is whether it has about a 1 in 3 chance of being a $3 winner and a 2 in 3 chance of losing, an almost certain loser with a 1% chance of winning $100, or able to generate a variety of prizes other than $3 and averaging $1 per ticket. The most serious weakness here is that a thrill-seeking player can, in theory, zero in on roughly half of the tickets as having no chance to generate the big-money prizes ($250 and up) and avoid those. They can't, however, game these tickets to alter substantially their expected loss of $2 per ticket bought. The above has three discernible groups of grids: four with 25 in the center, two with mostly smaller numbers, and two more. This indicates that it's a species likely to produce 0, 1, or 2 winning grids and very unlikely to produce more; the bottom left corner is one of the odd ones and the top right isn't so it probably belongs to the varied-prizes subspecies capable of generating the big $50,000 maximum prize. The odds of a ticket like this one winning $50,000 are minuscule, however. Virtually "scratching" this ticket and matching up the numbers with the hit-grids function reveals: ([[true true false] ; X X . [true false true ] ; X . X [false true true ]] ; . X X [[false true true ] ; . X X [true false true ] ; X . X [true true false]] ; X X . [[false true true ] ; . X X [true true false] ; X X . [false false true ]] ; . . X [[true true false] ; X X . [true false true ] ; X . X [false true true ]] ; . X X [[true true false] ; X X . [true false false] ; X . . [false true true ]] ; . X X [[true true false] ; X X . [true false true ] ; X . X [false true true ]] ; . X X [[true true false] ; X X . [false true true ] ; . X X [true false false]] ; X . . [[true true false] ; X X . [true false false] ; X . . [false true true ]]); . X X Like about 8 out of 9 of the tickets this code generates, ours is not a winner. Below is the code that generated this, plus a small suite of simple tests, and a detailed discussion of it all. To understand the code and its discussion, it is recommended that you visit the link above and familiarize yourself with the appearance of the tic-tac-toe tickets, particularly the geometry of the potential winning lines and corresponding prize amounts. This section is fairly technical so you may want to skip to the concluding remarks at the bottom of this post with control-end and page-up now. You can also control-F for sample test result to get there quickly. That phrase occurs at the start of the conclusion and nowhere between here and there. In Firefox and Thunderbird, triple-clicking the isolated line above will select it for quick copy/paste. The first part of the code is a set of utility functions centered around randomizing various things: making shuffled vectors from seqs, uniformly picking a random element from a coll, picking randomly from a set of alternatives with specified probability weightings, and generating a boolean that has a 1 in N chance of being true. That last is mainly used in ifs: (if (one-in N) (do-rare-thing) (do-common-thing)); (one-in 2) generates a coin flip boolean and (one-in nil) is constantly false, so nil can be used as "infinity" here, e.g. (if (one-in (cond (some-test) 1 ; always (another-test) 2 ; half the time (test-no-3) 100 ; rarely :otherwise)) nil ; never (do-something)) All of these use the rand-int function and none of the rest of the code directly uses the RNG. These functions are thus the one area that needs modifying to drop in another RNG implementation, which could be done with a replacement for rand-int that calls a custom implementation that, say, stores its state in an atom held in a global var. This allows replacement of the default RNG with something like, say, a Mersenne Twister or an entropy pool or even, hardware permitting, a quantum source of genuine randomness. (defn shuffle [ss] (let [s (count ss)] (loop [out (transient (vec ss)) n (dec s)] (if (zero? n) (persistent! out) (let [i (rand-int (inc n)) t (out n)] (recur (assoc! (assoc! out n (out i)) i t) (dec n))))))) (defn rand-elt [s] (nth s (rand-int (count s)))) (defn rand-elt-fmap [fmap] (let [ftot (reduce + (vals fmap)) n (rand-int ftot)] (loop [i 0 k nil fmap (seq fmap)] (if (< n i) k (let [f (first fmap)] (recur (+ i (val f)) (key f) (next fmap))))))) (defn one-in [n] (and n (= (rand-int n) 0))) The next part of the code generates "templates", which are three-element vectors whose final element is the visible eight tic-tac-toe grids. That's a list of eight lists of three lists of three integers. The other two parts are used by the hidden number generator. Importantly, the latter can make a winner or a loser out of any ticket. The only thing different across the different ticket "species" is the number of possibly-winning grids: one species will fairly often have one possibly-winning grid and rarely have 7, another sometimes 2 and rarely 6, another occasionally 3 and rarely 5, another infrequently 4, and another rarely 8. These are successively rarer, as well as having increasing probabilities of losing. Each has three subspecies which are subtler to spot but produce more distinct prize distribution differences -- one produces rare $100 wins, one common $3 wins, and one wins of various amounts except $3 at various frequencies. All species and subspecies have about the same expected total prize amount, though -- roughly $1. (def t1 [[:za :ca :xc] [:cb :xx :ys] [:yc :zb :cc]]) (def t2 [[:xc :ca :za] [:ys :xx :cb] [:cc :zb :yc]]) (def rarities [t1 t2 t1 t1 t1 t1 t2 t1]) (defn create-template [] (let [nums (shuffle (range 1 40)) cs (subvec nums 0 15) xs (subvec nums 15 23) ys (subvec nums 23 31) zs (subvec nums 31) xx1s (subvec xs 0 2) xx2s (subvec xs 2 4) xc1s (subvec xs 4 6) xc2s (subvec xs 6) yc1s (subvec ys 0 2) yc2s (subvec ys 2 4) ys1s (subvec ys 4 6) ys2s (subvec ys 6) za1s (subvec zs 0 2) zb1s (subvec zs 2 4) za2s (subvec zs 4 6) zb2s (subvec zs 6) grid-nums (map #(+ (* 2 %1) %2) (shuffle (rand-elt-fmap {[0 0 0 0 0 0 0 0] 1 [0 0 0 0 0 0 0 1] 100 [0 0 0 0 0 0 1 1] 50 [0 0 0 0 0 1 1 1] 20 [0 0 0 0 1 1 1 1] 5 [0 0 0 1 1 1 1 1] 20 [0 0 1 1 1 1 1 1] 50 [0 1 1 1 1 1 1 1] 100 [1 1 1 1 1 1 1 1] 1})) (repeatedly 8 #(rand-int 2))) card (map (fn [grid num] (let [twos (> num 1) m (mod num 2) xxs (if twos xx2s xx1s) xcs (if twos xc2s xc1s) ycs (if twos yc2s yc1s) yss (if twos ys2s ys1s) zas (if twos za2s za1s) zbs (if twos zb2s zb1s) cs (shuffle cs) ca (cs 0) cb (cs 1) cc (cs 2)] (map (fn [row] (map (fn [cell] (condp = cell :ca ca :cb cb :cc cc :za (zas m) :zb (zbs m) :xx (xxs m) :xc (xcs m) :yc (ycs m) :ys (yss m))) row)) grid))) rarities grid-nums)] [nums grid-nums card])) The template generator takes a random permutation of 1 to 39 and partitions it into 15 common numbers, which will be among the hidden numbers on all tickets generated from that template, and 24 controlled numbers, only 9 of which will appear on each ticket and which are subdivided into three sets of 8, called xs, ys, and zs, which are each further categorized in three ways: "ones" versus "twos" and "parity", about which see below, and two possible positions within grids: for the xs, center or corner, for the ys, side or corner, and for the zs, a or b. The t1 and t2 templates are grid templates with the numbers replaced by categories: za and zb for the zs, xx (center) and xc (corner) for the xs, and yc (corner) and ys (side) for the ys, as well as three cells marked ca, cb, and cc which can get arbitrary common numbers. These templates both have the property that every winning line must cross either an x and a y, an x and a z, or a y and a z. Further, one of the diagonals (which generate the most valuable prizes) requires both xs and a y. This feature is used to control the prize distribution at the coarsest level and to make it possible to make both losing and winning tickets based on any template this function can emit. The rarities template is mostly t1s with two t2s. The t2 template is simply a mirror image of the t1 template and the rarities template arranges things so the four biggest prizes, the two $50,000s and the two $1000s, are x-x-y diagonals. These diagonals will be made very rare. The template generator first shuffles 1 to 39 and partitions the result; these numbers, in this sequence, will be passed to the hidden-number generator as the first element of the template vector. Then it generates the second part: a seq of 8 numbers between 0 and 3. Each grid is assigned one of these numbers. It's actually a bitfield of sorts: the high bit divides the grids into "ones" and "twos" and the low bit gives some of them "even" parity and some "odd". Note that the various types of xs, ys, and zs come in pairs; the parity is used sometimes to prevent certain winning lines from occurring in hidden-number generation. Depending on a grid's parity, only the zeroth or only the first elements of those vectors make it into that grid. Similarly, there are xs that only occur in "ones" and xs that only occur in "twos". So, the 0 1 2 3 number of a grid determines which of four sets of controlled numbers occur on that grid. Further, the distribution of "ones" and "twos" is skewed so most cards will have one of one type and seven of the other, fewer will have two of the rarer type, fewer still will have three, fewer still will have four of each, and only rarely will a card have all of the same type. (These produce the five distinguishable species). As you might have guessed, the hidden-number generator, by choosing from among the xs and ys and zs appropriately, will make only the "ones" or only the "twos" eligible to win, and will almost always choose the rarer of the two. This means most winning cards have a single win, fewer have two, and so on. The user might try to game that by picking a card with a less skewed number distribution (or, if he's lucky enough, one with a lot of non-occurring numbers) hoping for four or even eight winning grids, but the hidden-number generator compensates by making more of those cards pure losers than of cards that have the more skewed distributions. The card itself just uses random common numbers along with the grid-number (0 to 3) determined xs, ys, and zs to fill in each of the t1 or t2 templates. The output vector is the shuffling of the range 1 to 39, the grid-nums, and the exposed part of the card. (defn ifilt [coll idx] (if idx [(nth coll idx)] coll)) This is a utility function used when generating the hidden numbers. It takes a coll and an index and if index is nil, returns the coll, otherwise returns a one-element vector of the coll's element at that index. So, it can be used to selectively narrow a coll down to one element or not. In practice, it gets used on two-element collections (specific pairs of xs, ys, or zs) with either 0, 1 or nil (so, a parity or nil) as idx to force parity 0, force parity 1, or allow both parities. It plays a role in dealing with a complex set of possible winning cards. (defn pick-player-nums [[nums grid-nums _]] (let [cs (subvec nums 0 15) xs (subvec nums 15 23) ys (subvec nums 23 31) zs (subvec nums 31) xx1s (subvec xs 0 2) xx2s (subvec xs 2 4) xc1s (subvec xs 4 6) xc2s (subvec xs 6) yc1s (subvec ys 0 2) yc2s (subvec ys 2 4) ys1s (subvec ys 4 6) ys2s (subvec ys 6) x1s (concat xx1s xc1s) x2s (concat xx2s xc2s) y1s (concat yc1s ys1s) y2s (concat yc2s ys2s) z1s (subvec zs 0 4) za1s (subvec zs 0 2) zb1s (subvec zs 2 4) z2s (subvec zs 4) za2s (subvec zs 4 6) zb2s (subvec zs 6) num-freqs (frequencies (map #(quot % 2) grid-nums)) rare-num (key (first (sort-by val num-freqs))) twos (= rare-num 1) cgood (num-freqs rare-num) lose-mod ([nil 4 10 14 18 nil nil nil 3] cgood) flip-prob ([nil 1000 100 5 1 nil nil nil nil] cgood) twos (if (one-in flip-prob) (not twos) twos) ; Corners that could win big prizes ct #(= (> % 1) twos) cf (constantly false) pr-corners (map #(%1 %2) [ct ct cf cf cf cf ct ct] grid-nums) t2-corners (set (map #(quot % 2) [(second grid-nums) (nth grid-nums 6)])) prcc (count (filter identity pr-corners)) ; $50,000 possible? biggie (or (first pr-corners) (second pr-corners)) ; Chance we're an xxy ticket xxy-one-in (condp = prcc 0 1000 1 (if biggie 10000 2000) 2 nil 3 nil 4 nil) xxy (one-in xxy-one-in) [xyfreq xzfreq yzfreq lose-freq] (if xxy [nil nil nil nil] ; N/A (condp = t2-corners #{(if twos 0 1)} [0 0 1 1] #{(if twos 1 0)} [0 0 1 130] #{0 1} [30 59 1 2])) ; What kind of ticket are we? ticket-type (if xxy :xxy (if (one-in (* lose-freq lose-mod)) :xyz :loser))] (condp = ticket-type :loser (let [[xs ys zs] (rand-elt [[x1s ys2s z2s] [x2s ys1s z1s]]) ws (take 9 (shuffle (concat xs ys zs)))] (sort (concat cs ws))) :xyz (let [[xxs xcs ycs yss zas zbs type] (rand-elt-fmap (if twos {[xx2s xc2s yc2s ys2s za1s zb1s :xy] xyfreq [xx2s xc2s yc1s ys1s za2s zb2s :xz] xzfreq [xx1s xc1s yc2s ys2s za2s zb2s :yz] yzfreq} {[xx1s xc1s yc1s ys1s za2s zb2s :xy] xyfreq [xx1s xc1s yc2s ys2s za1s zb1s :xz] xzfreq [xx2s xc2s yc1s ys1s za1s zb1s :yz] yzfreq})) xs (concat xcs xxs) ys (concat ycs yss) zs (concat zas zbs) xes [(first xcs) (second xxs)] xos [(first xxs) (second xcs)] xeos (rand-elt [xes xos]) zes [(first zas) (second zbs)] zos [(first zbs) (second zas)] zeos (rand-elt [zes zos]) [axs ays azs exs] (condp = type :xy (if (one-in 100) [xeos yss zs ycs] [xxs yss zs ycs]) :xz (let [[zas xcs xxs zbs] (map ifilt [zas xcs xxs zbs] (rand-elt-fmap {[0 nil 1 0] 368 [0 1 1 nil] 50 [nil 0 1 0] 1 [0 0 1 nil] 40 [1 0 nil 0] 1 [1 nil 0 0] 40 [1 1 0 nil] 40 [1 0 0 nil] 50 [1 nil 0 1] 368 [nil 1 0 1] 1 [0 1 nil 1] 1 [0 nil 1 1] 40}))] [xcs ys zas (concat zbs xxs)]) :yz (if (one-in 100) [xs ycs zeos yss] [xs ycs zas yss])) ws (take 9 (shuffle (concat axs ays azs exs)))] (sort (concat cs ws))) :xxy (let [[ycs xxs xcs azs] (if twos [yc2s xx2s xc2s z1s] [yc1s xx1s xc1s z2s]) ws (take 9 (shuffle (concat ycs xxs xcs azs)))] (sort (concat cs ws)))))) This behemoth actually picks the hidden 24 numbers of a ticket and is the beating heart of the whole program. Given the *same* template it can generate numbers that will combine with the template to make a variety of winning tickets and losing ones. Depending on which of the five "species" the template is, it will make somewhat different distributions: every prize will be possible, but the distribution of the *number* of prizes (up to 8) will differ by species, clustering around 1, 2, 3, 4, and 8 with lesser peaks at 7, 6, and 5 in the first three cases. The more potential prizes, the more likely the ticket will be generated a pure loser, to compensate. There are also three "subspecies" of ticket, which do have discernibly different prize distributions: one produces mainly losers and $3 winners, one losers and infrequent $100 winners, and one losers mixed with all of the prize amounts except $3. Their average payouts are the same, however, via tuned proportions of losers to winners for each supspecies. Each species can come in all three of the subspecies. The algorithm is very complex and took quite a bit of fine-tuning to give good results, but it exploits the template structures t1 and t2 above. Recall that a winning line must pass through an x and a y, an x and a z, a y and a z, or a particular two xs and y. Furthermore, each combination produces its own prize value. There are two goals here: 1. avoid generating any grid that has more than one tic-tac-toe; 2. make the higher-value prizes correspondingly rarer. The first choice made is whether to generate hidden numbers to make the ticket a loser, an "xxy" ticket with the possibility of winning the two biggest prizes, or an "xyz" ticket. The loser probability is modified in several places to counterweight various factors, starting with which species the ticket is (loser-mod near the top of the big let). Let's assume, for now, that it picks "ones" to be eligible for winning rather than "twos". Deciding the probability of a potentially big winning xxy ticket is done by counting how many of the four corner grids (capable of yielding $1000 or more) are "ones". If more than one is, then it won't generate an xxy ticket at all; we only give out at most one $1000+ prize per entire ticket. Otherwise, the odds are highest if no corner is a "one" (maximum xxy diagonal prize: $250) and lowest if an upper corner is (maximum xxy diagonal prize: $50,000). It also decides some things that will only matter if it generates an xyz ticket. The trickiness here comes from the fact that the $3, $20, and $100 prizes for the verticals are not mirrored on the right half of the ticket but are in the same left-to-right order. Otherwise we would just use t2 for all right-hand grids and t1 for all left-hand grids at template generation and then forget about it. Instead, t2 differs in a key way from t1: a t2 "yz" winner will be a $100 winner, whereas a t1 "yz" winner can be a $3 or $100 winner. So we make the t2s as rare as possible (only the top right and bottom left corner grids) and special-case them. We count how many are "ones" (again assuming we picked "ones" to possibly win) and it's 0, 1, or both of them. In the case where exactly one can win an xyz ticket will probably be an xy ticket or an xz ticket, with a low likelihood of being a yz ticket. That's half of all tickets. In the other cases we generate only yz tickets; where neither t2 can win we do it frequently and where both t2s can win we make it 300 times more likely the ticket as a whole is a loser. This adds a potentially user-distinguishable set of ticket subspecies, one of which will produce zero or more prizes of $3, one of which will produce $100s, and one of which will produce potentially any prize other than $3. This proved necessary to get a nice distribution of prizes from random tickets, with $3 winners commonest, but the rarity weighting is such that cherry-picking the potential $100 winner subset won't change your expected prize amount, nor will picking either of the other subspecies -- the $3 dominated subspecies wins somewhat more frequently but with a correspondingly lower average payout from the winners, since $3 is the minimum prize value. After that, the actual number picking occurs. After accounting for the 15 common numbers, exactly 9 of the 24 controlled numbers must be doled out. For a "loser" ticket, it takes the x2s, y1s, and z1s, or the x1s, y2s, and z2s, but omits the corner ys and includes only the side ys. Every grid gets either only the xs, or the zs and a side y; the xs case can produce a five-X losing board and the zs-side-y case can produce a six-X losing board with a hexagonal pattern. Thus, choosing the 9 hidden numbers from these two sets of ten will produce no-win tickets. The "xxy" ticket is also easy to generate: assuming we have picked "ones" to possibly win, we pick the corner y1s, center x1s, and corner x1s as six of our set of potential hidden numbers. The z2s give is four more. The "twos" will get only zs and lose while the "ones" can win on the potentially major prize-winning diagonal. That leaves the "xyz" ticket, or rather, three sub-cases: xy, xz, and yz. The xy case is capable of producing $10 and $100 prizes on the six t1 grids and $3 and $10 prizes on the two t2 grids. The code first decides if it's going to possibly win $100. If not (the usual case) it combines the center x1s, all y1s, and all z2s. The "twos" again get only zs and lose; the center xs and corner ys don't produce winning lines without the corner xs, the corner ys and size ys don't produce winning lines, and the center xs and side ys produce ten dollar prizes. There are ten numbers if this set -- two xs, four ys, and four zs -- of which one is randomly omitted. If it decides to allow $100 prizes, it takes two x1s with the property that the center xs and corner xs are of opposite parity (there are two possible such subsets of the x1s, of which it chooses one at random) instead of the center x1s. Because of the parity selection, no single grid will get both the center x and corner x Xed, and this prevents double-winner grids with both the $10 and the $100 lines Xed. Instead, the side y may combine with one OR the other to produce a prize on any given grid, but not both. Also, the corner Y can never combine with both Xs in one grid to potentially win the big xxy prizes of $250 or more. The yz case is capable of producing $3 and $100 winners on the six t1 grids and only $100s on the two t2 grids. The code makes the vertical line through yc and za common and the horizontal line through yc and zb rare. It's handled similar to the xy case: the x2s are four of our candidates, which make the "two" grids losers, and the y1s are another four candidates. The side ys cannot form any winning lines. The winning patterns are determined by the choice of zs. The two za1s make the final two candidates if we're avoiding the bottom row (most of the time); this wins $3 except on t2s. See above for how controlling the distribution of yzs depending on whether neither, one, or both t2s can win is used to prevent too many $100 prizes resulting from this. In the case we're allowing the bottom row, we pick the za of one parity and the zb of the other instead of both zas. The t1 grids with one parity can win $3 and the others can win $100, but none can win both. Again we have ten candidates so we discard one randomly. The xz case is the most complex, and this despite the fact that the same prize amounts are possible for t1s and t2s. The thing is, they can be $5, $20, or $250 winners, and there's no simple way to pick nine of the numbers to prevent a double win. Obviously we inclide the y2s for all those loser grids out there. The trick is picking five of the eight combined x1s and z1s that can never generate a double win, and whose prize distribution is as desired: $5 much commoner and $250 passing rare. That is what the magic box of [0 nil 1 0] type stuff is for. We have corner and center xs and zas and zbs and pick five of them: two of one of those and one each of the others, with parities chosen such that at most one prize is possible for a grid of each parity. The various vectors of 0, 1, and nil exhaust the possible choices that don't allow double winners. The two with relative weighted frequencies of 368 give $5 winners on one parity and losers on the other. The two weighted 50 give $20 on one and lose on the other; the four weighted 40 give $20 on one and $5 on the other; and the four weighted 1 give $250 on one and either $5 or $20 on the other. Since there are five combined x1s and y1s picked, this case generates an exact set of nine hidden numbers, instead of picking ten and discarding one. All three cases have a chance to generate losers; the other two from the choice of random number to omit and this one from the fact that only one parity is sometimes capable of winning, halving the number of winning grids on the average ticket. (defn create-ticket [] (let [template (create-template) player-nums (pick-player-nums template)] [player-nums (template 2)])) This actually creates a ticket, by first creating a template and then generating the 24 hidden numbers and emitting a vector with the full seq of 24 hidden numbers followed by the grids generated as part of the template. This is the end of the ticket generator. The rest of the code below is for testing it. (def win-3 [[true false false] [true false false] [true false false]]) (def win-5 [[true true true] [false false false] [false false false]]) (def win-10 [[false false false] [true true true] [false false false]]) (def win-20 [[false true false] [false true false] [false true false]]) (def win-100a [[false false true] [false false true] [false false true]]) (def win-100b [[false false false] [false false false] [true true true]]) (def win-biga [[false false true] [false true false] [true false false]]) (def win-bigb [[true false false] [false true false] [false false true]]) (def winners [win-3 win-5 win-10 win-20 win-100a win-100b win-biga win-bigb]) The ten winning combinations, as grids of booleans. Next come helper functions for testing a "hit grid" of booleans (true for your Xs) against these patterns. (defn combine ([hit-grid-seq] (reduce combine [[false false false][false false false][false false false]] hit-grid-seq)) ([hit-grid-a hit-grid-b] (map (fn [row-a row-b] (map #(or %1 %2) row-a row-b)) hit-grid-a hit-grid-b))) (def errors (for [a winners b winners :when (not= a b)] (combine a b))) (defn has-all? [test-grid hit-grid] (= (count (filter identity (apply concat test-grid))) (count (filter identity (mapcat (fn [row-a row-b] (map #(and %1 %2) row-a row-b)) hit-grid test-grid))))) That last determines if a test grid (e.g. win-3) matches a hit-grid by seeing if all of the trues in the test grid are true in the hit grid. The combine function works almost the same way, except it uses or instead of and to combine booleans and is arity-overloaded. The errors seq is constructed with combine from every distinct pair of winners. (The combine function's overload actually ended up unused as a result of some minor refactoring.) It is thus the case that: If a hit grid matches any of the errors it represents a multiple-winner, which shouldn't happen. If a hit grid matches none of the errors but one of the winners it represents a win of that prize. Most of the prizes are numerical amounts, but the two diagonals depend on the grid's position on the card. If a hit grid matches none of the winners, then it's a loser. Obviously. (defn hit-grids [[player-nums card]] (let [player-nums (set player-nums)] (map (fn [grid] (vec (map (fn [row] (vec (map #(contains? player-nums %) row))) grid))) card))) This "scratches" a ticket and generates a seq of 8 hit-grids with trues corresponding to the numbers the player got. (defn prizes [ticket] (let [player-nums (set (first ticket)) hits (hit-grids ticket)] (sort (filter #(not= 0 %) (map (fn [hit-grid biga bigb] (if (some #(has-all? % hit-grid) errors) :error (condp has-all? hit-grid win-3 3 win-5 5 win-10 10 win-20 20 win-100a 100 win-100b 100 win-biga biga win-bigb bigb 0))) hits [50000 250 250 250 250 250 250 1000] [250 50000 250 250 250 250 1000 250]))))) This computes a seq of the prizes won on a card, from zero to eight of them. The condp tests each hit-grid generated from the card with has-all? (has-all? takes the test-grid first specifically to work neatly with condp). First it's checked for errors though, which produce a :error keyword in the output if encountered. (The #(not= 0 %) instead of (complement zero?) in the filter is not for brevity but to avoid a ClassCastException if this happens.) After that, the condp checks it against each winner and emits a prize amount, defaulting to 0. This is mapped over the eight hit-grids, along with two vectors that specify the biga and bigb prize amounts (which you'll recall are position-dependent). The following functions use prizes to compute other interesting facts about a ticket. (defn largest-prize [ticket] (if-let [p (last (prizes ticket))] p 0)) (defn num-prizes [ticket] (count (prizes ticket))) (defn prize [ticket] (reduce + (prizes ticket))) These are used to implement these statistical checks, which compute various distributions over the total ticket population. (defn prizes-distribution [n-tickets] (sort-by val (frequencies (map prizes (repeatedly n-tickets #(create-ticket)))))) (defn prize-distribution [n-tickets] (sort-by key (frequencies (map prize (repeatedly n-tickets #(create-ticket)))))) (defn largest-prize-distribution [n-tickets] (sort-by key (frequencies (map largest-prize (repeatedly n-tickets #(create-ticket)))))) (defn num-prizes-distribution [n-tickets] (sort-by key (frequencies (map num-prizes (repeatedly n-tickets #(create-ticket)))))) Next comes a function that computes the house's take, assuming the real-world ticket price of $3 visible in the photos in the Wired article: (defn house-take-avg [n-tickets] (double (- 3 (/ (reduce + (map prize (repeatedly n-tickets #(create-ticket)))) n-tickets)))) The final few functions implement tests that the various user-distinguishable species and subspecies of cards generate comparable expected winnings, so that the user cannot game the lottery by preferring one over another. Most crucial is for none to actually offer an expected prize amount exceeding the ticket cost, which would open the lottery to being outright robbed; the closer the averages are to one another (and also if rarer ticket types are actually less desirable) the less useful it becomes for money laundering. My tests generate fairly variable results, but they seem centered near $1 average winnings per ticket (or a net loss of two dollars for the player) for all species and subspecies. Whether specific combinations of species and subspecies produce stronger correlations is not tested here, but that seems unlikely. (defn ttype [grid-nums] (let [num-freqs (frequencies (map #(quot % 2) grid-nums)) rare-num (key (first (sort-by val num-freqs))) cgood (num-freqs rare-num)] ([nil 1 2 3 4 nil nil nil 5] cgood))) (defn ttype2 [grid-nums] (let [qnums (map #(quot % 2) grid-nums) num-freqs (frequencies qnums) rare-num (key (first (sort-by val num-freqs)))] (+ (if (= (second qnums) rare-num) 1 0) (if (= (nth qnums 6) rare-num) 1 0)))) (defn average-prize-test [n-tickets] (let [templates (repeatedly n-tickets #(create-template)) pnum-seqs (map pick-player-nums templates) grid-num-groups (map second templates) tickets (map #(vector %1 (nth %2 2)) pnum-seqs templates) prizes (map prize tickets) ; remove outliers prizes (map #(if (> % 999) 0 %) prizes) ttypes (map ttype grid-num-groups) ttps (map hash-map ttypes prizes) ttns (map #(hash-map % 1.0) ttypes) ttpmap (apply merge-with + ttps) ttnmap (apply merge-with + ttns)] (merge-with / ttpmap ttnmap))) (defn average-prize-test-2 [n-tickets] (let [templates (map (fn [tt] (first (drop-while #(not= (ttype (second %)) tt) (repeatedly #(create-template))))) [1 2 3 4 5])] (map (fn [t] (let [pnum-seqs (repeatedly n-tickets #(pick-player-nums t)) card (nth t 2) tickets (map #(vector % card) pnum-seqs) prizes (map prize tickets) ; remove outliers prizes (map #(if (> % 999) 0 %) prizes)] (double (/ (reduce + prizes) n-tickets)))) templates))) (defn average-prize-test-3 [n-tickets] (let [templates (map (fn [tt] (first (drop-while #(not= (ttype2 (second %)) tt) (repeatedly #(create-template))))) [0 1 2])] (map (fn [t] (let [pnum-seqs (repeatedly n-tickets #(pick-player-nums t)) card (nth t 2) tickets (map #(vector % card) pnum-seqs) prizes (map prize tickets) ; remove outliers prizes (map #(if (> % 999) 0 %) prizes)] (double (/ (reduce + prizes) n-tickets)))) templates))) Here's a sample test result, showing a typical prize distribution on 1000 tickets: user=> (prize-distribution 1000) ([0 883] [3 83] [5 10] [6 9] [9 2] [10 8] [20 2] [30 1] [100 1] [250 1]) Roughly 1/9 of tickets win. Most of those win $3, a few win $5, and only a few in every thousand win more than $10. The 6s and 9s are from tickets with two $3-winning grids and three $3-winning grids, respectively, and the 30 was a single ticket with three $10-winning grids. The only alternative way to do all this that would be wildly different from something like the above that has occurred to me is a rejection algorithm that generates random tic-tac-toe grids and random sets of 24 hidden numbers -- so, fully random tickets -- and then rejects winners with probabilities depending on the total prize amount. This has the advantage of giving exact control over the distribution of prize amounts and not just probabilistic -- you can make one million tickets with exactly specified numbers of each possible kind of winner -- and two giant disadvantages that stopped me even attempting it: 1. it would be ludicrously slow, and probably slow specifically generating the quotas of losers and *small* winners; and 2. the cards that survived the rejection process would probably have obvious patterns that would let the user make inferences about the likely prize amounts! In particular, losing grids would have most of their numbers drawn from the 15 possibilities that aren't in the hidden set and winning grids would contain rows and columns with rare numbers. Losing tickets would mostly lack grids with concentrations of rare numbers. The losing ticket at the top of this post, by contrast, contains several grids with groups of rare numbers and nonetheless has no prize. Most tickets my code generates will look statistically similar, from losers to $50,000 winners and in between. My code also does NOT necessarily have the disadvantage of imprecise prize control (though the law of large numbers should protect any lotto company that decides to use it as is). One could generate one million templates with (create-template) and then apply modified versions of pick-player-nums to subsets of these templates to generate specific numbers of tickets constrained to lose or to generate various probable prize amounts. Or one could process (repeatedly #(create-template)) through functions that pick various target prize amounts, try to generate hidden numbers to produce that prize amount, and emit a ticket or nil after the rejection step. Then you could make a set of tickets with a specific distribution of prizes using several variations on (take N (map try-to-make-certain-ticket-type (repeatedly #(create-template)))). Or just tweak the distribution of the existing (create-ticket) with a rejection algorithm that fine tunes the results. The only bias in the surviving *non-hidden ticket parts* will be to maybe shift the exact statistical distributions of the discernible species and subspecies. If those are still a concern, you could also use code based on average-prize-test-2 and average-prize-test-3 to generate streams of tickets of each combination of species and subspecies and, separately, apply rejection algorithms to each to tweak their distributions in order to force the expectation value of every combo to exactly equal $1, or any other amount. This should make it impossible for the user to game the lottery and also provide the lottery company with an exactly controllable profit margin. This code is somewhat rough, as it's been slapped together over a few hours. It is not going to set any speed records. It's free of overt bugs, but the statistics haven't been checked especially rigorously. It does seem to be cryptographically much stronger than any of the lottery algorithms that Srivastava broke, while generating a drop-in replacement for one of the lotteries he cracked. Feel free to hack on this, modify it, tweak it, add more tests, or completely rewrite it. I think it is an interesting problem, as well as strong evidence that it IS possible to generate scratch-and-win tickets with no correlations between pre-scratching-visible features and expectation value of total prize amount, since it's *demonstrably* possible to make the correlations quite small and, with suitable tweaking of the various magic numbers in the algorithm above, arbitrarily small. -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en