Do you actually have a billion numbers in your data? If not, my thought would be to generate only as many mapped keys as you have unique live keys. That's what the pseudo code was meant to say: don't generate a key until you need one. You could even do this using VSAM as the data store. Or, even simpler, since you don't need the mapped key to stay mapped to the same live key between runs, generate them in an AMODE(31) table.

A PIC S9(9) value can be stored in 5 bytes as packed decimal. A full billion 5-byte packed numbers would require 5 billion bytes of storage (5 GB), which is a little over 2^32 bytes. If you wanted to, you could run a program once and save the table in a VSAM linear dataset. You could then use this dataset as your permanent map and access it as a DIV (Data In Virtual) file, using very efficient memory mapping. Or create it as an ESDS and access it in RBA mode. Or perhaps even a VSAM RRDS.

Generating the file may take a while, especially to guarantee the uniqueness of the random map. The biggest problem might be finding a random number generator which can actually generate uniformly random values in the range [0..999,999,999]. Do it over a weekend, or in a low-priority batch job. The dataset should fit on 3 volumes of 3390-3 space.
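To make the "don't generate a key until you need one" idea concrete, here is a minimal sketch in Python rather than COBOL. It assumes 9-digit non-negative keys and an in-memory dictionary standing in for the VSAM file or AMODE(31) table. The names LazyKeyMasker, mask, and KEY_SPACE are made up for illustration.

```python
import random

KEY_SPACE = 10**9  # assumption: 9-digit non-negative keys, as in PIC S9(9)


class LazyKeyMasker:
    """Hands each live key a unique random masked key the first time it is seen."""

    def __init__(self, key_space=KEY_SPACE, seed=None):
        self.key_space = key_space
        self.rng = random.Random(seed)
        self.live_to_masked = {}  # live key -> masked key
        self.used = set()         # masked keys already handed out

    def mask(self, live_key):
        # A live key seen earlier in this run keeps its masked value.
        masked = self.live_to_masked.get(live_key)
        if masked is not None:
            return masked
        if len(self.used) >= self.key_space:
            raise RuntimeError("key space exhausted")
        # Draw random candidates until one is unused (retry on collision).
        while True:
            candidate = self.rng.randrange(self.key_space)
            if candidate not in self.used:
                break
        self.used.add(candidate)
        self.live_to_masked[live_key] = candidate
        return candidate


# Usage: the repeated live key gets the same masked value both times.
masker = LazyKeyMasker(seed=2012)
masked_keys = [masker.mask(k) for k in (123456789, 987654321, 123456789)]
```

The table only ever holds as many entries as there are distinct live keys in the extract, so collisions stay rare unless the data really does approach a billion distinct values.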
--
John McKown
Systems Engineer IV
IT Administrative Services Group
HealthMarkets(r)
9151 Boulevard 26 * N. Richland Hills * TX 76010
(817) 255-3225 phone * [email protected] * www.HealthMarkets.com

> -----Original Message-----
> From: IBM Mainframe Discussion List
> [mailto:[email protected]] On Behalf Of Roberts, John J
> Sent: Friday, May 25, 2012 11:11 AM
> To: [email protected]
> Subject: Re: Masking Numeric Keys
>
> >Do you need to keep the masking values between runs? I.e.,
> >must the masked value be the same for the same input on
> >multiple runs?
>
> No - we will extract a full set of PROD data, mask it, and
> then burn a DVD for our vendor. If they ask for a refresh,
> we will just repeat using current PROD data.
>
> >If not, then the simplest way that I can think of is to
> >use a sequential number as the replacement value. Keep
> >a hash table so that when you look at the unmapped number,
> >you can determine whether it has already been seen and has a
> >replacement value. If it does, then replace it. If it
> >doesn't, generate the next number in order and update your
> >mapping data with the input value and its replacement. This
> >could be as simple as a very large sequential array.
>
> >If you have DB2, then you've got an easy way. Create a table
> >with two columns. The first column is defined as a serial
> >number which is autogenerated by DB2. The second column is
> >the live number. Put an index on both columns. When you get a
> >live number, do a lookup in the table to retrieve the mapped
> >value. If the lookup fails, add the live number to the table,
> >getting the serial number assigned. If this is not random
> >enough, then actually use a randomly generated number instead
> >of a serial number in the first (mapped) column. This is a bit
> >more complicated since, if the live number is not yet in the
> >table, you'll need to generate the random number and try to
> >insert a new row (unique index on both of the columns). If the
> >new row inserts properly (which guarantees that both the
> >random number and live number are unique in the table), use
> >the random number. If the new row does not insert, then
> >generate a new random number and try to insert again. Repeat
> >until the row inserts and use t[...]
>
> John, this is actually kinda like my plan B. A real
> identifier of value N would be translated to be the Nth value
> in a sequence of pseudo-random numbers. The only problem is
> maintaining a billion row table. I have thought of asking
> our security officer if I can get away with only masking the
> last six digits of the identifier, leaving the first three ASIS.
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO IBM-MAIN
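The "plan B" in the quoted message (identifier N becomes the Nth value of a pseudo-random sequence), combined with the idea of masking only the last six digits, can be sketched as a one-million-entry shuffled table. This is an illustration in Python under those assumptions, not anyone's actual implementation. build_permutation and mask_low_digits are hypothetical names.

```python
import random


def build_permutation(size, seed=None):
    """Return a list p where p[n] is the replacement for n; p is a shuffle of 0..size-1."""
    table = list(range(size))
    random.Random(seed).shuffle(table)  # every value appears exactly once
    return table


def mask_low_digits(identifier, table, low_digits=6):
    """Keep the leading digits as is and replace the low digits via the table."""
    modulus = 10 ** low_digits
    if len(table) != modulus:
        raise ValueError("table must have exactly 10**low_digits entries")
    high, low = divmod(identifier, modulus)
    return high * modulus + table[low]


# Usage: the first three digits (123) survive, the last six are remapped.
table = build_permutation(10**6, seed=42)
masked = mask_low_digits(123456789, table)
```

Because the table is a permutation, two different identifiers never collide after masking, and a million entries is far easier to hold than a billion. The cost is that the leading three digits stay visible, which is the question for the security officer.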

