A possibly weird thought occurred to me. Do you need to keep the masked 
values the same between runs? I.e., must the masked value be the same for the 
same input on multiple runs? If not, then the simplest way that I can think of 
is to use a sequential number as the replacement value. Keep a hash table so 
that when you look up an unmasked number, you can determine whether it has 
already been seen and has a replacement value. If it does, then replace it. If 
it doesn't, generate the next number in order and update your mapping data with 
the input value and its replacement. This could be as simple as a very large 
sequential array.
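
A minimal Python sketch of the in-memory approach, assuming the keys arrive as 
strings and the replacement should be zero-padded to nine digits. The class 
name SequentialMasker is hypothetical, and a dict stands in for the hash table. 
Nothing is saved, so the mapping is discarded when the program ends; this only 
fits the case where masked values need not match between runs.

```python
from itertools import count


class SequentialMasker:
    """Hypothetical helper: map each live key to the next sequential number.

    The mapping lives only in memory, so it is not stable across runs.
    """

    def __init__(self, start: int = 1, width: int = 9):
        self._mapping: dict[str, str] = {}  # live value -> replacement
        self._next = count(start)           # source of sequential numbers
        self._width = width

    def mask(self, live: str) -> str:
        # Already seen: reuse the replacement assigned earlier.
        if live in self._mapping:
            return self._mapping[live]
        # New value: take the next number, zero-padded to a fixed width.
        replacement = str(next(self._next)).zfill(self._width)
        if len(replacement) > self._width:
            raise OverflowError("ran out of nine-digit replacement values")
        self._mapping[live] = replacement
        return replacement
```

For example, masking "123456789", then "987654321", then "123456789" again 
returns "000000001", "000000002", "000000001".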

If you have DB2, then you've got an easy way. Create a table with two columns. 
The first column is defined as a serial number which is autogenerated by DB2 
(an identity column). The second column is the live number. Put an index on 
both columns. When you get a live number, do a lookup in the table to retrieve 
the mapped value. If the lookup fails, add the live number to the table, 
getting the serial number assigned. If this is not random enough, then use a 
randomly generated number instead of a serial number in the first column. This 
is a bit more complicated since, if the live number is not yet in the table, 
you'll need to generate the random number and try to insert a new row (with a 
unique index on both columns). If the new row inserts properly (which 
guarantees that both the random number and the live number are unique in the 
table), use the random number. If the new row does not insert, then generate a 
new random number and try to insert again. Repeat until the row inserts and use 
the random number which succeeded. Some pseudo code might look like:

read live record.
do forever
   select (live, random) from table where live = record.live
   if success then leave /* first do forever */
   do forever /* second do forever */
      generate random number
      insert (record.live, generated.random) into table
      if succeeded then leave /* second do forever */
      if failed on duplicate live then leave /* added by another job: reselect */
   end /* second do forever */
end /* first do forever */
replace live value with random value
write mapped record.
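
For anyone without DB2 handy, here is a minimal Python sketch of the 
random-number variant, using the standard-library sqlite3 module purely as a 
stand-in for the DB2 table. The table name key_map, the column name masked, and 
the helper names are hypothetical. Rather than nested loops, it uses a single 
loop that looks the live value up again after every insert attempt, so a 
successful insert, a random-number collision, and a concurrent insert of the 
same live value by another job are all handled the same way.

```python
import secrets
import sqlite3


def open_key_map(path: str = ":memory:") -> sqlite3.Connection:
    """Create the hypothetical key_map table with a unique index on both columns."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS key_map ("
        " live TEXT NOT NULL UNIQUE,"
        " masked INTEGER NOT NULL UNIQUE)"
    )
    conn.commit()
    return conn


def mask_key(conn: sqlite3.Connection, live: str) -> str:
    """Return the nine-digit masked value for `live`, creating one if needed."""
    while True:
        row = conn.execute(
            "SELECT masked FROM key_map WHERE live = ?", (live,)
        ).fetchone()
        if row is not None:
            # Already mapped (possibly by the insert on the previous pass).
            return f"{row[0]:09d}"
        # Not mapped yet: try a random value in 0..999,999,999.
        candidate = secrets.randbelow(10**9)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO key_map (live, masked) VALUES (?, ?)",
                    (live, candidate),
                )
        except sqlite3.IntegrityError:
            # Duplicate masked value (or live value added by another job):
            # loop back, look up again, and retry with a new random number.
            pass
```

Pointing open_key_map at a file instead of ":memory:" keeps the mapping between 
runs, which is what makes the output repeatable from run to run. The table is 
then the only link from masked values back to live ones, so it needs the same 
protection as the production data.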

-- 
John McKown 
Systems Engineer IV
IT

Administrative Services Group

HealthMarkets(r)

9151 Boulevard 26 * N. Richland Hills * TX 76010
(817) 255-3225 phone * 
[email protected] * www.HealthMarkets.com

> -----Original Message-----
> From: IBM Mainframe Discussion List 
> [mailto:[email protected]] On Behalf Of Roberts, John J
> Sent: Friday, May 25, 2012 10:31 AM
> To: [email protected]
> Subject: Masking Numeric Keys
> 
> I can foresee that my organization will soon need to provide 
> test data to an external vendor.  This test data will need to 
> be generated by masking subsets of real production data, 
> since crafting fictional test data would be an impossible 
> undertaking in the time we have available.
> 
> So all Personally Identifiable Information (PII) fields must 
> be masked.  I have figured out techniques to mask names and 
> addresses.  But I now need to figure out a technique to mask 
> a nine digit numeric key.  This field is used as either a 
> primary or secondary key in many files.  So I can't just 
> substitute a random number, since the relationships need to 
> be maintained.  I have identified some requirements for the 
> masking algorithm:
> 
> (1) It must be deterministic (same input produces same output always).
> 
> (2) Uniqueness must be maintained.  Therefore no two original 
> values can translate to the same masked value.
> 
> (3) The masked result must also be a nine digit numeric value.
> 
> (4) It must not be possible to calculate the original value 
> from the masked value (i.e. a one-way transformation).
> 
> I can think of many ways to address the first three 
> requirements.  But I am stuck on number (4).  The closest I 
> can get to meeting this requirement is to assume that the 
> masking algorithm itself is kept secret.  And I know that 
> security thru obscurity is hardly a good plan.
> 
> Do any of the listers have an idea for such a masking algorithm?
> 
> John
> 
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO IBM-MAIN
> 
> 
