A probably weird thought occurred to me. Do you need to keep the masking
values between runs? I.e., must the masked value be the same for the same input
on multiple runs? If not, then the simplest way that I can think of is to
use a sequential number as the replacement value. Keep a hash table so
that, when you look up an unmasked number, you can determine whether it has
already been seen and has a replacement value. If it does, then replace it. If
it doesn't, generate the next number in order and update your mapping data with
the input value and its replacement. This could be as simple as a very large
sequential array.
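The sequential-number-plus-lookup idea above could be sketched in Python roughly as follows. This is a minimal sketch, assuming keys are handled as nine-character strings and the whole mapping fits in memory; the class name `SequentialMasker` is hypothetical. A Python dict plays the part of the hash table. Nothing is saved, so a later run that sees records in a different order would hand out different numbers.

```python
class SequentialMasker:
    """Hand out the next sequential number for each distinct live value.

    Hypothetical helper for illustration; the mapping lives only in memory,
    so it is lost when the run ends.
    """

    LIMIT = 999_999_999  # largest nine-digit replacement value

    def __init__(self, start: int = 1) -> None:
        self._mapping: dict[str, str] = {}  # hash table: live value -> masked value
        self._next = start                  # next replacement number to assign

    def mask(self, live: str) -> str:
        # Seen before: reuse the same replacement so related keys still match.
        existing = self._mapping.get(live)
        if existing is not None:
            return existing
        # Not seen: take the next number in order and record the pair.
        if self._next > self.LIMIT:
            raise OverflowError("no nine-digit replacement values left")
        masked = f"{self._next:09d}"
        self._mapping[live] = masked
        self._next += 1
        return masked


if __name__ == "__main__":
    m = SequentialMasker()
    print(m.mask("123456789"))  # 000000001
    print(m.mask("987654321"))  # 000000002
    print(m.mask("123456789"))  # 000000001 again: same input, same output
```

The "very large sequential array" variant would replace the dict with a table of 10^9 slots indexed directly by the live value.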
If you have DB2, then you've got an easy way. Create a table with two columns.
The first column is defined as a serial number which is autogenerated by DB2.
The second column is the live number. Put an index on both columns. When you
get a live number, do a lookup in the table to retrieve the mapped value. If
the lookup fails, add the live number to the table, getting the serial number
assigned. If this is not random enough, then use a randomly generated
number instead of a serial number in the first (masked) column. This is a bit
more complicated since, if the live number is not yet in the table, you'll need
to generate the random number and try to insert a new row (with a unique index
on each of the columns). If the new row inserts properly (which guarantees that
both the random number and the live number are unique in the table), use the
random number. If the insert fails, then generate a new random number and try
to insert again. Repeat until the row inserts and use the random number which
succeeded. Some pseudo code might look like:
read live record
do forever                                          /* first do forever */
   select live, random from table where live = record.live
   if success then leave                            /* first do forever */
   do forever                                       /* second do forever */
      generate random number
      insert into table (live, random) values (record.live, generated.random)
      if success then leave                         /* second do forever */
   end                                              /* second do forever */
end   /* first do forever: loop back, the select now finds the new row */
replace live value with random value
write mapped record
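The select-then-insert loop above can be tried out with Python's built-in `sqlite3` module standing in for DB2. This is a sketch under assumptions: SQLite instead of DB2, a hypothetical table `keymap` with columns `masked` and `live`, and keys handled as nine-character strings. The separate `UNIQUE` constraint on each column does the work the pseudocode relies on: a failed insert means the random value (or the live value) is already in the table. `secrets.randbelow` is used rather than the `random` module so the replacement values are not predictable from one another.

```python
import secrets
import sqlite3


def open_keymap(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) the two-column mapping table. SQLite stands in for DB2."""
    conn = sqlite3.connect(path)
    # A separate UNIQUE constraint on EACH column: one masked value per live
    # value, and no masked value ever handed out twice.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keymap ("
        " masked TEXT NOT NULL UNIQUE,"
        " live   TEXT NOT NULL UNIQUE)"
    )
    return conn


def mask(conn: sqlite3.Connection, live: str) -> str:
    """Return the masked value for `live`, creating a random one if needed."""
    while True:
        # Look up the live value first.
        row = conn.execute(
            "SELECT masked FROM keymap WHERE live = ?", (live,)
        ).fetchone()
        if row is not None:
            return row[0]
        # Not mapped yet: try a random nine-digit value.
        candidate = f"{secrets.randbelow(1_000_000_000):09d}"
        try:
            with conn:  # commits if the INSERT succeeds, rolls back if it raises
                conn.execute(
                    "INSERT INTO keymap (masked, live) VALUES (?, ?)",
                    (candidate, live),
                )
            return candidate
        except sqlite3.IntegrityError:
            # A UNIQUE constraint fired: the random value is taken (or another
            # writer just mapped this live value). Look up again and retry.
            continue


if __name__ == "__main__":
    conn = open_keymap()
    first = mask(conn, "123456789")
    again = mask(conn, "123456789")
    print(first, first == again)  # the second call finds the stored row
```

Folding the pseudocode's two loops into one means a collision on either column just sends the code back to the lookup, so it cannot spin forever if another job maps the same live value first. Because the table persists, later runs reuse the same values; it is also the only route back to the real keys, so it needs the same protection as production data.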
--
John McKown
Systems Engineer IV
IT
Administrative Services Group
HealthMarkets(r)
9151 Boulevard 26 * N. Richland Hills * TX 76010
(817) 255-3225 phone *
[email protected] * www.HealthMarkets.com
> -----Original Message-----
> From: IBM Mainframe Discussion List
> [mailto:[email protected]] On Behalf Of Roberts, John J
> Sent: Friday, May 25, 2012 10:31 AM
> To: [email protected]
> Subject: Masking Numeric Keys
>
> I can foresee that my organization will soon need to provide
> test data to an external vendor. This test data will need to
> be generated by masking subsets of real production data,
> since crafting fictional test data would be an impossible
> undertaking in the time we have available.
>
> So all Personally Identifiable Information (PII) fields must
> be masked. I have figured out techniques to mask names and
> addresses. But I now need to figure out a technique to mask
> a nine digit numeric key. This field is used as either a
> primary or secondary key in many files. So I can't just
> substitute a random number, since the relationships need to
> be maintained. I have identified some requirements for the
> masking algorithm:
>
> (1) It must be deterministic (same input produces same output always).
>
> (2) Uniqueness must be maintained. Therefore no two original
> values can translate to the same masked value.
>
> (3) The masked result must also be a nine digit numeric value.
>
> (4) It must not be possible to calculate the original value
> from the masked value (i.e. a one-way transformation).
>
> I can think of many ways to address the first three
> requirements. But I am stuck on number (4). The closest I
> can get to meeting this requirement is to assume that the
> masking algorithm itself is kept secret. And I know that
> security thru obscurity is hardly a good plan.
>
> Do any of the listers have an idea for such a masking algorithm?
>
> John
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO IBM-MAIN