[R] Advice on obscuring unique IDs in R

Anthony Staines Wed, 05 Jan 2011 13:21:00 -0800

Dear colleagues,

This may be a question with a really obvious answer, but I
can't find it. I have access to a large file with real
medical record identifiers (mixed strings of characters and
numbers) in it. These represent medical events for many
thousands of people. It's important to be able to link
events for the same people.


It's even more important that the real record numbers are
strongly obscured. I'm looking for some kind of strong
one-way hash function to which I can feed the real numbers
and get back a unique code for each record identifier fed
in. I can do this on the health service system, and I have
to do it before making any further use of the data!

There is the 'digest' function in the digest package, but
it seems to hash the whole vector of IDs as a single object,
producing, in my case, a column of 60,000 identical entries.

H.Out$P_ID = digest(H.In$MRNr, serialize=FALSE, algo='md5')
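
For reference, one way around the call above is to apply digest() to
each identifier separately rather than to the whole vector. The
following is only a minimal sketch, assuming the digest package is
installed. The vector 'mrn' and the helper 'hash_one' are hypothetical
names standing in for H.In$MRNr, and 'sha256' is swapped in for 'md5'
as an illustrative choice.

```r
library(digest)

# Hypothetical stand-in for H.In$MRNr; these are made-up identifiers.
mrn <- c("A123", "B456", "A123", "C789")

# digest() hashes its whole argument as a single object, so it is
# called once per element here. serialize = FALSE hashes the string
# itself rather than its serialized R representation.
hash_one <- function(id) digest(id, algo = "sha256", serialize = FALSE)

# vapply() returns one hash per identifier, in the original order.
p_id <- vapply(mrn, hash_one, character(1), USE.NAMES = FALSE)

# Identical identifiers give identical codes, so events for the same
# person can still be linked.
print(p_id[1] == p_id[3])
```

Note that a plain hash of short identifiers may be reversible by
hashing every plausible identifier and comparing the results. If that
is a concern, a keyed hash (the digest package provides an hmac()
function) with a key kept on the health service system might be worth
considering.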

I could do this in Perl, but I'd have to do quite a bit of
work to get it installed.

Any quick suggestions?
Anthony Staines
-- 
Anthony Staines, Professor of Health Systems Research,
School of Nursing, Dublin City University, Dublin 9, Ireland.
Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.