Dear colleagues, This may be a question with a really obvious answer, but I can't find it. I have access to a large file with real medical record identifiers (mixed strings of characters and numbers) in it. These represent medical events for many thousands of people. It's important to be able to link events for the same people.
It's much more important that the real record numbers are strongly obscured. I'm interested in some kind of strong one-way hash function to which I can feed the real numbers and get back a unique code for each record identifier fed in. I can do this on the health service system, and I have to do it before making any further use of the data!

There is the 'digest' function, in the digest package, but this seems to work on the whole vector of IDs at once, producing, in my case, a vector with 60,000 identical entries:

    H.Out$P_ID = digest(H.In$MRNr, serialize=FALSE, algo='md5')

I could do this in Perl, but I'd have to do quite a bit of work to get it installed. Any quick suggestions?

Anthony Staines
--
Anthony Staines, Professor of Health Systems Research, School of Nursing, Dublin City University, Dublin 9, Ireland.
Tel: +353 1 700 7807. Mobile: +353 86 606 9713
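
A minimal sketch of one way to get a separate code per identifier, for illustration only. digest() hashes its whole argument as a single object, which would explain the 60,000 identical entries, so the sketch calls it once per element. The sample identifiers, the `secret_salt` value and the helper name `hash_one` are assumptions made up for this example; prepending a secret salt and using SHA-256 instead of MD5 are suggestions, not part of the original code.

```r
library(digest)  # assumes the digest package is installed

# Hypothetical stand-in for H.In$MRNr; these are not real identifiers
mrn <- c("AB12345", "ZX98765", "AB12345")

# Assumed secret value that never leaves the health service system;
# without it, short identifiers could be recovered by brute force
secret_salt <- "replace-with-a-long-random-secret"

# Hash one identifier at a time: digest() treats its argument as one object
hash_one <- function(id) {
  digest(paste0(secret_salt, id), algo = "sha256", serialize = FALSE)
}

# vapply() applies hash_one to each element and returns a character vector
p_id <- vapply(mrn, hash_one, character(1), USE.NAMES = FALSE)
```

The same identifier always maps to the same code, so events for one person can still be linked, while different identifiers get different codes. With the original data frame this would presumably become `H.Out$P_ID <- vapply(as.character(H.In$MRNr), hash_one, character(1), USE.NAMES = FALSE)`.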