Excellent, thanks. Much simpler. --Chris
Christopher W. Ryan, MD, MS
cryanatbinghamtondotedu
https://www.linkedin.com/in/ryancw

Early success is a terrible teacher. You’re essentially being rewarded for a lack of preparation, so when you find yourself in a situation where you must prepare, you can’t do it. You don’t know how.
--Chris Hadfield, An Astronaut's Guide to Life on Earth

William Dunlap wrote:
> You can also use match(code, unique(code)), as in
>    transform(dd.2, codex2 = paste0("Person", match(code, unique(code))))
> It is not guaranteed that x!=y implies digest(x)!=digest(y), but it is
> extremely unlikely to fail.  This match idiom guarantees that.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, May 12, 2016 at 1:06 PM, Christopher W Ryan
> <cr...@binghamton.edu> wrote:
>
>     I would like to conduct a survival analysis, examining a subject's
>     time to *next* appearance in a database, after their first appearance.
>     It is a database of dated events.
>
>     I need to obfuscate or anonymize or mask the subject identifiers (a
>     combination of name and birthdate). And obviously any given subject
>     should have the same anonymous code every time he/she appears in the
>     database. I'm not talking "safe from the NSA" here. And I won't be
>     releasing it. It's just sensitive data and I don't want to be working
>     every day with cleartext versions of it.
>
>     I've looked at packages digest, anonymizer, and anonymize.
>     What do you think of this approach:
>
>     # running R 3.1.1 on Windows 7 Enterprise
>     library(digest)
>     dd <- data.frame(id=1:6, name = c("Harry", "Ron", "Hermione", "Luna",
>     "Ginny", "Harry"), dob = c("1990-01-01", "1990-06-15", "1990-04-08",
>     "1999-11-26", "1990-07-21", "1990-01-01"))
>     # build the cleartext key; paste0() has no sep argument
>     dd.2 <- transform(dd, code=paste0(tolower(name), tolower(dob)))
>     anonymize <- function(x, algo="sha256"){
>         # coerce in case code arrives as a factor, so the lookup is by string
>         x <- as.character(x)
>         # hash each distinct value once; names of the result are the values
>         unq_hashes <- vapply(unique(x), function(object) digest(object, algo=algo),
>                              FUN.VALUE="", USE.NAMES=TRUE)
>         unname(unq_hashes[x])
>     }
>     dd.2$codex <- anonymize(dd.2$code)
>     dd.2
>     table(duplicated(dd.2$codex))
>
>     Thanks.
>
>     --Chris Ryan
>     Broome County Health Department
>
>     ______________________________________________
>     R-help@r-project.org mailing list --
>     To UNSUBSCRIBE and more, see
>     https://stat.ethz.ch/mailman/listinfo/r-help
>     PLEASE do read the posting guide
>     http://www.R-project.org/posting-guide.html
>     and provide commented, minimal, self-contained, reproducible code.
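
For anyone reading this thread later, here is a minimal sketch of the match(code, unique(code)) idiom Bill describes, applied to the toy data from the original post and using base R only. The names key and masked are illustrative assumptions, not from the thread, and stringsAsFactors = FALSE is set only so the sketch behaves the same on old and new versions of R.

```r
# Toy data from the original post, kept as character columns.
dd <- data.frame(id = 1:6,
                 name = c("Harry", "Ron", "Hermione", "Luna", "Ginny", "Harry"),
                 dob = c("1990-01-01", "1990-06-15", "1990-04-08",
                         "1999-11-26", "1990-07-21", "1990-01-01"),
                 stringsAsFactors = FALSE)

# Cleartext key: lower-cased name followed by date of birth.
key <- paste0(tolower(dd$name), dd$dob)

# Replace each key by its position among the distinct keys.
# Equal keys get the same position and different keys get different ones.
dd$codex2 <- paste0("Person", match(key, unique(key)))

# Hypothetical working copy without the cleartext columns.
masked <- dd[, c("id", "codex2")]
masked
```

Because the labels follow order of first appearance, they are stable only for a fixed row order; re-running on a reordered or extended table can assign different labels, so the mapping would need to be saved if codes must persist across runs.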