Dear cTAKES developer,

This is Masoud Rouhizadeh from JHU. I lead the NLP effort at the Institute for Clinical and Translational Research and work on enterprise-level NLP projects at Johns Hopkins Medicine. One of our major goals is the de-identification of a large number of notes (350M) to prepare them for search and indexing (Elasticsearch and Solr). I have been in touch with Dr. Guergana Savova about cTAKES Scrubber, and she has been very helpful.

One of our most desired features in the de-identification pipeline is synthetic replacement (e.g., Nancy -> Sally, where a random female first name consistently replaces a given female first name). I wasn't able to find information about this feature in cTAKES Scrubber. Is synthetic replacement part of cTAKES Scrubber, or can it be added by post-processing the output? For instance, if we know the name Nancy has been removed from multiple places, could we use a name dictionary to insert a random female first name in those places (just a thought)?

Overall, I wanted to emphasize that cTAKES Scrubber is one of our main candidates, and I'm hoping that we can find ways to collaborate.

Thank you very much,
Masoud

----
Masoud Rouhizadeh, PhD
Faculty - Division of Health Science Informatics (DHSI)
NLP Lead - Institute for Clinical and Translational Research (ICTR)
Johns Hopkins University School of Medicine
https://www.cs.jhu.edu/~mrou/
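
P.S. To make the post-processing idea a bit more concrete, here is a rough Python sketch of what I have in mind. It assumes the de-identification step can report the character offsets of each detected female first name; I do not know whether cTAKES Scrubber exposes these. The name list, the `ConsistentReplacer` class, and the `replace_spans` function are all hypothetical and only illustrate the "same original name, same surrogate" behavior.

```python
import random

# Hypothetical surrogate dictionary; a real list would be much larger.
FEMALE_FIRST_NAMES = ["Sally", "Maria", "Karen", "Linda", "Grace"]


class ConsistentReplacer:
    """Maps each original name to one random surrogate and reuses it."""

    def __init__(self, surrogates, seed=None):
        self.surrogates = list(surrogates)
        self.rng = random.Random(seed)
        self.mapping = {}  # lowercased original name -> chosen surrogate

    def surrogate_for(self, name):
        key = name.lower()
        if key not in self.mapping:
            # Never replace a name with itself.
            choices = [s for s in self.surrogates if s.lower() != key]
            self.mapping[key] = self.rng.choice(choices)
        return self.mapping[key]


def replace_spans(text, spans, replacer):
    """Replace each (start, end) span of a detected name with its surrogate.

    Assumes the spans do not overlap and all refer to female first names.
    """
    out, cursor = [], 0
    for start, end in sorted(spans):
        out.append(text[cursor:start])
        out.append(replacer.surrogate_for(text[start:end]))
        cursor = end
    out.append(text[cursor:])
    return "".join(out)
```

Sharing one `ConsistentReplacer` across all notes for a patient would keep the surrogate stable across documents, while a fresh instance per note would keep it stable only within that note.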