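Hello,

I have a minor bug to report, and a question that may point to a major bug.

Minor bug: If I create a custom dictionary with multiple vocabularies and then run cTAKES using that dictionary, cTAKES will sometimes replace the vocabulary name with the name of the custom dictionary. An example from a run on the MIMIC dataset is shown in the attached image1.png. When I looked up C1548802, one of the CUIs that had the incorrect vocabulary name inserted, in the UMLS Metathesaurus Browser, it had ‘NOCODE’ for the code. This only seemed to occur with CUIs from the MTH vocabulary. Is this something that can be fixed within cTAKES?

Question (and possibly a major bug): We ran the same dataset (50 MIMIC notes) twice: once using the custom dictionary with multiple vocabularies described in the attached image1.png, and once using a custom dictionary that only included the SNOMED vocabulary. Next, we filtered the output from the multiple-vocabulary dictionary to only include CUIs that were reported by SNOMED. The two outputs from cTAKES should have contained the same CUIs, but as can be seen in the attached Venn diagrams, some of the CUIs reported by cTAKES running the SNOMED-only dictionary were not reported by cTAKES running the multiple-vocabulary dictionary. Do you know why the two outputs would be different?

We're running a user installation of cTAKES 4.0.0.1 via

    ./bin/runPiperFile.sh -p path/to/piperfile -l path/to/custom_dict.xml -i inputDir --xmiOut outputDir

and then extracting the CUIs from the output XMI files.

Please let me know if I should report this as an issue on the new GitHub repository instead of via email.

Thanks!
John Caskey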
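
For anyone trying to reproduce the comparison described above, here is a minimal sketch of one way it could be done. It is not the script used for the attached Venn diagrams. It assumes that each concept in the XMI output is an element whose local name is `UmlsConcept`, carrying `cui` and `codingScheme` attributes, and that the SNOMED vocabulary is labeled `SNOMEDCT_US`. The function names and that default label are hypothetical and should be adjusted to match the actual output.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def concepts_from_root(root):
    """Return the set of (cui, codingScheme) pairs under one XMI root element."""
    pairs = set()
    for elem in root.iter():
        # Drop the XML namespace prefix, e.g. "{...}UmlsConcept" -> "UmlsConcept".
        local_name = elem.tag.rsplit("}", 1)[-1]
        # Assumption: concepts are UmlsConcept elements with a "cui" attribute.
        if local_name == "UmlsConcept" and elem.get("cui"):
            pairs.add((elem.get("cui"), elem.get("codingScheme", "")))
    return pairs


def concepts_from_dir(xmi_dir):
    """Collect (cui, codingScheme) pairs from every *.xmi file in a directory."""
    pairs = set()
    for path in sorted(Path(xmi_dir).glob("*.xmi")):
        pairs |= concepts_from_root(ET.parse(path).getroot())
    return pairs


def compare_runs(multi_pairs, snomed_only_pairs, scheme="SNOMEDCT_US"):
    """Compare CUIs from the two runs, as in a two-set Venn diagram.

    The multiple-vocabulary run is first filtered to the given scheme
    (hypothetical default label); the SNOMED-only run is used unfiltered.
    """
    multi_cuis = {cui for cui, s in multi_pairs if s == scheme}
    snomed_cuis = {cui for cui, _ in snomed_only_pairs}
    return {
        "both": multi_cuis & snomed_cuis,
        "multi_only": multi_cuis - snomed_cuis,
        "snomed_only": snomed_cuis - multi_cuis,
    }
```

With this kind of filter, any concept in the multiple-vocabulary run whose vocabulary label is not exactly the expected scheme name is dropped before the comparison, so it may be worth listing the distinct `codingScheme` values in each run before comparing.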
If I create a custom dictionary with multiple vocabularies and then run cTAKES using this custom dictionary, cTAKES will sometimes replace the vocabulary name with the name of the custom dictionary. An example is shown in the attached image1.png that was run on the MIMIC dataset. I noticed that if I looked up the CUI C1548802 in the UMLS Metathesaurus Browser that had the incorrect vocabulary name inserted, it had ‘NOCODE’ for the code. This only seemed to occur with CUIs from the MTH vocabulary. Is this something that can be fixed within cTAKES? The question and maybe major bug was we ran the same dataset (50 MIMIC notes) twice: once on the custom dictionary with multiple vocabularies described in the attached image1.png, and then using a custom dictionary that only included the snomed vocabulary. Next, we filtered the output from the multiple vocabulary dictionary to only include CUIs that were reported by snomed. The two outputs from cTAKES should have produced the same CUIs, but as can be seen in the attached Venn Diagrams, some of the CUIs reported by cTAKES running the snomed-only dictionary were not reported by cTAKES running the multiple vocabulary dictionary. Do you know why the two outputs would be different? We’re running user installation of cTAKES 4.0.0.1 via ./bin/runPiperFile.sh -p path/to/piperfile -l path/to/custom_dict.xml -i inputDir --xmiOut outputDir And then extracting the CUIs from the output XMI files. Please let me know if I should report this as an issue on the new GitHub repository instead of via email. Thanks! John Caskey