Kean Kaufmann created CTAKES-450:
------------------------------------

             Summary: CDASegmentAnnotator misses all headings after empty segment
                 Key: CTAKES-450
                 URL: https://issues.apache.org/jira/browse/CTAKES-450
             Project: cTAKES
          Issue Type: Bug
          Components: ctakes-core
            Reporter: Kean Kaufmann
         Attachments: CDASegmentAnnotator.diff

If the CDASegmentAnnotator encounters an empty segment, it throws away everything after that in the document.

You can see this in the test document provided for TestCDASegmentAnnotator. The heading "CURRENT HEALTH STATUS" is followed immediately by the heading "Medications"; the test case misses the "Medications" heading, and "FAMILY HISTORY" after that.

The sorted_segments loop is only incrementing the index variable for non-empty segments. Patch attached.

TestCDASegmentAnnotator output before fix (with getPreferredText()):

Segment:2.16.840.1.113883.10.20.22.1.1 Begin:92 End:159: Header
Segment:1.3.6.1.4.1.19376.1.5.3.1.1.13.2.1 Begin:176 End:1612: CHIEF COMPLAINT
Segment:2.16.840.1.113883.10.20.22.2.20 Begin:1634 End:1696: HISTORY OF PAST ILLNESS
Segment:2.16.840.1.113883.10.20.22.2.2.1 Begin:1711 End:2271: History of immunizations

After fix:

Segment:2.16.840.1.113883.10.20.22.1.1 Begin:92 End:159: Header
Segment:1.3.6.1.4.1.19376.1.5.3.1.1.13.2.1 Begin:176 End:1612: CHIEF COMPLAINT
Segment:2.16.840.1.113883.10.20.22.2.20 Begin:1634 End:1696: HISTORY OF PAST ILLNESS
Segment:2.16.840.1.113883.10.20.22.2.2.1 Begin:1711 End:2271: History of immunizations
Segment:2.16.840.1.113883.10.20.22.2.1.1 Begin:2307 End:3506: HISTORY OF MEDICATION USE
Segment:2.16.840.1.113883.10.20.22.2.15 Begin:3522 End:5608: Family History
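
For readers without the attached diff, here is a minimal sketch of how an index that advances only for non-empty segments can lose every later heading. It is not the actual CDASegmentAnnotator code: the class SegmentLoopSketch, the Heading type, the alwaysAdvance flag, and all offsets are hypothetical and invented for illustration. The sketch assumes a segment body runs from the end of its heading to the start of the next heading, and that the next heading is looked up through a separate index variable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the cTAKES implementation.
final class SegmentLoopSketch {

    // Stand-in for a detected heading: its name and the offsets of the heading text.
    static final class Heading {
        final String name;
        final int begin;
        final int end;
        Heading(String name, int begin, int end) {
            this.name = name;
            this.begin = begin;
            this.end = end;
        }
    }

    // Returns the names of headings whose bodies are non-empty.
    // alwaysAdvance == false mimics the reported bug: index moves only for non-empty segments.
    static List<String> segment(List<Heading> sorted, int textLength, boolean alwaysAdvance) {
        List<String> found = new ArrayList<>();
        int index = 0;
        for (Heading h : sorted) {
            int bodyBegin = h.end;
            // The body ends where the heading at index + 1 begins, or at the end of the text.
            int bodyEnd = (index + 1 < sorted.size()) ? sorted.get(index + 1).begin : textLength;
            if (bodyEnd > bodyBegin) {
                found.add(h.name);
                if (!alwaysAdvance) {
                    index++; // buggy variant: only non-empty segments advance the index
                }
            }
            if (alwaysAdvance) {
                index++; // fixed variant: the index stays in step with the loop
            }
        }
        return found;
    }

    // Invented offsets: "CURRENT HEALTH STATUS" is followed immediately by "Medications".
    static List<Heading> sampleHeadings() {
        return Arrays.asList(
            new Heading("History of immunizations", 0, 24),
            new Heading("CURRENT HEALTH STATUS", 100, 121),
            new Heading("Medications", 121, 132),
            new Heading("FAMILY HISTORY", 250, 264));
    }

    public static void main(String[] args) {
        System.out.println("before fix: " + segment(sampleHeadings(), 400, false));
        System.out.println("after fix:  " + segment(sampleHeadings(), 400, true));
    }
}
```

In the buggy variant the index stalls at the empty segment, so every later heading is compared against a stale "next heading" offset that lies before its own body. Each of those segments therefore looks empty and is dropped. Advancing the index on every iteration keeps the lookup aligned with the loop.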