I was thinking the same thing as Steve. That's a pretty regular onc physical 
exam, so why not just split sentences with regexes keyed off a small list of 
defined onc physical exam terms? The interesting case would be "breast", as this 
term may appear in the body of a sentence (rather than just as a section term), 
but you could use a regex sub-match where you conditionally match "breast" first 
and then one or more key physical findings, to correctly identify THAT breast 
token as the section term, that is, as the beginning of the sentence. I would 
recommend red-flag physical findings, as they are more likely to always be in 
the body of the sentence, for example: 
Breast: no lumps or masses palpable.
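
To make that concrete, here is a rough Python sketch of what I mean. The term 
list, the findings list, and the function name are all made up for illustration 
(nothing here is existing cTAKES code), and I'm assuming section terms are 
always followed by a colon, as in Tim's example.

```python
import re

# Hypothetical list of onc physical exam section terms (an assumption,
# not taken from any existing resource). "Lymphnodes" is left out so it
# stays attached to "PE:", as in Tim's desired output.
SECTION_TERMS = ["PE", "Lungs", "CV", "Abdomen"]

# Hypothetical red-flag findings used to confirm a "Breast" section term.
BREAST_FINDINGS = ["lumps", "masses", "swelling", "erythema", "seroma", "hematoma"]

plain_alt = "|".join(re.escape(t) for t in SECTION_TERMS)
findings_alt = "|".join(re.escape(f) for f in BREAST_FINDINGS)

# "Breast" only counts as a section term when a colon follows it and at
# least one finding shows up before the next colon.
breast = rf"Breast(?=:[^:]*\b(?:{findings_alt})\b)"

# A boundary is any section term (or confirmed "Breast") right before a colon.
BOUNDARY = re.compile(rf"\b(?:{plain_alt}|{breast})(?=:)")


def split_sections(text):
    """Split text into one chunk per section term found by BOUNDARY."""
    starts = [m.start() for m in BOUNDARY.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text before the first term
    ends = starts[1:] + [len(text)]
    chunks = (text[s:e].strip() for s, e in zip(starts, ends))
    return [c for c in chunks if c]


if __name__ == "__main__":
    note = (
        "PE: Lymphnodes: neck and axilla without adenopathy "
        "Lungs: normal and clear to auscultation "
        "Breast: negative findings right/left breast with mild swelling "
        "Abdomen: Abdomen soft, non-tender."
    )
    for chunk in split_sections(note):
        print(chunk)
```

The lowercase "breast" in the body is never a candidate because it isn't 
followed by a colon, and the lookahead on "Breast" is where a longer findings 
list would plug in.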


I have a few other ideas if that's barking up the right tree.




JG

On Sat, Aug 2, 2014 at 8:58 AM, Steven Bethard <steven.beth...@gmail.com>
wrote:

> On Sat, Aug 2, 2014 at 7:43 AM, Miller, Timothy
> <timothy.mil...@childrens.harvard.edu> wrote:
>> PE: Lymphnodes: neck and axilla without adenopathy Lungs: normal and clear 
>> to auscultation CV: regular rate and rhythm without murmur or gallop , S1, 
>> S2 normal, no murmur, click, rub or gal*, chest is clear without rales or 
>> wheezing, no pedal edema, no JVD, no hepatosplenomegaly Breast: negative 
>> findings right/left breast with mild swelling, warmth, mild erythema, 
>> slightly tender, no seroma or hematoma Abdomen: Abdomen soft, non-tender.
>>
>> It would be preferable to me to put sentence breaks in between the sections, 
>> so the first two sentences would be:
>>
>> 1) PE: Lymphonodes...
>> 2) Lungs: normal...
> [snip]
>> Another example that breaks our model in a different way (truncated):
>> 1. Baseline labwork including tumor markers  2. Start DD AC on Friday 8/1 
>> with RN chemo teach  3. S U parent study
> [snip]
>> Here it would be preferable to get:
>> 1.
>> Baseline labwork...
>> 2.
>> Start DD...
>> 3.
>> S U parent study
> Seems like rather than specifying a set of "candidate characters", we
> want to specify a candidate boundary regular expression. Something
> like, \p{P}|\b\p{Lu}|\b\p{N}, should cover all of the above cases:
> sentence boundaries may appear at punctuation marks, at uppercase
> letters after word boundaries, and at numbers after word boundaries.
> Steve
