My preference is to treat the list item number as outside the sentence of interest. Or, if it must be included in a sentence, have it be a sentence of its own. That won't be as straightforward as splitting on a period in cases like "2. Magnesium oxide 400 mg p.o. daily." Where a line contains more than one written sentence, like the example in your original email, I'd prefer that each one be its own sentence rather than making the entire list line a single sentence. My feeling is that a line without terminating punctuation would be a single sentence and would exclude the list number.
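To make that preference concrete, here is a rough Python sketch. It is only an illustration under stated assumptions, not the behavior of any existing sentence detector: the names `split_list_line`, `LIST_MARKER`, and `ABBREVIATIONS` are hypothetical, and the abbreviation list is a tiny placeholder for whatever a real system would need.

```python
import re

# Hypothetical list-marker pattern: "2.", "12)", or "#1" at the start of a line.
LIST_MARKER = re.compile(r"^\s*(#\d+|\d+[.)])\s+")

# Assumed abbreviation list; a real system would need a far larger one.
ABBREVIATIONS = {"p.o.", "b.i.d.", "t.i.d.", "q.d.", "i.v."}


def split_list_line(line):
    """Return (marker, sentences) for a single list line.

    The marker is kept apart from the sentences, and the rest of the
    line is split into one or more sentences.
    """
    marker = None
    match = LIST_MARKER.match(line)
    if match:
        marker = match.group(1)
        line = line[match.end():]

    sentences, current = [], []
    for token in line.split():
        current.append(token)
        # End a sentence on terminal punctuation, unless the token is a
        # known abbreviation such as "p.o.".
        if token[-1] in ".?!" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        # A line with no terminating punctuation is still one sentence.
        sentences.append(" ".join(current))
    return marker, sentences


if __name__ == "__main__":
    for example in [
        "2. Magnesium oxide 400 mg p.o. daily.",
        "1. EGD as a complex procedure. If there is an abnormality, obtain biopsies.",
        "#2 Adenocarcinoma",
    ]:
        print(split_list_line(example))
```

Whitespace tokenization and a fixed abbreviation set are deliberate simplifications; the point is only that the marker is stripped first and that sentence breaks are then decided on the remainder of the line.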
As an aside, I have encountered several issues with numbered lists being interpreted differently depending on:

1. What number is included at the start, for example "2. Magnesium oxide 400 mg p.o. daily." vs "12. Magnesium oxide 400 mg p.o. daily." (This appears to be a chunking issue: the line starting with "12. Magnesium" is identified as starting with chunks [O, O, B-NP, B-NP, I-NP, B-NP, B-ADVP, O] even though the parts of speech appear to be correct.)

2. Whether there is a period at the end of a list line, for example "4. CHF" vs "4. CHF." (This also appears to be an issue with the chunker, which produces [O, O] in the first case and [B-VP, B-NP, O] in the second.)

Cheers,
Britt

On Mon, Jul 14, 2014 at 7:50 AM, Miller, Timothy <timothy.mil...@childrens.harvard.edu> wrote:

> Just curious about an edge case regarding headers/lists and wondering what
> people think the correct behavior and annotation are.
>
> In cases like this:
>
> #1 Dilated esophagus.
> #2 Adenocarcinoma
>
> my intuition is that each whole line is one sentence. But then there are
> cases where the number may be followed by multiple sentences on one line.
> 1. EGD as a complex procedure. If there is an abnormality, obtain biopsies.
>
> For this example my intuition is not as clear. Should there be a break
> after the "1." or should the first sentence be "1. EGD as a complex
> procedure."? Again, my intuition leans towards the latter but it seems a
> bit odd since the "1." kind of distributes over all the following sentences
> (i.e. it's like a paragraph descriptor.)
>
> Does the period after the 1 matter? The number of sentences after the list
> header? The fact that it's all on one line? Anything else?
>
> Tim