Tim, thanks for working on this! Question: do we have a formal way of evaluating the sentence detector? Maybe we should put together a dev set that includes examples from MIMIC...
Dima

On Sep 27, 2014, at 8:57, Miller, Timothy <timothy.mil...@childrens.harvard.edu> wrote:

> I have been working on the sentence detector newline issue, training a model
> to probabilistically split sentences on newlines rather than forcing sentence
> breaks. I have checked in a model to the repo under ctakes-core-res. I also
> attached a patch to ctakes-core to the jira issue:
> https://issues.apache.org/jira/browse/CTAKES-41
>
> for people to test. The status of my testing is that it doesn't seem to break
> on notes where ctakes worked well before (those where newlines are always
> sentence breaks), and is a slight improvement on notes where newlines may or
> may not be sentence breaks. Once the change is checked in we can continue
> improving the model by adding more data and features, but the first hurdle
> I'd like to get past is making sure it runs well enough on the type of data
> that the old model worked well on. Let me know if you have any questions.
>
> Thanks
> Tim
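
To make the evaluation question concrete, here is a minimal sketch of one possible scoring scheme. It is not anything that exists in ctakes today. It assumes a dev set where gold sentence ends are given as character offsets, and it scores only the newline positions, since those are the decisions the new model changes. The function names and the sample note are hypothetical.

```python
def newline_offsets(text):
    """Character offsets of every newline in the note (the candidate breaks)."""
    return [i for i, ch in enumerate(text) if ch == "\n"]


def boundary_scores(gold, predicted):
    """Precision, recall and F1 over two collections of boundary offsets."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # boundaries present in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Hypothetical note: the second newline falls in the middle of a sentence.
text = "BP stable.\nPatient denies\nchest pain.\nPlan: follow up\n"
candidates = newline_offsets(text)

# Assumed gold annotation: every newline except the mid-sentence one is a break.
gold_breaks = [candidates[0], candidates[2], candidates[3]]

# Old behaviour: every newline is forced to be a sentence break.
forced = candidates
# Stand-in for a model that skips the mid-sentence newline.
model = gold_breaks

old_p, old_r, old_f1 = boundary_scores(gold_breaks, forced)
new_p, new_r, new_f1 = boundary_scores(gold_breaks, model)
print(f"forced breaks: P={old_p:.2f} R={old_r:.2f} F1={old_f1:.2f}")
print(f"model breaks:  P={new_p:.2f} R={new_r:.2f} F1={new_f1:.2f}")
```

Scoring the forced-break baseline and the new model on the same notes would show both things Tim mentions: whether anything regresses on notes where newlines are always breaks, and how much is gained where they are not.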