That's awesome! It might be worth trying at least.

How does the training process change? Previously the training data would be one sentence per line, but with newlines as possible mid-sentence characters, that could be trouble. Is there a new representation for training data, or would we have to use the training API?

Tim

On 05/22/2013 05:20 AM, Jörn Kottmann wrote:
> On 05/21/2013 08:00 PM, Steven Bethard wrote:
>> So perhaps we could re-train it to disambiguate newline characters as well?
>>
> Yes, the OpenNLP Sentence Detector now supports that in the new 1.5.3
> version out of the box, you can specify the set of EOS chars to use,
> but the default is still: !?. If you have special needs you can also
> customize the feature generation. It should probably be possible to
> drop the cTAKES eos fix for that now.
>
> Let me know if you have any question or need some help to customize it
> for cTAKES.
>
> Jörn
>