That's awesome! It's at least worth trying. How does the training
process change? Previously the training data was one sentence per
line, but that could be trouble now that a newline can occur
mid-sentence. Is there a new representation for the training data, or
would we have to use the training API?
Tim

On 05/22/2013 05:20 AM, Jörn Kottmann wrote:
> On 05/21/2013 08:00 PM, Steven Bethard wrote:
>> So perhaps we could re-train it to disambiguate newline characters as well?
>>
> Yes, the OpenNLP Sentence Detector now supports that in the new 1.5.3
> version out of the box, you can specify the set of EOS chars to use,
> but the default is still: !?. If you have special needs you can also
> customize the feature generation. It should probably be possible to
> drop the cTAKES eos fix for that now.
>
> Let me know if you have any questions or need some help to customize
> it for cTAKES.
>
> Jörn
>
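
To make the EOS-character idea above concrete, here is a minimal sketch
of what adding the newline to the EOS set changes: every occurrence of an
EOS character becomes a candidate boundary that the trained model then
has to accept or reject. This is plain Java and not OpenNLP code; the
class name, method name, and sample text are hypothetical and only
illustrate the candidate-scanning step.

```java
import java.util.ArrayList;
import java.util.List;

public class EosCandidateSketch {

    // The default set mentioned in the thread (. ! ?) plus the newline.
    static final char[] DEFAULT_EOS = {'.', '!', '?'};
    static final char[] EOS_WITH_NEWLINE = {'.', '!', '?', '\n'};

    // Returns the offsets of all characters in text that belong to eosChars.
    // Each offset is only a candidate; a classifier would decide whether it
    // really ends a sentence.
    static List<Integer> candidatePositions(String text, char[] eosChars) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            for (char eos : eosChars) {
                if (c == eos) {
                    positions.add(i);
                    break;
                }
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical note: the first newline is mid-sentence, the second
        // ends a sentence that has no closing punctuation.
        String note = "Patient denies chest\npain. No fever\nFollow up in 2 weeks.";
        System.out.println("default EOS:  " + candidatePositions(note, DEFAULT_EOS));
        System.out.println("with newline: " + candidatePositions(note, EOS_WITH_NEWLINE));
    }
}
```

With the newline in the set, both newlines show up as candidates, so the
training data would need to contain examples of each kind (a newline that
ends a sentence and one that does not) for the model to tell them apart.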
