Re: sentence detector newline behavior

Miller, Timothy Wed, 22 May 2013 04:18:28 -0700

That's awesome! It might be worth trying, at least. How does the training
process change? Previously the training data was one sentence per line,
but that could be a problem if newlines can now occur mid-sentence. Is
there a new representation for the training data, or would we have to
use the training API?
Tim


On 05/22/2013 05:20 AM, Jörn Kottmann wrote:
> On 05/21/2013 08:00 PM, Steven Bethard wrote:
>> So perhaps we could re-train it to disambiguate newline characters as well?
>>
> Yes, the OpenNLP Sentence Detector now supports that out of the box in
> the new 1.5.3 version. You can specify the set of EOS chars to use, but
> the default is still: !?. If you have special needs you can also
> customize the feature generation. It should probably be possible to
> drop the cTAKES eos fix for that now.
>
> Let me know if you have any questions or need some help to customize it
> for cTAKES.
>
> Jörn
>
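
To make the quoted point about a configurable EOS set concrete, here is a minimal sketch of how adding the newline to that set changes which positions get considered as possible sentence ends. It is plain Java and does not call OpenNLP; the class and method names (EosCandidates, candidatePositions) are made up for illustration, and the real detector then classifies each candidate with a trained model rather than accepting all of them.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: this is not OpenNLP code. EosCandidates and
// candidatePositions are hypothetical names used for this sketch.
final class EosCandidates {
    // The default end-of-sentence characters mentioned in the thread.
    static final char[] DEFAULT_EOS = {'!', '?', '.'};

    // Returns the index of every character in text that belongs to eosChars.
    // Each index is only a candidate boundary; a trained model would still
    // have to decide whether it really ends a sentence.
    static List<Integer> candidatePositions(String text, char[] eosChars) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            for (char eos : eosChars) {
                if (c == eos) {
                    positions.add(i);
                    break;
                }
            }
        }
        return positions;
    }
}
```

With the default set, a newline is never a candidate, so it cannot be classified either way. Once '\n' is added to the set, every newline becomes a position the model has to disambiguate, which is why the training data question above matters.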
