Re: PositionLengthAttribute

2013-09-07 Thread Benson Margulies
On Sat, Sep 7, 2013 at 8:39 AM, Robert Muir wrote:
> On Sat, Sep 7, 2013 at 7:44 AM, Benson Margulies wrote:
>> In Japanese, compounds are just decompositions of the input string. In
>> other languages, compounds can manufacture entire tokens from thin
>> air. In those cases, it's something of a

Re: PositionLengthAttribute

2013-09-07 Thread Robert Muir
On Sat, Sep 7, 2013 at 7:44 AM, Benson Margulies wrote:
> In Japanese, compounds are just decompositions of the input string. In
> other languages, compounds can manufacture entire tokens from thin
> air. In those cases, it's something of a question how to decide on the
> offsets. I think that you

Re: PositionLengthAttribute

2013-09-07 Thread Benson Margulies
In Japanese, compounds are just decompositions of the input string. In other languages, compounds can manufacture entire tokens from thin air. In those cases, it's something of a question how to decide on the offsets. I think that you're right, eventually, insofar as there's some offset in the orig

Re: PositionLengthAttribute

2013-09-06 Thread Robert Muir
On Fri, Sep 6, 2013 at 9:32 PM, Benson Margulies wrote:
> On Fri, Sep 6, 2013 at 9:28 PM, Robert Muir wrote:
>> its the latter. the way its designed to work i think is illustrated
>> best in kuromoji analyzer where it heuristically decompounds nouns:
>>
>> if it decompounds ABCD into AB + CD, the

Re: PositionLengthAttribute

2013-09-06 Thread Benson Margulies
On Fri, Sep 6, 2013 at 9:28 PM, Robert Muir wrote:
> its the latter. the way its designed to work i think is illustrated
> best in kuromoji analyzer where it heuristically decompounds nouns:
>
> if it decompounds ABCD into AB + CD, then the tokens are AB and CD.
> these both have posinc=1.
> howev
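
The ABCD example quoted above can be sketched as a small token graph. This is only a toy model in plain Java, not Lucene's API: the class `TokenGraphSketch`, its `Token` type, and the `arcs` helper are hypothetical names made up for illustration. It also assumes that the undecomposed compound ABCD is kept and emitted at the same position as AB with a position length of 2; the quoted message is cut off before it says how the compound itself is handled, so treat that part as an assumption rather than as what the analyzer does.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a token graph. Hypothetical names; this is not Lucene's API. */
final class TokenGraphSketch {

    /** One token: its text, position increment, and position length. */
    static final class Token {
        final String term;
        final int posInc; // how far this token advances from the previous token's position
        final int posLen; // how many positions this token spans

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** ASSUMPTION: ABCD is kept alongside its parts AB and CD. */
    static List<Token> example() {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("ABCD", 1, 2)); // compound, spans two positions
        tokens.add(new Token("AB", 0, 1));   // starts at the same position as ABCD
        tokens.add(new Token("CD", 1, 1));   // next position
        return tokens;
    }

    /** Renders each token as "term:start->end", where end = start + posLen. */
    static List<String> arcs(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        int pos = -1; // positions start before the first token
        for (Token t : tokens) {
            pos += t.posInc;
            out.add(t.term + ":" + pos + "->" + (pos + t.posLen));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String arc : arcs(example())) {
            System.out.println(arc);
        }
    }
}
```

Under these assumptions every position length is a whole number: the parts each span one position and the compound spans both, so nothing fractional needs to be stored.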

Re: PositionLengthAttribute

2013-09-06 Thread Robert Muir
On Fri, Sep 6, 2013 at 8:03 PM, Benson Margulies wrote:
> I'm confused by the comment about compound components here.
>
> If a single token fissions into multiple tokens, then what belongs in
> the PositionLengthAttribute? I'm wanting to store a fraction in here!
> Or

PositionLengthAttribute

2013-09-06 Thread Benson Margulies
I'm confused by the comment about compound components here. If a single token fissions into multiple tokens, then what belongs in the PositionLengthAttribute? I'm wanting to store a fraction in here! Or is the idea to store N in the 'mother' token and then '