On Sat, Sep 7, 2013 at 8:39 AM, Robert Muir wrote:
> On Sat, Sep 7, 2013 at 7:44 AM, Benson Margulies wrote:
>> In Japanese, compounds are just decompositions of the input string. In
>> other languages, compounds can manufacture entire tokens from thin
>> air. In those cases, it's something of a
On Sat, Sep 7, 2013 at 7:44 AM, Benson Margulies wrote:
> In Japanese, compounds are just decompositions of the input string. In
> other languages, compounds can manufacture entire tokens from thin
> air. In those cases, it's something of a question how to decide on the
> offsets. I think that you
In Japanese, compounds are just decompositions of the input string. In
other languages, compounds can manufacture entire tokens from thin
air. In those cases, it's something of a question how to decide on the
offsets. I think that you're right, eventually, insofar as there's
some offset in the orig
On Fri, Sep 6, 2013 at 9:32 PM, Benson Margulies wrote:
> On Fri, Sep 6, 2013 at 9:28 PM, Robert Muir wrote:
>> it's the latter. the way it's designed to work, I think, is illustrated
>> best in the kuromoji analyzer, where it heuristically decompounds nouns:
>>
>> if it decompounds ABCD into AB + CD, the
On Fri, Sep 6, 2013 at 9:28 PM, Robert Muir wrote:
> it's the latter. the way it's designed to work, I think, is illustrated
> best in the kuromoji analyzer, where it heuristically decompounds nouns:
>
> if it decompounds ABCD into AB + CD, then the tokens are AB and CD.
> these both have posinc=1.
> howev
On Fri, Sep 6, 2013 at 8:03 PM, Benson Margulies wrote:
> I'm confused by the comment about compound components here.
>
> If a single token fissions into multiple tokens, then what belongs in
> the PositionLengthAttribute? I'm wanting to store a fraction in here!
> Or is the idea to store N in the