Yeah, those tokens should have position length 2.

Can you reduce this to a small set of synonyms and text? If you use only
WhitespaceTokenizer and SGF, does the issue reproduce?
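Here is a toy Java model of the positionLength point. It is not Lucene's SynonymGraphFilter, and the class `ToySynonymGraph` and method `addSingleTokenSynonym` are hypothetical. It shows how a graph-aware synonym filter can stack a single-token synonym over a multi-token match: the synonym starts where the match starts and spans every matched position. So replacing the two tokens `natural forest` gives it positionLength 2. Positions are 0-based here, whereas the tables in this thread start at 1.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: illustrates positionLength, not Lucene's actual implementation.
class ToySynonymGraph {

    /** A token that starts at 'position' and spans 'posLen' positions. */
    static final class Tok {
        final String term;
        final int position;
        final int posLen;

        Tok(String term, int position, int posLen) {
            this.term = term;
            this.position = position;
            this.posLen = posLen;
        }

        @Override
        public String toString() {
            return term + "@" + position + "/len=" + posLen;
        }
    }

    /**
     * Emits the original tokens (each spanning one position) plus one single-token
     * synonym for input[matchStart .. matchStart + matchLen - 1]. The synonym is
     * stacked on the first matched position and spans all matchLen positions.
     */
    static List<Tok> addSingleTokenSynonym(List<String> input, int matchStart,
                                           int matchLen, String synonym) {
        List<Tok> out = new ArrayList<>();
        for (int pos = 0; pos < input.size(); pos++) {
            if (pos == matchStart) {
                out.add(new Tok(synonym, pos, matchLen));
            }
            out.add(new Tok(input.get(pos), pos, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> graph = addSingleTokenSynonym(List.of("natural", "forest"), 0, 2, "naturwald");
        System.out.println(graph);
        // prints [naturwald@0/len=2, natural@0/len=1, forest@1/len=1]
    }
}
```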

Mike McCandless

http://blog.mikemccandless.com


On Fri, Feb 10, 2017 at 10:07 AM, Bernd Fehling
<bernd.fehl...@uni-bielefeld.de> wrote:
> Here is an example of the end offset and positionLength produced by SGF.
>
> query: natural forest
>
> WT      text                start  end  positionLength  type     position
>         natural             0      7    1               word     1
>         forest              8      14   1               word     2
> ...
>
> SPF     text                start  end  positionLength  type     position
>         natural             0      7    1               word     1
>         natural forest      0      14   2               shingle  2
>         forest              8      14   1               word     3
>
> SGF     text                start  end  positionLength  type     position
>         natural             0      7    1               word     1
>         naturwald           0      14   1               SYNONYM  2
>         forêt naturelle     0      14   1               SYNONYM  2
>         natürlicher wald    0      14   1               SYNONYM  2
>         natural forest      0      14   1               shingle  2
>         forest              8      14   1               word     3
>
> SPF     text                start  end  positionLength  type     position
>         natural             0      7    1               word     1
>         naturwald           0      9    1               SYNONYM  2
>         "forêt naturelle"   0      17   2               SYNONYM  2
>         "natürlicher wald"  0      18   2               SYNONYM  2
>         "natural forest"    0      16   2               shingle  2
>         forest              8      14   1               word     3
>
>
> SGF (SynonymGraphFilter) assigns the same end offset and positionLength
> to all SYNONYM tokens.
> I suppose that is not correct?
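One way to read the positionLength column is as graph arcs: a token at position p with positionLength n runs from node p to node p + n. The helper below is hypothetical (`TokenArcs` is not a Lucene class) and applies that rule to the SGF rows in the table. Every SYNONYM token comes out as the arc 2 -> 3, the same arc as the single-position shingle. With positionLength 2, as suggested at the top of the thread, they would become 2 -> 4.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Lucene class): reads (position, positionLength)
// pairs as arcs in a token graph.
class TokenArcs {

    /** A token leaving node 'position' arrives at node position + positionLength. */
    static int endNode(int position, int positionLength) {
        return position + positionLength;
    }

    /** SGF rows copied from the table: text -> {position, positionLength}. */
    static Map<String, int[]> sgfRows() {
        Map<String, int[]> rows = new LinkedHashMap<>();
        rows.put("natural", new int[] {1, 1});
        rows.put("naturwald", new int[] {2, 1});
        // Accented characters are escaped to keep the source ASCII-only.
        rows.put("for\u00eat naturelle", new int[] {2, 1});
        rows.put("nat\u00fcrlicher wald", new int[] {2, 1});
        rows.put("natural forest", new int[] {2, 1});
        rows.put("forest", new int[] {3, 1});
        return rows;
    }

    /** Maps each token text to the node where its arc ends. */
    static Map<String, Integer> endNodes(Map<String, int[]> rows) {
        Map<String, Integer> ends = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> row : rows.entrySet()) {
            ends.put(row.getKey(), endNode(row.getValue()[0], row.getValue()[1]));
        }
        return ends;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, int[]> row : sgfRows().entrySet()) {
            int from = row.getValue()[0];
            int to = endNode(from, row.getValue()[1]);
            System.out.println(row.getKey() + ": " + from + " -> " + to);
        }
    }
}
```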
>
> Regards
> Bernd
>
>
> On 09.02.2017 at 18:39, Michael McCandless wrote:
>> On Thu, Feb 9, 2017 at 2:40 AM, Bernd Fehling
>> <bernd.fehl...@uni-bielefeld.de> wrote:
>>> I tried SynonymGraphFilter with my setup and it worked right away.
>>> It paid off that I had made some modifications to my filters while
>>> testing 6.3 with my setup.
>>
>> Good!
>>
>>> I only replaced SynonymFilter with SynonymGraphFilter and did not
>>> use FlattenGraphFilter; pretty simple. So I can confirm that, up
>>> to this point, SynonymGraphFilter is a full replacement for
>>> SynonymFilter, at least for search-time synonym handling.
>>>
>>> But this also means there is still some work to do on the attributes, right?
>>> Position looks good, and type and start offset are no problem anyway, but
>>> the end offset is still wrong, and so is the positionLength for
>>> multi-word synonyms.
>>
>> Can you give an example or make a small test case?
>> PositionLengthAttribute is supposed to be correct coming out of
>> SynonymGraphFilter.
>>
>>> One thing I noticed was that the originating token, which "produces"
>>> the synonyms, comes out of SynonymGraphFilter last, after the
>>> "produced" synonyms.
>>> I will have a look inside with a debugger, but I guess this is due
>>> to the output buffering of SynonymGraphFilter?
>>
>> Yeah, they do come out in a different order, which token filters are
>> allowed to do in general for all tokens leaving from the same position
>> ...
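The ordering point can be illustrated with a toy model (hypothetical names, not Lucene's API). A consumer recovers each token's position by adding up position increments, and tokens stacked on the same position carry an increment of 0 except the first one emitted there. Swapping which stacked token is emitted first therefore leaves every position unchanged. positionLength is left out because it does not affect where a token starts.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of position increments; names are hypothetical, not Lucene classes.
class IncrementOrder {

    /**
     * terms[i] was emitted with position increment incs[i]. The running position
     * starts at -1, so a first token with increment 1 lands on position 0.
     */
    static Map<String, Integer> positions(String[] terms, int[] incs) {
        Map<String, Integer> result = new HashMap<>();
        int position = -1;
        for (int i = 0; i < terms.length; i++) {
            position += incs[i];
            result.put(terms[i], position);
        }
        return result;
    }

    public static void main(String[] args) {
        // Original token first; the synonym is stacked on it with increment 0.
        Map<String, Integer> originalFirst = positions(
                new String[] {"natural", "naturwald", "forest"}, new int[] {1, 0, 1});
        // Synonym first (the order Bernd observed); the original now has increment 0.
        Map<String, Integer> synonymFirst = positions(
                new String[] {"naturwald", "natural", "forest"}, new int[] {1, 0, 1});
        System.out.println(originalFirst.equals(synonymFirst)); // prints true
    }
}
```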
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>
> --
> *************************************************************
> Bernd Fehling                    Bielefeld University Library
> Dipl.-Inform. (FH)                LibTec - Library Technology
> Universitätsstr. 25                  and Knowledge Management
> 33615 Bielefeld
> Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de
>
> BASE - Bielefeld Academic Search Engine - www.base-search.net
> *************************************************************