Yeah, those tokens should have position length 2. Can you reduce to a small set of synonyms and text? If you use only the whitespace tokenizer and SGF, does the issue reproduce?
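To make the expectation concrete, here is a rough sketch in plain Java (no Lucene dependency) that checks the reported SGF rows against a simple rule: a token whose offsets cover N plain word tokens should have positionLength N. The class PositionLengthCheck, the holder Tok, and the offset-based rule are assumptions for illustration only. Lucene itself defines positionLength in terms of graph positions, not offsets, so treat this as a sanity check on the table, not as how SynonymGraphFilter computes the value.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionLengthCheck {

    /** Hypothetical holder for one row of the analysis output (not a Lucene class). */
    public static final class Tok {
        final String text;
        final int start;
        final int end;
        final int posLen;
        final String type;

        public Tok(String text, int start, int end, int posLen, String type) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.posLen = posLen;
            this.type = type;
        }
    }

    /** Assumed rule: count the word tokens whose offsets lie inside this token's offsets. */
    public static int expectedPositionLength(Tok t, List<Tok> words) {
        int covered = 0;
        for (Tok w : words) {
            if (w.start >= t.start && w.end <= t.end) {
                covered++;
            }
        }
        return Math.max(1, covered);
    }

    /** Returns one message per token whose positionLength differs from the assumed rule. */
    public static List<String> mismatches(List<Tok> tokens) {
        List<Tok> words = new ArrayList<>();
        for (Tok t : tokens) {
            if (t.type.equals("word")) {
                words.add(t);
            }
        }
        List<String> bad = new ArrayList<>();
        for (Tok t : tokens) {
            int expected = expectedPositionLength(t, words);
            if (t.posLen != expected) {
                bad.add(t.text + ": positionLength=" + t.posLen + ", expected " + expected);
            }
        }
        return bad;
    }

    /** The SGF rows exactly as reported in the quoted message. */
    public static List<Tok> reportedSgfOutput() {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok("natural", 0, 7, 1, "word"));
        tokens.add(new Tok("naturwald", 0, 14, 1, "SYNONYM"));
        tokens.add(new Tok("forêt naturelle", 0, 14, 1, "SYNONYM"));
        tokens.add(new Tok("natürlicher wald", 0, 14, 1, "SYNONYM"));
        tokens.add(new Tok("natural forest", 0, 14, 1, "shingle"));
        tokens.add(new Tok("forest", 8, 14, 1, "word"));
        return tokens;
    }

    public static void main(String[] args) {
        for (String line : mismatches(reportedSgfOutput())) {
            System.out.println(line);
        }
    }
}
```

Run against the reported SGF rows, this flags the three SYNONYM tokens and the shingle, since each spans offsets 0-14 (two words) but carries positionLength 1.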
Mike McCandless

http://blog.mikemccandless.com

On Fri, Feb 10, 2017 at 10:07 AM, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> wrote:
> Example for position end and positionLength of SGF.
>
> query: natural forest
>
> WT
> text                 start  end  positionLength  type     position
> natural              0      7    1               word     1
> forest               8      14   1               word     2
> ...
>
> SPF
> text                 start  end  positionLength  type     position
> natural              0      7    1               word     1
> natural forest       0      14   2               shingle  2
> forest               8      14   1               word     3
>
> SGF
> text                 start  end  positionLength  type     position
> natural              0      7    1               word     1
> naturwald            0      14   1               SYNONYM  2
> forêt naturelle      0      14   1               SYNONYM  2
> natürlicher wald     0      14   1               SYNONYM  2
> natural forest       0      14   1               shingle  2
> forest               8      14   1               word     3
>
> SPF
> text                 start  end  positionLength  type     position
> natural              0      7    1               word     1
> naturwald            0      9    1               SYNONYM  2
> "forêt naturelle"    0      17   2               SYNONYM  2
> "natürlicher wald"   0      18   2               SYNONYM  2
> "natural forest"     0      16   2               shingle  2
> forest               8      14   1               word     3
>
>
> SGF (SynonymsGraphFilter) has the same end position and positionLength
> for all SYNONYMs.
> I suppose that it is not correct?
>
> Regards
> Bernd
>
>
> Am 09.02.2017 um 18:39 schrieb Michael McCandless:
>> On Thu, Feb 9, 2017 at 2:40 AM, Bernd Fehling
>> <bernd.fehl...@uni-bielefeld.de> wrote:
>>> I tried SynonymGraphFilter with my setup and it works right away.
>>> It paid off that I did some modifications on my filters while
>>> testing 6.3 with my setup.
>>
>> Good!
>>
>>> I only replaced SynonymFilter with SynonymGraphFilter and did not
>>> use FlattenGraphFilter, pretty simple. So I can confirm that, up
>>> to this point, SynonymGraphFilter is a full replacement for
>>> SynonymFilter. At least for search-time synonym handling.
>>>
>>> But this also means there is still some work with the attributes, right?
>>> Position looks good, type and start are no problem anyway, but
>>> the end position is still wrong and the positionLength for multi-word
>>> synonyms.
>>
>> Can you give an example or make a small test case?
>> PositionLengthAttribute is supposed to be correct coming out of
>> SynonymGraphFilter.
>>
>>> One thing I noticed was that the originating token which "produces"
>>> synonyms comes out last from SynonymGraphFilter, after the
>>> "produced" synonyms.
>>> I will have a look inside with debugger but I guess this is due
>>> to output buffering of SynonymGraphFilter?
>>
>> Yeah they do come out in a different order, which token filters are
>> allowed to do in general for all tokens leaving from the same position
>> ...
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>
> --
> *************************************************************
> Bernd Fehling                    Bielefeld University Library
> Dipl.-Inform. (FH)                LibTec - Library Technology
> Universitätsstr. 25                  and Knowledge Management
> 33615 Bielefeld
> Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de
>
> BASE - Bielefeld Academic Search Engine - www.base-search.net
> *************************************************************