[
https://issues.apache.org/jira/browse/LUCENE-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498568#comment-14498568
]
Ian Ribas commented on LUCENE-6400:
-----------------------------------
About the TODO: when not expanding ({{expand=false}}), the mappings are
created without preserving the original ({{includeOrig=false}}). Isn't that why
the mapping of the first term to itself is needed?
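
To make the question concrete, here is a minimal sketch of the {{expand=false}} case for a group A, B, C. It is a toy model with plain Java collections, not the SolrSynonymParser code; the class and method names are hypothetical, and it assumes that every term in the group is mapped to the first term with {{includeOrig=false}}.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NoExpandSketch {
    /**
     * Toy model of expand=false (assumption: every term in the group maps to
     * the first term, and each rule is a replacement, i.e. includeOrig=false).
     */
    static Map<String, String> encodeNoExpand(List<String> group) {
        Map<String, String> rules = new LinkedHashMap<>();
        String target = group.get(0);
        for (String term : group) {
            // For the first term this creates target -> target,
            // the self-mapping the TODO is about.
            rules.put(term, target);
        }
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(encodeNoExpand(List.of("A", "B", "C")));
        // prints {A=A, B=A, C=A}
    }
}
```

The model tracks only term text, so it shows which rules exist but says nothing about token types or whether the A -> A rule can be dropped.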
> SynonymParser should encode 'expand' correctly.
> -----------------------------------------------
>
> Key: LUCENE-6400
> URL: https://issues.apache.org/jira/browse/LUCENE-6400
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Attachments: LUCENE-6400.patch, LUCENE-6400.patch, LUCENE-6400.patch,
> LUCENE-6400.patch, PositionLenghtAndType-unittests.patch,
> unittests-expand-and-parse.patch
>
>
> Today SolrSynonymParser encodes something like A, B, C with 'expand=true'
> like this:
> A -> A, B, C (includeOrig=false)
> B -> B, A, C (includeOrig=false)
> C -> C, A, B (includeOrig=false)
> This gives kinda buggy output (synfilter sees it all as replacements, and
> makes all the terms with type SYNONYM, positionLength isn't supported, etc.)
> and it wastes space in the FST (includeOrig is just one bit).
> Example with "spiderman, spider man" and analysis on 'spider man'
> Trunk:
> term=spider,startOffset=0,endOffset=6,positionIncrement=1,positionLength=1,*type=SYNONYM*
> term=spiderman,startOffset=0,endOffset=10,positionIncrement=0,*positionLength=1*,type=SYNONYM
> term=man,startOffset=7,endOffset=10,positionIncrement=1,positionLength=1,*type=SYNONYM*
> You can see this is confusing: all the words have type SYNONYM, because
> spider and man got deleted and totally replaced by new terms (which happen
> to have the same text).
> Patch:
> term=spider,startOffset=0,endOffset=6,positionIncrement=1,positionLength=1,*type=word*
> term=spiderman,startOffset=0,endOffset=10,positionIncrement=0,*positionLength=2*,type=SYNONYM
> term=man,startOffset=7,endOffset=10,positionIncrement=1,positionLength=1,*type=word*
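
The two encodings for {{expand=true}} can be sketched side by side. This is a toy model, not the Lucene API: the {{Rule}} class and the {{encode}} method are hypothetical, and the second encoding (A -> B, C with {{includeOrig=true}}) is an assumption about what the patch produces, inferred from the description rather than stated in it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExpandEncodingSketch {
    /** Hypothetical model of one rule: output terms plus the includeOrig flag. */
    static final class Rule {
        final List<String> outputs;
        final boolean includeOrig;

        Rule(List<String> outputs, boolean includeOrig) {
            this.outputs = outputs;
            this.includeOrig = includeOrig;
        }
    }

    /**
     * Builds rules for one synonym group with expand=true.
     * keepOriginal=false: A -> A, B, C (includeOrig=false), as in the description.
     * keepOriginal=true: assumed alternative, A -> B, C (includeOrig=true).
     */
    static Map<String, Rule> encode(List<String> group, boolean keepOriginal) {
        Map<String, Rule> rules = new LinkedHashMap<>();
        for (String term : group) {
            List<String> outputs = new ArrayList<>();
            if (!keepOriginal) {
                outputs.add(term); // the term is re-emitted as a replacement of itself
            }
            for (String other : group) {
                if (!other.equals(term)) {
                    outputs.add(other);
                }
            }
            rules.put(term, new Rule(outputs, keepOriginal));
        }
        return rules;
    }

    public static void main(String[] args) {
        for (boolean keepOriginal : new boolean[] {false, true}) {
            encode(List.of("A", "B", "C"), keepOriginal).forEach((input, rule) ->
                System.out.println(input + " -> " + String.join(", ", rule.outputs)
                    + " (includeOrig=" + rule.includeOrig + ")"));
        }
    }
}
```

In this model the second encoding stores one fewer output per rule and keeps the original token instead of replacing it, which is in line with the type=word tokens in the "Patch" output above.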