[
https://issues.apache.org/jira/browse/SOLR-11968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360143#comment-16360143
]
Dominique Béjean edited comment on SOLR-11968 at 2/11/18 9:57 PM:
------------------------------------------------------------------
Following Steve's comments, I ran this test:
1/ Put the SynonymGraphFilterFactory after the StopFilterFactory in the query-time
analysis chain
{code:java}
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ElisionFilterFactory" ignoreCase="true"
articles="lang/contractions_fr.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt"
ignoreCase="true"/>
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true" />
<filter class="solr.FrenchMinimalStemFilterFactory"/>
</analyzer>{code}
2/ Remove the stop word from the rule in the synonyms file, which becomes:
om, olympique marseille
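Why step 2 is needed can be illustrated with a small toy model. This is not Solr or Lucene code: the function names are hypothetical, and it ignores position increments and stemming. It only assumes that, with the new filter order, the synonym filter sees the tokens after "de" has already been removed.

```python
# Toy model of the reordered query-time chain (NOT Solr code):
# the stop filter runs first, then the synonym rule is matched.
STOPWORDS = {"de"}  # content of stopwords.txt, as given in the issue

RULE_WITH_STOPWORD = "om, olympique de marseille"   # original rule
RULE_WITHOUT_STOPWORD = "om, olympique marseille"   # rule after step 2


def remove_stopwords(tokens):
    """Drop stop words, as the stop filter would before synonym matching."""
    return [t for t in tokens if t not in STOPWORDS]


def synonym_matches(tokens, rule):
    """Return True if any alternative of the rule occurs as a contiguous run."""
    alternatives = [alt.split() for alt in rule.split(",")]
    for alt in alternatives:
        n = len(alt)
        if any(tokens[i:i + n] == alt for i in range(len(tokens) - n + 1)):
            return True
    return False


tokens = remove_stopwords("olympique de marseille maillot".split())
print(tokens)
print(synonym_matches(tokens, RULE_WITH_STOPWORD))
print(synonym_matches(tokens, RULE_WITHOUT_STOPWORD))
```

In this model the rule that still contains "de" can no longer match the multi-word form, because "de" is gone by the time the rule is applied; the rule without it matches.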
The parsed query strings are:
for "om maillot"
{code:java}
"parsedquery_toString":"+(((((+name_text_gp:olympiqu +name_text_gp:marseil)
name_text_gp:om)) (name_text_gp:maillot))~1)",{code}
for "olympique de marseille maillot"
{code:java}
"parsedquery_toString":"+((((name_text_gp:om (+name_text_gp:olympiqu
+name_text_gp:marseil))) (name_text_gp:maillot))~1)",{code}
for "maillot om"
{code:java}
"parsedquery_toString":"+(((name_text_gp:maillot) (((+name_text_gp:olympiqu
+name_text_gp:marseil) name_text_gp:om)))~1)",{code}
for "maillot olympique de marseille"
{code:java}
"parsedquery_toString":"+(((name_text_gp:maillot) ((name_text_gp:om
(+name_text_gp:olympiqu +name_text_gp:marseil))))~1)",{code}
The result counts are also the same for all four queries.
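As a quick sanity check, the four parsed queries above can be compared by the set of field:term clauses they contain. This is only a sketch: the helper name clause_terms is hypothetical, and comparing term sets ignores the boolean structure and clause order, so it does not prove the queries are equivalent.

```python
import re

# The four parsedquery_toString values quoted above, keyed by the input query.
PARSED = {
    "om maillot":
        "+(((((+name_text_gp:olympiqu +name_text_gp:marseil) "
        "name_text_gp:om)) (name_text_gp:maillot))~1)",
    "olympique de marseille maillot":
        "+((((name_text_gp:om (+name_text_gp:olympiqu "
        "+name_text_gp:marseil))) (name_text_gp:maillot))~1)",
    "maillot om":
        "+(((name_text_gp:maillot) (((+name_text_gp:olympiqu "
        "+name_text_gp:marseil) name_text_gp:om)))~1)",
    "maillot olympique de marseille":
        "+(((name_text_gp:maillot) ((name_text_gp:om "
        "(+name_text_gp:olympiqu +name_text_gp:marseil))))~1)",
}


def clause_terms(parsed_query):
    """Extract the set of field:term pairs, ignoring operators and nesting."""
    return set(re.findall(r"\w+:\w+", parsed_query))


term_sets = {query: clause_terms(parsed) for query, parsed in PARSED.items()}
all_same = len({frozenset(terms) for terms in term_sets.values()}) == 1
print(all_same)
```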
> Multi-words query time synonyms
> -------------------------------
>
> Key: SOLR-11968
> URL: https://issues.apache.org/jira/browse/SOLR-11968
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: query parsers, Schema and Analysis
> Affects Versions: master (8.0), 6.6.2
> Environment: Centos 7.x
> Reporter: Dominique Béjean
> Priority: Major
>
> I am trying multi-word query-time synonyms with Solr 6.6.2 and the
> SynonymGraphFilterFactory filter, as explained in this article
>
> [https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/]
>
> My field type is :
> {code:java}
> <fieldType name="textSyn" class="solr.TextField" positionIncrementGap="100">
> <analyzer type="index">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.ElisionFilterFactory" ignoreCase="true"
> articles="lang/contractions_fr.txt"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.ASCIIFoldingFilterFactory"/>
> <filter class="solr.StopFilterFactory" words="stopwords.txt"
> ignoreCase="true"/>
> <filter class="solr.FrenchMinimalStemFilterFactory"/>
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.ElisionFilterFactory" ignoreCase="true"
> articles="lang/contractions_fr.txt"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
> <filter class="solr.ASCIIFoldingFilterFactory"/>
> <filter class="solr.StopFilterFactory" words="stopwords.txt"
> ignoreCase="true"/>
> <filter class="solr.FrenchMinimalStemFilterFactory"/>
> </analyzer>
> </fieldType>{code}
>
> synonyms.txt contains the line :
> {code:java}
> om, olympique de marseille{code}
>
> stopwords.txt contains the word
> {code:java}
> de{code}
>
> The order of words in my query has an impact on the generated query in
> edismax
> {code:java}
> q={!edismax qf='name_text_gp' v=$qq}
> &sow=false
> &qq=...{code}
> With "qq=om maillot" or "qq=olympique de marseille maillot", I can see the
> synonym expansion. It works as expected.
> {code:java}
> "parsedquery_toString":"+(((+name_text_gp:olympiqu +name_text_gp:marseil
> +name_text_gp:maillot) name_text_gp:om))",
> "parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu
> +name_text_gp:marseil +name_text_gp:maillot)))",{code}
> With "qq=maillot om" or "qq=maillot olympique de marseille", I get the
> same generated query for both
> {code:java}
> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",{code}
> I don't understand these generated queries. The first one looks like the
> synonym expansion is ignored, but the second one shows it is not ignored and
> only the synonym term is used.
>
> When I test the analysis for the field type, the synonyms are correctly
> expanded for all of these expressions
> {code:java}
> om maillot
> maillot om
> olympique de marseille maillot
> maillot olympique de marseille{code}
> The resulting outputs always include the following terms (obviously not always
> in the same order)
> {code:java}
> olympiqu om marseil maillot {code}
>
> So, I suspect an issue with the edismax query parser.