[GitHub] [solr] alessandrobenedetti commented on pull request #129: SOLR-15407 untokenized field type with sow=false fix + tests

GitBox Mon, 24 May 2021 12:20:51 -0700


alessandrobenedetti commented on pull request #129:
URL: https://github.com/apache/solr/pull/129#issuecomment-847276421



   In the meantime I kept thinking about this, and I still think it is a
   bug: if we configure a field type with keyword analysis (so that it
   produces the same single token as a String field), sow works correctly
   and we get the same behaviour that I am introducing with the fix.
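
   To make the comparison concrete, here is a minimal sketch in plain Java.
   It is not Solr's parser code: the class, the method names and the
   trait_kw field are hypothetical, and the "without the fix" branch only
   models the behaviour described in the quoted review (an untokenized
   field is always split on whitespace, whatever sow is).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of clause building for the query "multi term"; NOT Solr's API.
final class SowSketch {

    // Hypothetical keyword-style analysis: the whole input becomes one token.
    static List<String> keywordAnalyze(String text) {
        return Collections.singletonList(text);
    }

    // Splits the query on whitespace only when asked to.
    static List<String> chunks(String q, boolean split) {
        return split
            ? Arrays.asList(q.trim().split("\\s+"))
            : Collections.singletonList(q);
    }

    // Keyword-analysed text field: sow decides whether the analyzer
    // sees the whole query string or one whitespace-separated chunk at a time.
    static List<String> keywordFieldClauses(String field, String q, boolean sow) {
        List<String> clauses = new ArrayList<>();
        for (String chunk : chunks(q, sow)) {
            for (String token : keywordAnalyze(chunk)) {
                clauses.add(field + ":" + token);
            }
        }
        return clauses;
    }

    // Untokenized string field. With the fix, sow is honoured;
    // without it (assumption taken from the review), the query is always split.
    static List<String> stringFieldClauses(String field, String q,
                                           boolean sow, boolean withFix) {
        boolean split = !withFix || sow;
        List<String> clauses = new ArrayList<>();
        for (String chunk : chunks(q, split)) {
            clauses.add(field + ":" + chunk);
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(keywordFieldClauses("trait_kw", "multi term", false));
        System.out.println(stringFieldClauses("trait_ss", "multi term", false, true));
        System.out.println(stringFieldClauses("trait_ss", "multi term", false, false));
    }
}
```

   With sow=false, the keyword-analysed field and the string field with the
   fix both yield a single clause, while the string field without the fix
   yields two clauses. That mismatch is the inconsistency described above.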
   
   
   On Mon, 24 May 2021, 16:59 David Smiley, ***@***.***> wrote:
   
   > ***@***.**** commented on this pull request.
   >
   > It seems we don't agree on this yet.
   > ------------------------------
   >
   > In solr/core/src/test/org/apache/solr/search/TestExtendedDismaxParser.java
   > <https://github.com/apache/solr/pull/129#discussion_r637963606>:
   >
   > > @@ -1771,6 +1787,35 @@ public void testSplitOnWhitespace_Basic() throws Exception {
   >      assertThat(parsedquery, anyOf(containsString("((name:stigma | title:stigma))"), containsString("((title:stigma | name:stigma))")));
   >    }
   >
   > +    @Test
   > +    public void testSplitOnWhitespace_stringField_shouldBuildSingleClause() throws Exception
   >
   > Based on the test name, I'd expect sow=true each time. Maybe just drop
   > this part of the method name.
   > ------------------------------
   >
   > In solr/core/src/test/org/apache/solr/search/TestExtendedDismaxParser.java
   > <https://github.com/apache/solr/pull/129#discussion_r637963928>:
   >
   > > @@ -1771,6 +1787,35 @@ public void testSplitOnWhitespace_Basic() throws Exception {
   >      assertThat(parsedquery, anyOf(containsString("((name:stigma | title:stigma))"), containsString("((title:stigma | name:stigma))")));
   >    }
   >
   > +    @Test
   > +    public void testSplitOnWhitespace_stringField_shouldBuildSingleClause() throws Exception
   > +    {
   > +        assertJQ(req("qf", "trait_ss", "defType", "edismax", "q", "multi term", "sow", "false"),
   > +            "/response/numFound==1", "/response/docs/[0]/id=='75'");
   > +
   > +        String parsedquery = getParsedQuery(
   > +            req("qf", "trait_ss", "q", "multi term", "defType", "edismax", "sow", "false", "debugQuery", "true"));
   > +        assertThat(parsedquery, anyOf(containsString("((trait_ss:multi term))")));
   > +    }
   > +
   > +    @Test
   > +    public void testSplitOnWhitespace_numericField_shouldBuildAlwaysMultiClause() throws Exception
   >
   > Again, just drop "testSplitOnWhitespace_" from the method name, I think.
   > ------------------------------
   >
   > In solr/core/src/test/org/apache/solr/search/TestExtendedDismaxParser.java
   > <https://github.com/apache/solr/pull/129#discussion_r638071458>:
   >
   > > @@ -1771,6 +1787,35 @@ public void testSplitOnWhitespace_Basic() throws Exception {
   >      assertThat(parsedquery, anyOf(containsString("((name:stigma | title:stigma))"), containsString("((title:stigma | name:stigma))")));
   >    }
   >
   > +    @Test
   > +    public void testSplitOnWhitespace_stringField_shouldBuildSingleClause() throws Exception
   > +    {
   > +        assertJQ(req("qf", "trait_ss", "defType", "edismax", "q", "multi term", "sow", "false"),
   >
   > This is a change in behavior, and I think it's not a good change. For a
   > non-tokenized field (StrField in this case), I think we should ignore
   > whatever "sow" is and split on whitespace anyway, thus here have two terms
   > to match. It would be straightforward to document this (no differences
   > between numbers and StrField).
   >
   > I think it could be reasonable to try both ways (both split and don't
   > split) and then put a DisjunctionMaxQuery over the two, though I'd prefer
   > not.
   > ------------------------------
   >
   > In solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java
   > <https://github.com/apache/solr/pull/129#discussion_r638074368>:
   >
   > >            } else {
   >              List<Query> subqs = new ArrayList<>();
   >              for (String queryTerm : queryTerms) {
   >                try {
   >                  subqs.add(ft.getFieldQuery(parser, sf, queryTerm));
   > -              } catch (Exception e) { // assumption: raw = false only when called from ExtendedDismaxQueryParser.getQuery()
   > -                // for edismax: ignore parsing failures
   > +              } catch (Exception e) {
   > +                /*
   > +                This happens when a field tries to parse a query term of incompatible type
   > +                e.g.
   > +                a numerical field trying to parse a textual query term
   > +                 */
   > +                subqs.add(new MatchNoDocsQuery(queryTerm + " is not compatible with " + field));
   >
   > It appears this change (the addition of MatchNoDocsQuery here) has no
   > effect but maybe I'm mistaken?
   > ------------------------------
   >
   > In solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java
   > <https://github.com/apache/solr/pull/129#discussion_r638072636>:
   >
   > >            return new RawQuery(sf, queryTerms);
   >          } else {
   >            if (queryTerms.size() == 1) {
   >              return ft.getFieldQuery(parser, sf, queryTerms.get(0));
   > +          } else if(ft instanceof StrField){
   >
   > In essence, I think the behavior I see here was correct *before* -- no
   > special case for either StrField or numerics. In the context of the logic
   > that reaches this point, the field is already ft.isTokenized==false.
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
