Removing stopwords means that “lead to succeed” is corrupted in the index to “lead <blank_position> succeed”.
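To make the position gap concrete, here is a toy model in Python. It is only a sketch, not Solr's analysis chain: it assumes simple word tokenization, a small illustrative stopword set, and hypothetical helper names (`analyze`, `phrase_matches`). It mirrors the quoted schema in one respect: the index analyzer has a stop filter and the query analyzer does not, so the query still asks for "to" at the position the index left empty.

```python
import re

# Illustrative subset only; the real list lives in lang/stopwords_en.txt.
STOPWORDS = {"a", "an", "and", "of", "the", "to"}


def analyze(text, remove_stopwords):
    """Return (term, position) pairs. A removed stopword still uses up its position."""
    tokens = []
    for position, term in enumerate(re.findall(r"\w+", text.lower())):
        if remove_stopwords and term in STOPWORDS:
            continue  # leaves a gap at this position
        tokens.append((term, position))
    return tokens


def phrase_matches(indexed, query):
    """True if every query term occurs in the index at the same relative position."""
    if not query:
        return False
    index_positions = {}
    for term, pos in indexed:
        index_positions.setdefault(term, set()).add(pos)
    first_term, first_pos = query[0]
    for start in index_positions.get(first_term, ()):
        offset = start - first_pos
        if all(pos + offset in index_positions.get(term, ()) for term, pos in query):
            return True
    return False


doc = "Lead to Succeed Inc, DBA"
indexed_with_stop = analyze(doc, remove_stopwords=True)   # index analyzer with stop filter
indexed_no_stop = analyze(doc, remove_stopwords=False)    # index analyzer without it
parsed_query = analyze("lead to succeed", remove_stopwords=False)  # query keeps "to"

print(indexed_with_stop)  # [('lead', 0), ('succeed', 2), ('inc', 3), ('dba', 4)]
print(phrase_matches(indexed_with_stop, parsed_query))  # False
print(phrase_matches(indexed_no_stop, parsed_query))    # True
```

In this model the phrase fails only because "to" was dropped at index time while the query still requires it at position 1.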
Don’t remove stopwords.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 30, 2022, at 11:25 AM, Teresa McMains <ter...@t14-consulting.com> wrote:
>
> Hello all.
>
> I've been looking through some old questions and answers and didn't quite see
> what I was looking for.
>
> I have a client using a third party piece of software that leverages solr for
> some global-search type capabilities. The user can enter any search string
> and we want to return any documents of any type that have a match on the
> search string, checking any fields in the document. Quoted strings should
> return an exact match.
>
> So in their examples, searching for Joe Smith, unquoted, could return
> customers with the name Joe Smith or Emily Smith or Joe Jones, etc. But if it
> were quoted, then we only want the exact match "Joe Smith" or, I guess, "Joe
> Smithland" or something.
>
> The problem we're having is that the unquoted search returns everything
> correctly, quoted strings with two search terms seem okay, but a quoted
> search string with multiple terms like "Lead to Succeed" fails. We're trying
> to have it return the document "Lead to Succeed Inc, DBA". Perhaps it's
> failing because it's only a partial match?? But even if I search for the
> quoted string "Lead to Succeed Inc, DBA", I do not get a match.
>
> There are no stopwords.
> There are no synonyms.
>
> Now unfortunately I don't have access to the solr admin UI because the
> customer put it behind a firewall and won't give me access. So that's fun.
> But I've been playing around with the query URL just trying to get anything
> to work and I can't.
>
> So for example:
> https://localhost:8343/MyAppURL/rest/solr/select?q=LEAD%2520TO%2520SUCCEED&rows=100&start=0&wt=json
> returns 107 matches, including one with the name we're looking for.
> but
> https://localhost:8343/MyAppURL/rest/solr/select?q=%2522LEAD%2520TO%2520SUCCEED%2522&rows=100&start=0&wt=json
> returns 0 matches
>
> I've tried replacing %2520 with %26%26 or %2526%2526 (&&) or with %2B or
> %252B (+) but no luck there either -- whether I include the quotes or not.
>
> I know there's a debug parameter &debug=all or &debugQuery=true but when I
> include those terms in my URL nothing changes at all in the results. So I'm
> just not seeing the debug output. Is there something else I need to do to
> enable it?
> If this is a matter of needing a "fuzzier" match, how do I include that in
> the search query URL -- it wasn't clear to me from the documentation?
>
> Many many thanks!
> Teresa
>
> From solrconfig -- if it's helpful:
>
> <requestHandler name="/select" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="q">*:*</str>
>     <str name="defType">edismax</str>
>     <str name="stopwords">true</str>
>     <str name="lowercaseOperators">true</str>
>     <str name="rows">10</str>
>     <str name="df">_tokens</str>
>     <!-- phrase boosting...only affects relevancy, not inclusion -->
>     <str name="pf">_tokens^3</str>
>     <str name="ps">10</str>
>     <str name="pf2">_tokens^2</str>
>     <str name="ps2">1</str>
>
>     <!-- field boosting -->
>     <str name="qf">_primaryLabels^10 doc_id^50 telephoneNumbers^5 nationalIdentifiers^5 _tokens</str>
>     <str name="f._primaryLabels.qf">
>       alertid
>       account_name
>       account_number
>       bank_name
>       party_name
>       party_number
>       head_of_household_name
>       ext_party_number
>       full_name
>       associate_full_name
>       attachment_name
>       title
>       filename
>     </str>
>   </lst>
> </requestHandler>
>
>
> From schema.xml:
>
> <field name="account_customer_name" type="text_general" indexed="true" stored="true" multiValued="false" required="false"/>
> <field name="account_name" type="text_general" indexed="true" stored="true" multiValued="false" required="false"/>
> <field name="account_number" type="string" indexed="true" stored="true" multiValued="false" required="false"/>
> ...
> These search fields are all text_general or string.
>
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" splitOnCaseChange="0" splitOnNumerics="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" splitOnCaseChange="0" splitOnNumerics="0"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>