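Removing stopwords at index time means that “lead to succeed” is corrupted in
the index to “lead <blank_position> succeed”. The text_general type you quoted
has a StopFilterFactory in its index analyzer but not in its query analyzer, so
the quoted phrase query still looks for “to” in the middle position and never
finds it.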

Don’t remove stopwords.
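
As a minimal sketch of that change, assuming the text_general fieldType quoted
below is the one actually deployed: the index analyzer would look like this,
with everything copied from your schema except the StopFilterFactory line,
which is dropped. Existing documents would need to be reindexed afterwards,
since the terms already in the index still have the gap.

```xml
<!-- Sketch only: the index analyzer from the quoted schema, minus the stop filter -->
<analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            preserveOriginal="1" splitOnCaseChange="0" splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- StopFilterFactory removed so that "to" keeps its position in the index -->
</analyzer>
```

With that in place, index-time and query-time analysis both keep “to”, so the
phrase positions line up.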

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 30, 2022, at 11:25 AM, Teresa McMains <ter...@t14-consulting.com> wrote:
> 
> Hello all.
> 
> I've been looking through some old questions and answers and didn't quite see 
> what I was looking for.
> 
> I have a client using a third party piece of software that leverages solr for 
> some global-search type capabilities. The user can enter any search string 
> and we want to return any documents of any type that have a match on the 
> search string, checking any fields in the document. Quoted strings should 
> return an exact match.
> 
> So in their examples, searching for Joe Smith, unquoted, could return 
> customers with the name Joe Smith or Emily Smith or Joe Jones, etc. But if it 
> were quoted, then we only want the exact match "Joe Smith" or, I guess, "Joe 
> Smithland" or something.
> 
> The problem we're having is that the unquoted search returns everything 
> correctly, quoted strings with two search terms seem okay, but a quoted 
> search string with multiple terms like "Lead to Succeed" fails. We're trying 
> to have it return the document "Lead to Succeed Inc, DBA". Perhaps it's 
> failing because it's only a partial match? But even if I search for the 
> quoted string "Lead to Succeed Inc, DBA", I do not get a match.
> 
> There are no stopwords.
> There are no synonyms.
> 
> Now unfortunately I don't have access to the solr admin UI because the 
> customer put it behind a firewall and won't give me access. So that's fun. 
> But I've been playing around with the query URL just trying to get anything 
> to work and I can't.
> 
> So for example:
> https://localhost:8343/MyAppURL/rest/solr/select?q=LEAD%2520TO%2520SUCCEED&rows=100&start=0&wt=json
> returns 107 matches, including one with the name we're looking for.
> but 
> https://localhost:8343/MyAppURL/rest/solr/select?q=%2522LEAD%2520TO%2520SUCCEED%2522&rows=100&start=0&wt=json
> returns 0 matches
> 
> I've tried replacing %2520 with %26%26 or %2526%2526 (&&) or with %2B or 
> %252B (+) but no luck there either -- whether I include the quotes or not.
> 
> I know there's a debug parameter &debug=all or &debugQuery=true but when I 
> include those terms in my URL nothing changes at all in the results. So I'm 
> just not seeing the debug output. Is there something else I need to do to 
> enable it?
> If this is a matter of needing a "fuzzier" match, how do I include that in 
> the search query URL -- it wasn't clear to me from the documentation?
> 
> Many many thanks!
> Teresa
> 
> From solrconfig -- if it's helpful:
> 
>  <requestHandler name="/select" class="solr.SearchHandler">
>      <lst name="defaults">
>        <str name="q">*:*</str>
>        <str name="defType">edismax</str>
>        <str name="stopwords">true</str>
>        <str name="lowercaseOperators">true</str>
>        <str name="rows">10</str>
>        <str name="df">_tokens</str>
>        <!-- phrase boosting...only affects relevancy, not inclusion -->
>        <str name="pf">_tokens^3</str>
>        <str name="ps">10</str>
>        <str name="pf2">_tokens^2</str>
>        <str name="ps2">1</str>
> 
>        <!-- field boosting -->
>        <str name="qf">_primaryLabels^10 doc_id^50
>            telephoneNumbers^5 nationalIdentifiers^5 _tokens</str>
>        <str name="f._primaryLabels.qf">
>            alertid
>            account_name
>            account_number
>            bank_name
>            party_name
>            party_number
>            head_of_household_name
>            ext_party_number
>            full_name
>            associate_full_name
>            attachment_name
>            title
>            filename
>        </str>
>      </lst>
>  </requestHandler>
> 
> 
> From schema.xml:
> 
> <field name="account_customer_name" type="text_general" indexed="true"
>        stored="true" multiValued="false" required="false"/>
> <field name="account_name" type="text_general" indexed="true"
>        stored="true" multiValued="false" required="false"/>
> <field name="account_number" type="string" indexed="true"
>        stored="true" multiValued="false" required="false"/>
> ...
> These search fields are all text_general or string.
> 
>        <fieldType name="text_general" class="solr.TextField"
>                   positionIncrementGap="100" autoGeneratePhraseQueries="true">
>            <analyzer type="index">
>                <tokenizer class="solr.KeywordTokenizerFactory"/>
>                <filter class="solr.WordDelimiterFilterFactory"
>                        preserveOriginal="1" splitOnCaseChange="0" splitOnNumerics="0"/>
>                <filter class="solr.LowerCaseFilterFactory"/>
>                <filter class="solr.StopFilterFactory" ignoreCase="true"
>                        words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
>            </analyzer>
>            <analyzer type="query">
>                <tokenizer class="solr.KeywordTokenizerFactory"/>
>                <filter class="solr.WordDelimiterFilterFactory"
>                        preserveOriginal="1" splitOnCaseChange="0" splitOnNumerics="0"/>
>                <filter class="solr.SynonymFilterFactory"
>                        synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>                <filter class="solr.LowerCaseFilterFactory"/>
>            </analyzer>
>        </fieldType>
> 
> 
