Re: solr relatedness weirdness on json facet function

Dan Rosher Wed, 06 Apr 2022 01:52:06 -0700

Hi Michael,

Here are the field and fieldType with a result snippet.


I've checked the stopword list, and words like "a" or "be" are in it. I've
also used the analysis screen in the admin UI to confirm that they should
indeed be removed at both index and query time.
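
The same check can be scripted against Solr's field analysis handler
(/analysis/field), which makes it easier to try many inputs. A minimal
sketch, assuming the handler is enabled and reusing the host and collection
name from the request quoted later in this thread; the sample text is made
up for illustration. It only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Assumed values: host, port and collection name are copied from the
# request quoted later in this thread; adjust for a real deployment.
BASE = "http://localhost:8983/solr/collection"


def analysis_url(field_name, index_text, query_text=None):
    """Build a URL for Solr's field analysis handler (/analysis/field)."""
    params = {
        "analysis.fieldname": field_name,
        "analysis.fieldvalue": index_text,  # run through the index-time analyzer
        "wt": "json",
    }
    if query_text is not None:
        params["analysis.query"] = query_text  # run through the query-time analyzer
    return BASE + "/analysis/field?" + urlencode(params)


# Hypothetical sample text containing both stopwords and a hyphenated token.
url = analysis_url("description", "to be a plug-in", "a be")
print(url)
```

Note that this reports what the current config would produce for new input,
not what is already stored in the index.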

Many thanks,
Dan

*example results:*
....
  "facets": {
    "count": 58215,
    "description": {
      "buckets": [
        {
          "val": "a",
          "count": 4,
          "relatedness": {
            "relatedness": 0.98239,
            "foreground_popularity": 0.01279,
            "background_popularity": 0.01279
          }
        },
        {
          "val": "be",
          "count": 6,
          "relatedness": {
            "relatedness": 0.98239,
            "foreground_popularity": 0.01279,
            "background_popularity": 0.01279
          }
        },
....

*field*:
<field name="description" type="textgen-stemmed" indexed="true" stored="true" multiValued="false"/>

*fieldtype*:
<fieldType name="textgen-stemmed" class="solr.TextField" positionIncrementGap="100">
  <similarity class="solr.ClassicSimilarityFactory"/>
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\.$" replacement=""/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\.\s+" replacement=" "/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[*,;|/]" replacement=" "/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\S+)(\.(?i:net))\b" replacement="$1 $2"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <!-- STOPWORDS HERE -->
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <!-- STOPWORDS HERE -->
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
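
One thing that may be worth checking (not confirmed anywhere in this thread):
filters run in the order listed, so a token created by a filter that comes
after the stop filter is never compared against the stopword list. The toy
model below is plain Python, not Solr code. The stopword set, the sample
text and the helper names are all assumptions for illustration, and the
word-delimiter stand-in only splits on non-alphanumerics (it does not emit
the catenated forms the real filter would).

```python
import re

STOPWORDS = {"a", "be"}  # assumed subset of stopwords.txt


def stop_filter(tokens):
    """Drop tokens that match a stopword, ignoring case."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def word_delimiter(tokens):
    """Rough stand-in for word-part generation: split on non-alphanumerics."""
    out = []
    for t in tokens:
        out.extend(p for p in re.split(r"[^A-Za-z0-9]+", t) if p)
    return out


def analyze(text, filters):
    tokens = text.split()  # whitespace tokenizer
    for f in filters:      # each filter only sees the previous filter's output
        tokens = f(tokens)
    return [t.lower() for t in tokens]


# Hypothetical input: "a-b" is not itself a stopword, so it passes the stop filter.
text = "a plug-in to be a-b tested"

as_configured = analyze(text, [stop_filter, word_delimiter])
reordered = analyze(text, [word_delimiter, stop_filter])

print(as_configured)  # ['plug', 'in', 'to', 'a', 'b', 'tested']
print(reordered)      # ['plug', 'in', 'to', 'b', 'tested']
```

In this model the standalone "a" and "be" are removed either way, but the
"a" split out of "a-b" survives when the stop filter runs first. Whether the
real index behaves the same way would need to be confirmed against actual
documents.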


On Tue, 5 Apr 2022 at 14:58, Michael Gibney <mich...@michaelgibney.net>
wrote:

> Both `qf` and `relatedness` should be orthogonal to your question, iiuc.
> Understanding that your question is mainly about which terms are included
> (i.e. included at all -- nevermind ranking), then the only thing that
> should determine that is the field and fieldType config for the terms facet
> "field" property -- i.e., "description". Can you share that information,
> including index-time analysis chain config?
>
> On Tue, Apr 5, 2022 at 8:52 AM Dan Rosher <rosh...@gmail.com> wrote:
>
> > Hi,
> >
> > If I run a facet on relatedness on a qf field (examples below) which has
> > stopword removal, I get stopwords in the json facet?
> >
> > Anyone know why, and if this can be avoided?
> >
> > Many thanks,
> > Dan
> >
> > =================
> >
> > Details
> > Solr 7.7.2
> >
> > http://localhost:8983/solr/collection/select?
> > q=my query&
> > defType=edismax&
> > qf=description&
> > fore={!type=$defType qf=$qf v=$q}&
> > back=*:*&
> > rows=0&
> > json.facet={
> >   "description":{
> >     "type": "terms",
> >     "field": "description",
> >     "sort": { "relatedness": "desc"},
> >     "mincount": 2,
> >     "limit": 8,
> >     "facet": {
> >         "relatedness": {
> >             "type": "func",
> >             "func": "relatedness($fore,$back)"
> >         }
> >     }
> >   }
> > }
> >
>
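
The request above is shown unencoded for readability. A minimal sketch of
building the same URL with proper encoding, using only the parameters from
the original message (host and collection name as given there); it builds
and round-trips the URL without contacting Solr.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Terms facet on "description", sorted by the relatedness() function.
facet = {
    "description": {
        "type": "terms",
        "field": "description",
        "sort": {"relatedness": "desc"},
        "mincount": 2,
        "limit": 8,
        "facet": {
            "relatedness": {
                "type": "func",
                "func": "relatedness($fore,$back)",
            }
        },
    }
}

params = {
    "q": "my query",
    "defType": "edismax",
    "qf": "description",
    "fore": "{!type=$defType qf=$qf v=$q}",  # foreground set: the user query
    "back": "*:*",                           # background set: all documents
    "rows": 0,
    "json.facet": json.dumps(facet),
}

url = "http://localhost:8983/solr/collection/select?" + urlencode(params)

# Round-trip check: decoding the query string gives back the original values.
decoded = parse_qs(urlsplit(url).query)
print(url)
```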
