[ 
https://issues.apache.org/jira/browse/SOLR-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325093#comment-17325093
 ] 

Michael Gibney commented on SOLR-14996:
---------------------------------------

[~Hronom] if I understand correctly what you're trying to do, I actually don't 
think the tag/ex is the right way to do it. Please forgive me reading between 
the lines (and correct me if I'm wrong), but: it looks like you have multiple 
docs per {{user_id}}, and each {{user_id}} has 1 or more associated 
{{job_type}} values recorded across those (potentially multiple) docs.

Depending on what your schema (actual data -- I'm not talking about 
{{schema.xml}}) looks like, you might be able to achieve what you want by using 
a {{!join}} query. (Specifically, I think the approach I'm suggesting would 
work if you can guarantee that the multiple docs for the same {{user_id}} 
will not contain the same {{job_type}} mapping). Basically something like:

{code}{!join from=user_id to=user_id v='job_type:thinker'}
{code}

Faceting on {{job_type}} for the above domain (assuming validity of the 
schema-related assumptions) should get you the facet counts you want. Note, 
your {{numFound}} in this case will be high, because the domain would by design 
contain multiple docs per {{user_id}}. If you want {{numFound}} for the domain 
as deduped by {{user_id}}, your best option would probably be to use the JSON 
Facet {{unique}} aggregate function.
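
A minimal sketch of what the {{json.facet}} parameter could look like for that, 
assuming the field names from the issue ({{user_id}}, {{job_type}}); the helper 
name and the facet labels ({{unique_users}}, {{job_types}}) are made up for 
illustration:

```python
import json


def build_unique_user_facet() -> str:
    """Hypothetical helper: json.facet value counting distinct user_id values."""
    facet = {
        # Distinct user_id values across the whole query domain.
        "unique_users": "unique(user_id)",
        # Distinct user_id values within each job_type bucket.
        "job_types": {
            "type": "terms",
            "field": "job_type",
            "facet": {"unique_users": "unique(user_id)"},
        },
    }
    return json.dumps(facet)


print(build_unique_user_facet())
```

The resulting string would be passed as the {{json.facet}} request parameter; 
the top-level {{unique_users}} value then plays the role of a per-user 
{{numFound}}.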

wrt the way you were trying to use collapse/tag/ex, it looks like collapse gets 
re-applied over the domain with the "selected" tag excluded; in which case the 
"selected" tag is doing nothing. For the facet domain, collapse _does_ get 
re-applied (over an unrestricted domain), but since the collapse post-filter 
doesn't define an ordering for preferring which doc to use as "the" doc for a 
{{user_id}} cluster, the output is essentially arbitrary wrt anything you're 
likely to regard as relevant. (I note that the facet counts add to exactly 1000 
-- presumably the cardinality of {{\*:*}}?)
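
The arithmetic can be checked against the response quoted in the issue below. 
This sketch copies the flat {{facet_fields}} list from that response; the 
helper name is hypothetical:

```python
# Flat [term, count, term, count, ...] list copied from the quoted response.
job_type_facet = [
    "runner", 220,
    "developer", 202,
    "digger", 202,
    "thinker", 195,
    "ninja", 181,
]


def facet_list_to_dict(flat):
    """Pair up the alternating term/count entries of a facet_fields list."""
    return dict(zip(flat[0::2], flat[1::2]))


counts = facet_list_to_dict(job_type_facet)
total = sum(counts.values())
print(total)  # 1000
```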

> Facet incorrect counts when FQ exclusion applied with collapsing
> ----------------------------------------------------------------
>
>                 Key: SOLR-14996
>                 URL: https://issues.apache.org/jira/browse/SOLR-14996
>             Project: Solr
>          Issue Type: Bug
>          Components: faceting
>    Affects Versions: 8.6.3
>            Reporter: Yevhen Tienkaiev
>            Priority: Critical
>
> *numFound* does not match the counts displayed in facets with exclusion 
> when collapsing is used together with a tagged FQ.
> Here is an example query:
> {code}
> curl --location --request GET \
> 'http://localhost:8981/solr/test/select?facet.field={!ex=selected}job_type&facet=on&fq={!collapse%20field=user_id}&fq={!tag=selected}job_type:thinker&q=*:*&rows=0'
> {code}
> result is:
> {code}
> {
>     "responseHeader": {
>         "zkConnected": true,
>         "status": 0,
>         "QTime": 15,
>         "params": {
>             "q": "*:*",
>             "facet.field": "{!ex=selected}job_type",
>             "fq": [
>                 "{!collapse field=user_id}",
>                 "{!tag=selected}job_type:thinker"
>             ],
>             "rows": "0",
>             "facet": "on"
>         }
>     },
>     "response": {
>         "numFound": 850,
>         "start": 0,
>         "maxScore": 1.0,
>         "numFoundExact": true,
>         "docs": []
>     },
>     "facet_counts": {
>         "facet_queries": {},
>         "facet_fields": {
>             "job_type": [
>                 "runner",
>                 220,
>                 "developer",
>                 202,
>                 "digger",
>                 202,
>                 "thinker",
>                 195,
>                 "ninja",
>                 181
>             ]
>         },
>         "facet_ranges": {},
>         "facet_intervals": {},
>         "facet_heatmaps": {}
>     }
> }
> {code}
> As you can see, there is an FQ with 
> {code}
> {!tag=selected}job_type:thinker
> {code}
> and facets with
> {code}
> {!ex=selected}job_type
> {code}
> In the results I see 195 for *thinker*, but *numFound* is 850.
> Expected:
> *thinker* 195, *numFound* is 195
> *or*
> *thinker* 850, *numFound* is 850
> You can use this simple project to reproduce the issue 
> https://github.com/Hronom/solr-cloud-basic-auth/tree/main/solr-cloud-playground-collapsing


