Hello, I've encountered two issues while applying the unique()/hll() functions to a string field inside a range facet:
1. Results are incorrect for a single-valued string field.
2. I get an ArrayIndexOutOfBoundsException for a multi-valued string
field.
How to reproduce:
1. Create a core based on the default configSet.
2. Add several simple documents to the core, like these:
[
{
"id": "14790",
"int_i": 2010,
"date_dt": "2010-01-01T00:00:00Z",
"string_s": "a",
"string_ss": ["a", "b"]
},
{
"id": "12254",
"int_i": 2014,
"date_dt": "2014-01-01T00:00:00Z",
"string_s": "e",
"string_ss": ["b", "c"]
},
{
"id": "12937",
"int_i": 2008,
"date_dt": "2008-01-01T00:00:00Z",
"string_s": "c",
"string_ss": ["c", "d"]
},
{
"id": "10575",
"int_i": 2008,
"date_dt": "2008-01-01T00:00:00Z",
"string_s": "b",
"string_ss": ["d", "e"]
},
{
"id": "13644",
"int_i": 2014,
"date_dt": "2014-01-01T00:00:00Z",
"string_s": "e",
"string_ss": ["e", "a"]
},
{
"id": "8405",
"int_i": 2014,
"date_dt": "2014-01-01T00:00:00Z",
"string_s": "d",
"string_ss": ["a", "b"]
},
{
"id": "6128",
"int_i": 2008,
"date_dt": "2008-01-01T00:00:00Z",
"string_s": "a",
"string_ss": ["b", "c"]
},
{
"id": "5220",
"int_i": 2015,
"date_dt": "2015-01-01T00:00:00Z",
"string_s": "d",
"string_ss": ["c", "d"]
},
{
"id": "6850",
"int_i": 2012,
"date_dt": "2012-01-01T00:00:00Z",
"string_s": "b",
"string_ss": ["d", "e"]
},
{
"id": "5748",
"int_i": 2014,
"date_dt": "2014-01-01T00:00:00Z",
"string_s": "e",
"string_ss": ["e", "a"]
}
]
3. Try queries like the following for a single-valued string field:
q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_s)"}}}}
q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_s)"}}}}
The distinct counts returned are generally incorrect. For example, for the set
of documents above, the response contains:
{
"val": 2010,
"count": 1,
"distinct_count": 0
}
and
"between": {
"count": 10,
"distinct_count": 1
}
(there should be 5 distinct values).
Note that the result depends on the order in which the documents are added.
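For comparison, the values I would expect can be computed directly from the ten documents above. This is only a minimal Python sketch that groups by int_i and counts distinct strings; it does not touch Solr, and the helper name expected_distinct is made up for illustration.

```python
from collections import defaultdict

# (int_i, string_s, string_ss) copied from the ten documents above
DOCS = [
    (2010, "a", ["a", "b"]),
    (2014, "e", ["b", "c"]),
    (2008, "c", ["c", "d"]),
    (2008, "b", ["d", "e"]),
    (2014, "e", ["e", "a"]),
    (2014, "d", ["a", "b"]),
    (2008, "a", ["b", "c"]),
    (2015, "d", ["c", "d"]),
    (2012, "b", ["d", "e"]),
    (2014, "e", ["e", "a"]),
]


def expected_distinct(docs):
    """Return two dicts keyed by int_i: distinct string_s and distinct string_ss counts."""
    single = defaultdict(set)
    multi = defaultdict(set)
    for year, s, ss in docs:
        single[year].add(s)      # single-valued field: one value per document
        multi[year].update(ss)   # multi-valued field: all values of the document
    return (
        {year: len(values) for year, values in single.items()},
        {year: len(values) for year, values in multi.items()},
    )


single, multi = expected_distinct(DOCS)
for year in sorted(single):
    print(year, single[year], multi[year])
# Distinct string_s values over all documents (the "between" bucket)
print("between", len({s for _, s, _ in DOCS}))
```

For these documents it gives 3, 1, 1, 2 and 1 distinct string_s values for 2008, 2010, 2012, 2014 and 2015, and 5 for the "between" bucket, so the 0 and 1 shown above cannot be right.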
4. Try queries like the following for a multi-valued string field:
q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}
q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}
Such queries fail with an ArrayIndexOutOfBoundsException.
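In case it helps with reproducing, the query strings in steps 3 and 4 can be generated with a small Python sketch like the one below. The host, port and core name ("test") are assumptions to be replaced with your own, and build_query is just an illustrative helper. The gap is written as "+1YEAR" here because urlencode turns the plus sign into %2B.

```python
import json
from urllib.parse import urlencode


def build_query(base_url, range_field, start, end, gap, agg_field):
    """Build a /select URL with the same range facet as in the queries above."""
    facet = {
        "facet": {
            "histogram": {
                "type": "range",
                "field": range_field,
                "start": start,
                "end": end,
                "gap": gap,
                "include": "lower,edge",
                "other": "all",
                "missing": False,
                "facet": {"distinct_count": f"unique({agg_field})"},
            }
        }
    }
    params = {"q": "*:*", "rows": 0, "json": json.dumps(facet)}
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name; adjust to your installation
BASE = "http://localhost:8983/solr/test"

int_url = build_query(BASE, "int_i", 2008, 2016, 1, "string_ss")
date_url = build_query(
    BASE, "date_dt", "2008-01-01T00:00:00Z", "2016-01-01T00:00:00Z", "+1YEAR", "string_s"
)
print(int_url)
print(date_url)
```

Replacing unique with hll in the helper gives the hll() variants of the same queries.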
Note that everything looks OK for other field types (I tried single- and
multi-valued ints, doubles and dates), when the enclosing facet is a
terms facet, or when there is no enclosing facet at all.
I can reproduce these issues on both Solr 7.0.1 and 7.1.0. Solr 6.x and
5.x do not seem to have them.
Is this a bug? Or have I missed something?
Thanks,
Volodymyr
