Em... this will be interesting to investigate. JIRA created. https://issues.apache.org/jira/browse/KYLIN-2617
And sure, TOPN is an approximate algorithm and it does not give precise results. Nevertheless, cardinality 1 is a very special case; I think even an approximate algorithm should give the correct result in such a case.

On Sun, May 14, 2017 at 8:21 AM, Billy Liu <[email protected]> wrote:

> Thanks Tingmao for the report.
>
> Could you show us the complete SQL? In your SQL, there is no ORDER BY
> statement. If there is no ORDER BY, the query should not be rewritten into
> the TopN measure.
>
> 2017-05-12 23:52 GMT+08:00 Tingmao Lin <[email protected]>:
>
>> Hi,
>>
>> We found that a SUM() query on a cardinality-1 dimension is not accurate
>> (or "not correct") when it is automatically rewritten as TOPN.
>> Is that the expected behavior of Kylin, or is there some other issue?
>>
>> We built a cube on a table (measure1: bigint, dim1_id: varchar,
>> dim2_id: varchar, ...) using Kylin 1.6.0 (Kafka streaming source).
>>
>> The cube has two measures, SUM(measure1) and
>> TOPN(10, sum-orderby(measure1), group by dim2_id) (other measures
>> omitted), and two dimensions, dim1_id and dim2_id (other dims omitted).
>>
>> About the source table data:
>> The cardinality of dim1_id is 1 (same dim1_id for all rows in the
>> source table).
>> The cardinality of dim2_id is 1 (same dim2_id for all rows in the
>> source table).
>> The possible values of measure1 are [1, 0, -1].
>>
>> When we query
>> "select SUM(measure1) FROM table GROUP BY dim2_id"
>> => the result has one row: "sum=7".
>> From the Kylin logs we found that the query had been automatically
>> rewritten as TOPN(measure1, sum-orderby(measure1), group by dim2_id).
>>
>> When we write another query to prevent the TOPN rewrite, for example:
>>
>> "select SUM(measure1),count(*) FROM table GROUP BY dim2_id"
>> => one row -- "sum=-2,count=24576"
>>
>> "select SUM(measure1),count(*) FROM table"
>> => one row -- "sum=-2,count=24576"
>>
>> The results differ (7 vs. -2) depending on whether or not the query is
>> rewritten to TOPN.
>>
>> My question is: is the following behavior "working as expected", or does
>> the TOPN algorithm not support negative counter values very well, or is
>> there some other issue?
>>
>> 1. A SUM() query is automatically rewritten as TOPN and gives an
>> approximate result even though no TOPN is present in the query.
>>
>> 2. When cardinality is 1, TOPN does not give an accurate result.
>>
>> Thanks.
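To make the "cardinality 1 is a special case" argument concrete, here is a toy sketch assuming a Space-Saving-style counter (Kylin's TopN counter is, as far as I know, modeled on Space-Saving, but this is NOT Kylin's code). The class `SpaceSavingSketch`, its `offer`/`get` methods, the key name and the repeating measure pattern are all hypothetical and made up for illustration. Approximation only enters when a new key arrives while the counter is full and must evict the smallest entry. With a single distinct dim2_id and a capacity of 10 (as in TOPN(10, ...)), that branch is never taken, so the tracked value is simply the running sum of every delta, negative ones included, and should equal SUM(measure1).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy Space-Saving-style top-N counter (hypothetical, for illustration
 * only; not Kylin's TopNCounter). Tracks at most `capacity` keys.
 */
final class SpaceSavingSketch {
    private final int capacity;
    private final Map<String, Long> counters = new HashMap<>();

    SpaceSavingSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
    }

    /** Adds delta (which may be negative) to the counter for key. */
    void offer(String key, long delta) {
        Long current = counters.get(key);
        if (current != null) {
            // Key already tracked: exact update, no approximation.
            counters.put(key, current + delta);
        } else if (counters.size() < capacity) {
            // Free slot: start tracking the key exactly.
            counters.put(key, delta);
        } else {
            // Full: evict the smallest counter and let the new key inherit
            // its value. This Space-Saving step is the only source of error.
            String minKey = null;
            long minVal = Long.MAX_VALUE;
            for (Map.Entry<String, Long> e : counters.entrySet()) {
                if (e.getValue() < minVal) {
                    minVal = e.getValue();
                    minKey = e.getKey();
                }
            }
            counters.remove(minKey);
            counters.put(key, minVal + delta);
        }
    }

    /** Returns the tracked value for key, or 0 if it is not tracked. */
    long get(String key) {
        return counters.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        // Made-up cardinality-1 stream: every row has the same dim2_id and
        // a measure1 value drawn from {1, 0, -1}.
        long[] pattern = {1, 0, -1, -1, 0, 1, -1};
        SpaceSavingSketch sketch = new SpaceSavingSketch(10); // like TOPN(10, ...)
        long exactSum = 0;
        for (int i = 0; i < 24576; i++) {
            long measure1 = pattern[i % pattern.length];
            sketch.offer("only_dim2_id", measure1);
            exactSum += measure1;
        }
        // One distinct key never exceeds the capacity, so both values match.
        System.out.println("sketch=" + sketch.get("only_dim2_id") + " exact=" + exactSum);
    }
}
```

The eviction branch, where a new key inherits the evicted minimum, is where Space-Saving's error comes from, and its usual error bounds are derived for non-negative increments. Whether anything like that (or something else entirely, such as segment merging in the streaming cube) explains the 7 vs. -2 discrepancy is exactly what KYLIN-2617 should find out; this sketch does not claim to reproduce it.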
