Thanks, Tingmao, for the report.

Could you show us the complete SQL? Your query has no ORDER BY clause, and
without ORDER BY the query should not be rewritten to use the TopN measure.
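On the second point: if the TopN measure follows the Space-Saving family of
algorithms (an assumption in this note, not something confirmed in this
thread), an approximation error should only appear once the counter runs out
of slots and starts evicting keys. Below is a hypothetical Python sketch of a
textbook weighted Space-Saving counter. The class and method names are made up
for illustration and this is not Kylin's code. With a single distinct group-by
value it never evicts, so its sum matches the exact SUM even with negative
weights. With more keys than slots, a newcomer inherits the evicted key's sum
and the estimate drifts.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSavingSum:
    """Textbook weighted Space-Saving counter (hypothetical; not Kylin's code).

    Tracks at most `capacity` keys. When a new key arrives and the table is
    full, the key with the smallest tracked sum is evicted and the newcomer
    inherits that sum as its starting value (the classic over-estimation step).
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.sums: Dict[Hashable, int] = {}

    def offer(self, key: Hashable, weight: int) -> None:
        if key in self.sums or len(self.sums) < self.capacity:
            # Key already tracked, or there is still room: the sum stays exact.
            self.sums[key] = self.sums.get(key, 0) + weight
            return
        # Table is full: evict the current minimum; the new key inherits its sum.
        victim = min(self.sums, key=self.sums.__getitem__)
        inherited = self.sums.pop(victim)
        self.sums[key] = inherited + weight

    def top(self, n: int) -> List[Tuple[Hashable, int]]:
        # Return the n largest tracked sums, highest first.
        return sorted(self.sums.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Cardinality 1: every row has the same group-by value, so nothing is evicted
# and the result equals the exact sum 1 + 0 - 1 - 1 + 1 - 1 = -1.
one_key = SpaceSavingSum(capacity=10)
for w in [1, 0, -1, -1, 1, -1]:
    one_key.offer("dim2_value", w)
print(one_key.top(10))  # [('dim2_value', -1)]

# More keys than slots: "c" evicts "b" (sum 3) and reports 3 + 1 = 4,
# although its exact sum is 1.
few_slots = SpaceSavingSum(capacity=2)
for k, w in [("a", 5), ("b", 3), ("c", 1)]:
    few_slots.offer(k, w)
print(few_slots.top(2))  # [('a', 5), ('c', 4)]
```

The usual Space-Saving error bounds are stated for non-negative weights, so
with values like -1 the estimates carry no guarantee once evictions start. With
cardinality 1, though, a counter like this one would not by itself explain a
gap such as 7 vs. -2.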

2017-05-12 23:52 GMT+08:00 Tingmao Lin wrote:

> Hi,
>
> We found that a SUM() query on a dimension with cardinality 1 returns an
> inaccurate (possibly incorrect) result when it is automatically rewritten
> to use TOPN. Is this the expected behavior of Kylin, or is there another issue?
>
> We built a cube on a table (measure1: bigint, dim1_id: varchar,
> dim2_id: varchar, ...) using Kylin 1.6.0 with a Kafka streaming source.
>
> The cube has two measures, SUM(measure1) and
> TOPN(10, sum-orderby(measure1), group by dim2_id) (other measures omitted),
> and two dimensions, dim1_id and dim2_id (other dimensions omitted).
>
> About the source table data:
> The cardinality of dim1_id is 1 (all rows in the source table have the
> same dim1_id).
> The cardinality of dim2_id is 1 (all rows in the source table have the
> same dim2_id).
> The possible values of measure1 are 1, 0, and -1.
>
> When we query
>     "select SUM(measure1) FROM table GROUP BY dim2_id"
> the result has one row: "sum=7". From the Kylin logs we found that the query
> had been automatically rewritten as
> TOPN(measure1, sum-orderby(measure1), group by dim2_id).
>
> When we write other queries that prevent the TOPN rewrite, for example:
>
>    "select SUM(measure1), count(*) FROM table GROUP BY dim2_id"
>        => one row: "sum=-2, count=24576"
>
>    "select SUM(measure1), count(*) FROM table"
>        => one row: "sum=-2, count=24576"
>
> The results differ (7 vs. -2) depending on whether the query is rewritten
> to TOPN.
>
> My question is: are the following behaviors expected, or does the TOPN
> algorithm not handle negative counter values well, or is there some other
> issue?
>
> 1. A SUM() query is automatically rewritten to TOPN and returns an
> approximate result, even though the query itself does not ask for a top-N.
>
> 2. When the cardinality is 1, TOPN does not give an accurate result.
>
> Thanks.
