SELECT c.DT, COALESCE (C.weekActivityCount, 0), COALESCE
(D.monthActivityCount, 0) FROM ( SELECT DT AS DT,
count(*) AS weekActivityCount FROM ST_DEVICE_INFO_FACT_TABLE
WHERE 6 >= LAST_ONLINE_DT_DIFF_DAY 改为:LAST_ONLINE_DT_DIFF_DAY<=6
AND LAST_ONLINE_DT_DIFF_DAY >= 0 AND DT >= '2018-10-11' AND
'2018-10-11' >= DT AND IS_ACTIVATED = 1 GROUP BY dt
) C LEFT JOIN ( SELECT DT AS DT, count(*) AS
monthActivityCount FROM ST_DEVICE_INFO_FACT_TABLE WHERE
29 >= LAST_ONLINE_DT_DIFF_DAY 改为:LAST_ONLINE_DT_DIFF_DAY<=29 AND
LAST_ONLINE_DT_DIFF_DAY >= 0 AND DT >= '2018-10-11' AND '2018-10-11' >=
DT AND IS_ACTIVATED = 1 GROUP BY dt ) D ON C.DT = D.DT
sql 按照上述修改,查询结果就是正确的,应该是kylin 的bug。
------------------ 原始邮件 ------------------
发件人: "shaofengshi"<[email protected]>;
发送时间: 2018年11月20日(星期二) 下午4:40
收件人: "dev"<[email protected]>;
主题: Re: apache kylin left join 数据查询混乱咨询
Hi Ling,
Could you please provide the Kylin log when executing the first query?
Thanks!
Best regards,
Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: [email protected]
Kyligence Inc: https://kyligence.io/
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]
dbaztp <[email protected]> 于2018年11月19日周一 下午6:52写道:
> Hi Wang Lin,
> I hadencountered this problem as well, the E-mail I had sent to SQL
> developer is as below. Maybe it is helpful to you too.
> PS: You should adapt your SQL to recommended solution.
>
> ----------------------------------------------------------------------------------------------------
> Window functions are supported in Kylin, such as: lead(), lag(), first()
> Solution 1(deprecated):
> SELECT "this_year"."all_count" * 1.0 /"last_year"."all_count" AS
> "new_rate" FROM (SELECT {fn EXTRACT(MONTH FROM "APP_NEW_MC"."ACTION_DT")}
> AS "action_month", SUM("APP_NEW_MC"."ALLCOUNT") AS "all_count", {fn
> EXTRACT(YEAR FROM "APP_NEW_MC"."ACTION_DT")} AS "action_yr" FROM
> "ADL"."APP_NEW_MC" "APP_NEW_MC" WHERE {fn EXTRACT(MONTH FROM
> "APP_NEW_MC"."ACTION_DT")} = {fn EXTRACT(MONTH FROM TIMESTAMPADD(DAY, -1,
> CURRENT_DATE))} AND {fn EXTRACT(YEAR FROM "APP_NEW_MC"."ACTION_DT")} = {fn
> EXTRACT(YEAR FROM CURRENT_DATE)} - 1 AND "APP_NEW_MC"."PRODUCT" = ‘xxxx'
> GROUP BY {fn EXTRACT(MONTH FROM "APP_NEW_MC"."ACTION_DT")}, {fn
> EXTRACT(YEAR FROM "APP_NEW_MC"."ACTION_DT")}) "last_year" LEFT JOIN (SELECT
> {fn EXTRACT(MONTH FROM "APP_NEW_MC"."ACTION_DT")} AS "action_month",
> SUM("APP_NEW_MC"."ALLCOUNT") AS "all_count", {fn EXTRACT(YEAR FROM
> "APP_NEW_MC"."ACTION_DT")} AS "action_yr" FROM "ADL"."APP_NEW_MC"
> "APP_NEW_MC" WHERE {fn EXTRACT(MONTH FROM "APP_NEW_MC"."ACTION_DT")} = {fn
> EXTRACT(MONTH FROM TIMESTAMPADD(DAY, -1, CURRENT_DATE))} AND {fn
> EXTRACT(YEAR FROM "APP_NEW_MC"."ACTION_DT")} = {fn EXTRACT(YEAR FROM
> CURRENT_DATE)} AND "APP_NEW_MC"."PRODUCT" = ‘xxxx' GROUP BY {fn
> EXTRACT(MONTH FROM "APP_NEW_MC"."ACTION_DT")}, {fn EXTRACT(YEAR FROM
> "APP_NEW_MC"."ACTION_DT")}) "this_year" ON "last_year"."action_month" =
> "this_year"."action_month";
> Solution 2(recommended):
> SELECT {fn EXTRACT(YEAR FROM "APP_NEW_MC"."ACTION_DT")} as "YEAR", {fn
> EXTRACT(MONTH FROM "APP_NEW_MC"."ACTION_DT")} as "MONTH",
> SUM("APP_NEW_MC"."ALLCOUNT") / cast(lead(SUM("APP_NEW_MC"."ALLCOUNT"), 12)
> over () as double) as "new_rate" FROM "ADL"."APP_NEW_MC" "APP_NEW_MC" WHERE
> "APP_NEW_MC"."PRODUCT" = ‘xxxx' GROUP BY {fn EXTRACT(YEAR FROM
> "APP_NEW_MC"."ACTION_DT")},{fn EXTRACT(MONTH FROM
> "APP_NEW_MC"."ACTION_DT")} order by "YEAR" desc, "MONTH" desc;
>
>
> Original Message
> Sender:王林[email protected]
> Recipient:[email protected]
> Date:Monday, Nov 19, 2018 18:34
> Subject:apache kylin left join 数据查询混乱咨询
>
>
> 你好: 通过kylin 页面查询发现如下问题: 执行如下sql: SELECT c.DT, COALESCE
> (C.weekActivityCount, 0), COALESCE (D.monthActivityCount, 0) FROM ( SELECT
> DT AS DT, count(*) AS weekActivityCount FROM ST_DEVICE_INFO_FACT_TABLE
> WHERE 6 = LAST_ONLINE_DT_DIFF_DAY AND LAST_ONLINE_DT_DIFF_DAY = 0 AND DT =
> '2018-10-11' AND '2018-10-11' = DT AND IS_ACTIVATED = 1 GROUP BY dt ) C
> LEFT JOIN ( SELECT DT AS DT, count(*) AS monthActivityCount FROM
> ST_DEVICE_INFO_FACT_TABLE WHERE 29 = LAST_ONLINE_DT_DIFF_DAY AND
> LAST_ONLINE_DT_DIFF_DAY = 0 AND DT = '2018-10-11' AND '2018-10-11' = DT AND
> IS_ACTIVATED = 1 GROUP BY dt ) D ON C.DT = D.DT 查询使用cube:
> CUBE[name=st_device_info_cube_v140],CUBE[name=st_device_info_cube_v140]
> 查询结果为: 2018-10-11,996542,996542 但是单独执行C,D两个子查询: SELECT DT AS DT, count(*)
> AS weekActivityCount FROM ST_DEVICE_INFO_FACT_TABLE WHERE 6 =
> LAST_ONLINE_DT_DIFF_DAY AND LAST_ONLINE_DT_DIFF_DAY = 0 AND DT =
> '2018-10-11' AND '2018-10-11' = DT AND IS_ACTIVATED = 1 GROUP BY dt Cubes:
> CUBE[name=st_device_info_cube_v140] 查询结果为: 2018-10-11,996542 执行: SELECT DT
> AS DT, count(*) AS monthActivityCount FROM ST_DEVICE_INFO_FACT_TABLE WHERE
> 29 = LAST_ONLINE_DT_DIFF_DAY AND LAST_ONLINE_DT_DIFF_DAY = 0 AND DT =
> '2018-10-11' AND '2018-10-11' = DT AND IS_ACTIVATED = 1 GROUP BY dt Cubes:
> CUBE[name=st_device_info_cube_v140] 查询结果为: 2018-10-11,1119847 查询结果与第一个left
> join 查询结果不对应。请问是什么原因? 这个问题需要怎么修复呢?