arvin-777 opened a new issue, #33645: URL: https://github.com/apache/doris/issues/33645
### Search before asking - [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues. ### Version 2.1 ### What's Wrong? show variables like "%ner%"; set global parallel_pipeline_task_num = 2;//这个设置成0,2,4都试过,差别不是特别大 set global enable_profile = true; //背景:之前装了2.1的版本后发现逻辑视图查询比sql慢很多,反馈是逻辑视图规划有问题,然后改了一个2.0.7的版本给我再试,发现在简单的逻辑视图查询方面快了,但是很多场景还是不太行,下面我整理了目前的问题,希望社区朋友们能一起努力解决下,谢谢 //环境是四核CPU,3台BE+1台FE,下面查询过程中看dorisManager中内存和cpu曲线图都没有异常飙高 //最下面附有建表结构和逻辑视图脚本 ------------------------------------第1个问题,下面两个sql一样的意思,第一个花费1.5s,第二个花费9s select count(*) from ( SELECT time, global_id, user_id, account, asset, business from office.v_report_stock_balance where time >= 1709222400 and time <= 1711987199 )a; select count(1) from office.v_report_stock_balance where (time >= 1709222400 and time <= 1711987199) ------------------------------------ 第2个问题,分页查询我们一般查一页数据会按下面三条sql查三次,用逻辑视图和用sql差异有点大 -- 下面三条sql是查询视图的三条语句,1.通过count拿到总条数花费了1.5s,2.分页查询20条花了5.5s,3.sum金额字段得到总计9.5s,目前我表总共有两千万数据 select count(*) from ( SELECT time, global_id, user_id, account, asset, business, change, balance, detail, amount_src, ctime, business_id, customer_number, country, nationality, account_level_code, customer_group_id from office.v_report_stock_balance where time >= 1709222400 and time <= 1711987199 )a; SELECT time, global_id, user_id, account, asset, business, change, balance, detail, amount_src, ctime, business_id, customer_number, country, nationality, account_level_code, customer_group_id from office.v_report_stock_balance where time >= 1709222400 and time <= 1711987199 order by time desc,global_id desc limit 20; select sum(change) from ( SELECT time, global_id, user_id, account, asset, business, change, balance, detail, amount_src, ctime, business_id, customer_number, country, nationality, account_level_code, customer_group_id from office.v_report_stock_balance where time >= 1709222400 and time <= 1711987199 )a ; -- 下面三条sql是用sql代替上面逻辑视图的三条语句,1.通过count拿到总条数花费了9.5s,2.分页查询20条花了2.5s,3.sum金额字段得到总计9.5s,目前我表总共有两千万数据 select count(*) from ( SELECT a.time, a.global_id, a.user_id, a.account, a.asset, a.business, greatest(cast( concat(split_part(a.change, '.', 1),'.',strleft(split_part(a.change, '.', 2),c.prec_show)) as decimal(38,20)),0) AS change, greatest(cast( concat(split_part(a.balance, '.', 1),'.',strleft(split_part(a.balance, '.', 2),c.prec_show)) as decimal(38,20)),0) AS balance, a.detail, greatest(cast( concat(split_part(a.balance_src, '.', 1),'.',strleft(split_part(a.balance_src, '.', 2),c.prec_show)) as decimal(38,20)),0) AS amount_src, a.ctime, a.business_id, b.customer_number, '' as country, b.nationality, '' as account_level_code, '' as customer_group_id, least(cast( concat(split_part(a.change, '.', 1),'.',strleft(split_part(a.change, '.', 2),c.prec_show)) as decimal(38,20)),0) AS out_change FROM stock_gts2.stock_balance_history a LEFT JOIN office.t_customer b ON a.user_id = b.gts2_customer_id LEFT JOIN stock_gts2.t_stock_currency c ON a.asset = c.currency_code where time >= 1709222400 and time <= 1711987199 )a; SELECT a.time, a.global_id, a.user_id, a.account, a.asset, a.business, greatest(cast( concat(split_part(a.change, '.', 1),'.',strleft(split_part(a.change, '.', 2),c.prec_show)) as decimal(38,20)),0) AS change, greatest(cast( concat(split_part(a.balance, '.', 1),'.',strleft(split_part(a.balance, '.', 2),c.prec_show)) as decimal(38,20)),0) AS balance, a.detail, greatest(cast( concat(split_part(a.balance_src, '.', 1),'.',strleft(split_part(a.balance_src, '.', 2),c.prec_show)) as decimal(38,20)),0) AS amount_src, a.ctime, a.business_id, b.customer_number, '' as country, b.nationality, '' as account_level_code, '' as customer_group_id, least(cast( concat(split_part(a.change, '.', 1),'.',strleft(split_part(a.change, '.', 2),c.prec_show)) as decimal(38,20)),0) AS out_change FROM stock_gts2.stock_balance_history a LEFT JOIN office.t_customer b ON a.user_id = b.gts2_customer_id LEFT JOIN stock_gts2.t_stock_currency c ON a.asset = c.currency_code where time >= 1709222400 and time <= 1711987199 order by time desc,global_id desc limit 20; select sum(change) from ( SELECT a.time, a.global_id, a.user_id, a.account, a.asset, a.business, greatest(cast( concat(split_part(a.change, '.', 1),'.',strleft(split_part(a.change, '.', 2),c.prec_show)) as decimal(38,20)),0) AS change, greatest(cast( concat(split_part(a.balance, '.', 1),'.',strleft(split_part(a.balance, '.', 2),c.prec_show)) as decimal(38,20)),0) AS balance, a.detail, greatest(cast( concat(split_part(a.balance_src, '.', 1),'.',strleft(split_part(a.balance_src, '.', 2),c.prec_show)) as decimal(38,20)),0) AS amount_src, a.ctime, a.business_id, b.customer_number, '' as country, b.nationality, '' as account_level_code, '' as customer_group_id, least(cast( concat(split_part(a.change, '.', 1),'.',strleft(split_part(a.change, '.', 2),c.prec_show)) as decimal(38,20)),0) AS out_change FROM stock_gts2.stock_balance_history a LEFT JOIN office.t_customer b ON a.user_id = b.gts2_customer_id LEFT JOIN stock_gts2.t_stock_currency c ON a.asset = c.currency_code where time >= 1709222400 and time <= 1711987199 )a; (总结:对比逻辑视图的3条sql和直接用sql查询的3条sql,用sql直接count比逻辑视图慢太多;分页查询逻辑视图慢了一倍;两种方式sum的速度都不理想,应该是我sum的字段是动态根据某列的值做精度控制用的函数没走优化器, doris的truncate函数的第二个参数必须是常量,所以只找到greatest(cast( concat(split_part(a.change, '.', 1),'.',strleft(split_part(a.change, '.', 2),c.prec_show)) as decimal(38,20)),0) AS change这种方式动态处理精度) ----------------------------------------------- 第3个问题.之前在2.1版本测试order by time比order by global_id慢太多的问题,下面在2.0.7版本测试用视图order by global_id(这个是主键) 要10s,用第二条的sql只需0.45s ;把order by global_id改成order by time,第一条需要8.8s,第二条2.4s;(总结:2.1版本对order by主键好像加速了,order by time还是超慢,逻辑视图的order by 都慢) SELECT * from office.v_report_stock_balance where business='trade' and time >1709222400 order by global_id desc limit 10 select a.time, a.global_id, a.user_id, a.account, a.asset, a.business, greatest(cast( concat(split_part(a.change, '.', 1),'.',strleft(split_part(a.change, '.', 2),c.prec_show)) as decimal(38,20)),0) AS change, greatest(cast( concat(split_part(a.balance, '.', 1),'.',strleft(split_part(a.balance, '.', 2),c.prec_show)) as decimal(38,20)),0) AS balance, a.detail, greatest(cast( concat(split_part(a.balance_src, '.', 1),'.',strleft(split_part(a.balance_src, '.', 2),c.prec_show)) as decimal(38,20)),0) AS amount_src, a.ctime, a.business_id, b.customer_number, '' as country, b.nationality, '' as account_level_code, '' as customer_group_id, least(cast( concat(split_part(a.change, '.', 1),'.',strleft(split_part(a.change, '.', 2),c.prec_show)) as decimal(38,20)),0) AS out_change FROM stock_gts2.stock_balance_history a LEFT JOIN office.t_customer b ON a.user_id = b.gts2_customer_id LEFT JOIN stock_gts2.t_stock_currency c ON a.asset = c.currency_code where business='trade' and time >1709222400 order by global_id desc limit 10; --------------------------------------------第4个问题.下面的sql查询需要4.5s,把as global_id别名去掉就只需要0.4s SELECT a.time, a.global_id as global_id, a.user_id, a.account, a.asset, a.business, greatest(cast( concat(split_part(a.change, '.', 1),'.',strleft(split_part(a.change, '.', 2),c.prec_show)) as decimal(38,20)),0) AS change, greatest(cast( concat(split_part(a.balance, '.', 1),'.',strleft(split_part(a.balance, '.', 2),c.prec_show)) as decimal(38,20)),0) AS balance, a.detail, greatest(cast( concat(split_part(a.balance_src, '.', 1),'.',strleft(split_part(a.balance_src, '.', 2),c.prec_show)) as decimal(38,20)),0) AS amount_src, a.ctime, a.business_id, b.customer_number, '' as country, b.nationality, '' as account_level_code, '' as customer_group_id, least(cast( concat(split_part(a.change, '.', 1),'.',strleft(split_part(a.change, '.', 2),c.prec_show)) as decimal(38,20)),0) AS out_change FROM stock_gts2.stock_balance_history a LEFT JOIN office.t_customer b ON a.user_id = b.gts2_customer_id LEFT JOIN stock_gts2.t_stock_currency c ON a.asset = c.currency_code where time >= 1709222400 and time <= 1711987199 order by global_id desc limit 20 offset 1000 --------------------------------------------- 第5个问题.之前版本2.1有个问题,A join B where 后面跟了b表的条件会超级慢,换成逻辑视图的没试过,建议每种查询都测试一下逻辑视图和直接用sql的表现差异 SELECT a.time, a.global_id, a.user_id, a.account, a.asset, a.business, greatest(cast( concat(split_part(a.change, '.', 1),'.',strleft(split_part(a.change, '.', 2),c.prec_show)) as decimal(38,20)),0) AS change, greatest(cast( concat(split_part(a.balance, '.', 1),'.',strleft(split_part(a.balance, '.', 2),c.prec_show)) as decimal(38,20)),0) AS balance, a.detail, greatest(cast( concat(split_part(a.balance_src, '.', 1),'.',strleft(split_part(a.balance_src, '.', 2),c.prec_show)) as decimal(38,20)),0) AS amount_src, a.ctime, a.business_id, b.customer_number, b.nationality, b.account_level_code, least(cast( concat(split_part(a.change, '.', 1),'.',strleft(split_part(a.change, '.', 2),c.prec_show)) as decimal(38,20)),0) AS out_change FROM stock_gts2.stock_balance_history a LEFT JOIN office.t_customer b ON a.user_id = b.gts2_customer_id LEFT JOIN stock_gts2.t_stock_currency c ON a.asset = c.currency_code where business='trade' and account_level_code='MIN' and time >1709222400 order by global_id desc limit 10 -----------------------下面是建表和建视图的语句,stock_balance_history这个表有2000w数据,customer表有几十w数据,t_stock_currency有几百条数据,注意stock_balance_history和t_customer在office这个scheme下,另外两个在stock_gts2下 CREATE TABLE `stock_balance_history` ( `global_id` bigint(20) NOT NULL COMMENT '全局递增id', `dt` date NOT NULL COMMENT '交易日期', `user_id` int(11) NOT NULL COMMENT '用户ID', `asset` varchar(30) NOT NULL COMMENT '资产名称', `time` bigint(20) NOT NULL COMMENT '交易时间', `ctime` bigint(20) NOT NULL COMMENT '写入时间', `account` int(11) NOT NULL COMMENT '用户账户ID', `business` varchar(30) NOT NULL COMMENT '业务简介', `business_id` varchar(33) NOT NULL COMMENT '业务id', `change` DECIMAL(38, 20) NOT NULL COMMENT '资产变更', `balance_src` DECIMAL(38, 20) NOT NULL COMMENT '变更前余额', `balance` DECIMAL(38, 20) NOT NULL COMMENT '变更后余额', `detail` text NOT NULL COMMENT '明细信息', INDEX balance_history_time (`time`) USING INVERTED COMMENT '' ) ENGINE=OLAP UNIQUE KEY(`global_id`, `dt`) COMMENT 'OLAP' PARTITION BY RANGE(`dt`) (PARTITION p20240122 VALUES [('2024-01-22'), ('2024-01-23')), PARTITION p20240123 VALUES [('2024-01-23'), ('2024-01-24')), PARTITION p20240124 VALUES [('2024-01-24'), ('2024-01-25')), PARTITION p20240125 VALUES [('2024-01-25'), ('2024-01-26')), PARTITION p20240126 VALUES [('2024-01-26'), ('2024-01-27')), PARTITION p20240127 VALUES [('2024-01-27'), ('2024-01-28')), PARTITION p20240128 VALUES [('2024-01-28'), ('2024-01-29')), PARTITION p20240129 VALUES [('2024-01-29'), ('2024-01-30')), PARTITION p20240130 VALUES [('2024-01-30'), ('2024-01-31')), PARTITION p20240131 VALUES [('2024-01-31'), ('2024-02-01')), PARTITION p20240201 VALUES [('2024-02-01'), ('2024-02-02')), PARTITION p20240202 VALUES [('2024-02-02'), ('2024-02-03')), PARTITION p20240203 VALUES [('2024-02-03'), ('2024-02-04')), PARTITION p20240204 VALUES [('2024-02-04'), ('2024-02-05')), PARTITION p20240205 VALUES [('2024-02-05'), ('2024-02-06')), PARTITION p20240206 VALUES [('2024-02-06'), ('2024-02-07')), PARTITION p20240207 VALUES [('2024-02-07'), ('2024-02-08')), PARTITION p20240208 VALUES [('2024-02-08'), ('2024-02-09')), PARTITION p20240209 VALUES [('2024-02-09'), ('2024-02-10')), PARTITION p20240210 VALUES [('2024-02-10'), ('2024-02-11')), PARTITION p20240211 VALUES [('2024-02-11'), ('2024-02-12')), PARTITION p20240212 VALUES [('2024-02-12'), ('2024-02-13')), PARTITION p20240213 VALUES [('2024-02-13'), ('2024-02-14')), PARTITION p20240214 VALUES [('2024-02-14'), ('2024-02-15')), PARTITION p20240215 VALUES [('2024-02-15'), ('2024-02-16')), PARTITION p20240216 VALUES [('2024-02-16'), ('2024-02-17')), PARTITION p20240217 VALUES [('2024-02-17'), ('2024-02-18')), PARTITION p20240218 VALUES [('2024-02-18'), ('2024-02-19')), PARTITION p20240219 VALUES [('2024-02-19'), ('2024-02-20')), PARTITION p20240220 VALUES [('2024-02-20'), ('2024-02-21')), PARTITION p20240221 VALUES [('2024-02-21'), ('2024-02-22')), PARTITION p20240222 VALUES [('2024-02-22'), ('2024-02-23')), PARTITION p20240223 VALUES [('2024-02-23'), ('2024-02-24')), PARTITION p20240224 VALUES [('2024-02-24'), ('2024-02-25')), PARTITION p20240225 VALUES [('2024-02-25'), ('2024-02-26')), PARTITION p20240226 VALUES [('2024-02-26'), ('2024-02-27')), PARTITION p20240227 VALUES [('2024-02-27'), ('2024-02-28')), PARTITION p20240228 VALUES [('2024-02-28'), ('2024-02-29')), PARTITION p20240229 VALUES [('2024-02-29'), ('2024-03-01')), PARTITION p20240301 VALUES [('2024-03-01'), ('2024-03-02')), PARTITION p20240302 VALUES [('2024-03-02'), ('2024-03-03')), PARTITION p20240303 VALUES [('2024-03-03'), ('2024-03-04')), PARTITION p20240304 VALUES [('2024-03-04'), ('2024-03-05')), PARTITION p20240305 VALUES [('2024-03-05'), ('2024-03-06')), PARTITION p20240306 VALUES [('2024-03-06'), ('2024-03-07')), PARTITION p20240307 VALUES [('2024-03-07'), ('2024-03-08')), PARTITION p20240308 VALUES [('2024-03-08'), ('2024-03-09')), PARTITION p20240309 VALUES [('2024-03-09'), ('2024-03-10')), PARTITION p20240310 VALUES [('2024-03-10'), ('2024-03-11')), PARTITION p20240311 VALUES [('2024-03-11'), ('2024-03-12')), PARTITION p20240312 VALUES [('2024-03-12'), ('2024-03-13')), PARTITION p20240313 VALUES [('2024-03-13'), ('2024-03-14')), PARTITION p20240314 VALUES [('2024-03-14'), ('2024-03-15')), PARTITION p20240315 VALUES [('2024-03-15'), ('2024-03-16')), PARTITION p20240316 VALUES [('2024-03-16'), ('2024-03-17')), PARTITION p20240317 VALUES [('2024-03-17'), ('2024-03-18')), PARTITION p20240318 VALUES [('2024-03-18'), ('2024-03-19')), PARTITION p20240319 VALUES [('2024-03-19'), ('2024-03-20')), PARTITION p20240320 VALUES [('2024-03-20'), ('2024-03-21')), PARTITION p20240321 VALUES [('2024-03-21'), ('2024-03-22')), PARTITION p20240322 VALUES [('2024-03-22'), ('2024-03-23')), PARTITION p20240323 VALUES [('2024-03-23'), ('2024-03-24')), PARTITION p20240324 VALUES [('2024-03-24'), ('2024-03-25')), PARTITION p20240325 VALUES [('2024-03-25'), ('2024-03-26')), PARTITION p20240326 VALUES [('2024-03-26'), ('2024-03-27')), PARTITION p20240327 VALUES [('2024-03-27'), ('2024-03-28')), PARTITION p20240328 VALUES [('2024-03-28'), ('2024-03-29')), PARTITION p20240329 VALUES [('2024-03-29'), ('2024-03-30')), PARTITION p20240330 VALUES [('2024-03-30'), ('2024-03-31')), PARTITION p20240331 VALUES [('2024-03-31'), ('2024-04-01')), PARTITION p20240401 VALUES [('2024-04-01'), ('2024-04-02')), PARTITION p20240402 VALUES [('2024-04-02'), ('2024-04-03')), PARTITION p20240403 VALUES [('2024-04-03'), ('2024-04-04')), PARTITION p20240404 VALUES [('2024-04-04'), ('2024-04-05')), PARTITION p20240405 VALUES [('2024-04-05'), ('2024-04-06')), PARTITION p20240406 VALUES [('2024-04-06'), ('2024-04-07')), PARTITION p20240407 VALUES [('2024-04-07'), ('2024-04-08')), PARTITION p20240408 VALUES [('2024-04-08'), ('2024-04-09'))) DISTRIBUTED BY HASH(`global_id`) BUCKETS 10 PROPERTIES ( "replication_allocation" = "tag.location.default: 3", "is_being_synced" = "false", "dynamic_partition.enable" = "true", "dynamic_partition.time_unit" = "DAY", "dynamic_partition.time_zone" = "Asia/Shanghai", "dynamic_partition.start" = "-2147483648", "dynamic_partition.end" = "7", "dynamic_partition.prefix" = "p", "dynamic_partition.replication_allocation" = "tag.location.default: 3", "dynamic_partition.buckets" = "10", "dynamic_partition.create_history_partition" = "false", "dynamic_partition.history_partition_num" = "-1", "dynamic_partition.hot_partition_num" = "0", "dynamic_partition.reserved_history_periods" = "NULL", "dynamic_partition.storage_policy" = "", "storage_format" = "V2", "enable_unique_key_merge_on_write" = "true", "light_schema_change" = "true", "disable_auto_compaction" = "false", "enable_single_replica_compaction" = "false" ); CREATE TABLE `t_stock_currency` ( `id` bigint(20) NOT NULL, `currency_code` varchar(50) NOT NULL, `prec_save` int(11) NULL, `prec_show` int(11) NULL, `deleted` boolean NULL, `create_user` varchar(50) NULL, `create_ip` varchar(50) NULL, `create_date` datetime NULL, `update_user` varchar(50) NULL, `update_ip` varchar(50) NULL, `update_date` datetime NULL, `version_no` int(11) NULL ) ENGINE=OLAP UNIQUE KEY(`id`) COMMENT 'OLAP' DISTRIBUTED BY HASH(`id`) BUCKETS 10 PROPERTIES ( "replication_allocation" = "tag.location.default: 3", "is_being_synced" = "false", "storage_format" = "V2", "light_schema_change" = "true", "disable_auto_compaction" = "false", "enable_single_replica_compaction" = "false" ); CREATE TABLE `t_customer` ( `customer_number` bigint(20) NOT NULL COMMENT 'customer number', `gts2_customer_id` bigint(20) NULL, `nationality` varchar(30) NULL, `mobile_phone_prefix` varchar(10) NULL, `mobile_phone` varchar(50) NULL, `email` varchar(50) NULL, `first_trade_date` datetime NULL, `first_deposit_date` datetime NULL, `account_level_code` varchar(20) NULL, `max_account_level_code` varchar(20) NULL, `account_status_code` varchar(50) NULL, `kyc_update_time` datetime NULL, `create_time` datetime NULL ) ENGINE=OLAP UNIQUE KEY(`customer_number`) COMMENT 'OLAP' DISTRIBUTED BY HASH(`customer_number`) BUCKETS 10 PROPERTIES ( "replication_allocation" = "tag.location.default: 3", "is_being_synced" = "false", "storage_format" = "V2", "light_schema_change" = "true", "disable_auto_compaction" = "false", "enable_single_replica_compaction" = "false" ); CREATE VIEW `v_report_stock_balance` COMMENT 'VIEW' AS SELECT `a`.`time` AS `time`, `a`.`global_id` AS `global_id`, `a`.`user_id` AS `user_id`, `a`.`account` AS `account`, `a`.`asset` AS `asset`, `a`.`business` AS `business`, greatest(CAST(concat(split_part(`a`.`change`, '.', 1), '.', strleft(split_part(`a`.`change`, '.', 2), `c`.`prec_show`)) AS decimalv3(38, 20)), 0) AS `change`, greatest(CAST(concat(split_part(`a`.`balance`, '.', 1), '.', strleft(split_part(`a`.`balance`, '.', 2), `c`.`prec_show`)) AS decimalv3(38, 20)), 0) AS `balance`, `a`.`detail` AS `detail`, greatest(CAST(concat(split_part(`a`.`balance_src`, '.', 1), '.', strleft(split_part(`a`.`balance_src`, '.', 2), `c`.`prec_show`)) AS decimalv3(38, 20)), 0) AS `amount_src`, `a`.`ctime` AS `ctime`, `a`.`business_id` AS `business_id`, `b`.`customer_number` AS `customer_number`, '' AS `country`, `b`.`nationality` AS `nationality`, '' AS `account_level_code`, '' AS `customer_group_id`, least(CAST(concat(split_part(`a`.`cha nge`, '.', 1), '.', strleft(split_part(`a`.`change`, '.', 2), `c`.`prec_show`)) AS decimalv3(38, 20)), 0) AS `out_change` FROM `default_cluster:stock_gts2`.`stock_balance_history` a LEFT OUTER JOIN `default_cluster:office`.`t_customer` b ON CAST(`a`.`user_id` AS BIGINT) = `b`.`gts2_customer_id` LEFT OUTER JOIN `default_cluster:stock_gts2`.`t_stock_currency` c ON `a`.`asset` = `c`.`currency_code`; ### What You Expected? 在join的情况下,逻辑视图和直接用sql查询都能在合理的返回范围时间内,目前我分页查询需要查count,查明细,然后sum三条sql,加起来的时间至少需要十几秒 ### How to Reproduce? 我文档包括了建表语句和视图语句,可以造点数据就能测试 ### Anything Else? _No response_ ### Are you willing to submit PR? - [X] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org