chenlinzhong opened a new pull request, #11761: URL: https://github.com/apache/doris/pull/11761
# Proposed changes Issue Number: close #xxx ## Problem summary ``` //create one table such as table2 CREATE TABLE `table2` ( `event_day` date NULL, `siteid` int(11) NULL DEFAULT "10", `citycode` smallint(6) NULL, `visitinfo` varchar(1024) NULL DEFAULT "", `pv` varchar(1024) REPLACE NULL DEFAULT "0" ) ENGINE=OLAP AGGREGATE KEY(`event_day`, `siteid`, `citycode`, `visitinfo`) COMMENT 'OLAP' DISTRIBUTED BY HASH(`siteid`) BUCKETS 10 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2" ) //import some data MySQL [test_db]> select * from table2; +------------+--------+----------+------------------------------------+------+ | event_day | siteid | citycode | visitinfo | pv | +------------+--------+----------+------------------------------------+------+ | 2017-07-03 | 8 | 12 | {"ip":"192.168.0.5","source":"pc"} | 81 | | 2017-07-03 | 37 | 12 | {"ip":"192.168.0.3","source":"pc"} | 81 | | 2017-07-03 | 67 | 16 | {"ip":"192.168.0.2","source":"pc"} | 79 | | 2017-07-03 | 101 | 11 | {"ip":"192.168.0.5","source":"pc"} | 65 | | 2017-07-03 | 32 | 15 | {"ip":"192.168.0.1","source":"pc"} | 188 | | 2017-07-03 | 103 | 12 | {"ip":"192.168.0.5","source":"pc"} | 123 | | 2017-07-03 | 104 | 16 | {"ip":"192.168.0.5","source":"pc"} | 79 | | 2017-07-03 | 3 | 12 | {"ip":"192.168.0.3","source":"pc"} | 123 | | 2017-07-03 | 3 | 15 | {"ip":"192.168.0.2","source":"pc"} | 188 | | 2017-07-03 | 13 | 11 | {"ip":"192.168.0.1","source":"pc"} | 65 | | 2017-07-03 | 53 | 12 | {"ip":"192.168.0.2","source":"pc"} | 123 | | 2017-07-03 | 1 | 11 | {"ip":"192.168.0.1","source":"pc"} | 65 | | 2017-07-03 | 7 | 16 | {"ip":"192.168.0.4","source":"pc"} | 79 | | 2017-07-03 | 102 | 15 | {"ip":"192.168.0.5","source":"pc"} | 188 | | 2017-07-03 | 105 | 12 | {"ip":"192.168.0.5","source":"pc"} | 81 | +------------+--------+----------+------------------------------------+------+ ``` ### 1. find most visit top 3 ip not implement this demo in c++, if you are interested in this demo you can reference python/java implement ``` MySQL [test_db]> CREATE AGGREGATE FUNCTION rpc_count_visit_info(varchar(1024)) RETURNS varchar(1024) PROPERTIES ( "TYPE"="RPC", "OBJECT_FILE"="127.0.0.1:9000", "update_fn"="rpc_count_visit_info_update", "merge_fn"="rpc_count_visit_info_merge", "finalize_fn"="rpc_count_visit_info_finalize" ); MySQL [test_db]> select rpc_count_visit_info(visitinfo) from table2; +--------------------------------------------+ | rpc_count_visit_info(`visitinfo`) | +--------------------------------------------+ | 192.168.0.5:6 192.168.0.2:3 192.168.0.1:3 | +--------------------------------------------+ 1 row in set (0.036 sec) MySQL [test_db]> select citycode, rpc_count_visit_info(visitinfo) from table2 group by citycode; +----------+--------------------------------------------+ | citycode | rpc_count_visit_info(`visitinfo`) | +----------+--------------------------------------------+ | 15 | 192.168.0.2:1 192.168.0.1:1 192.168.0.5:1 | | 11 | 192.168.0.1:2 192.168.0.5:1 | | 12 | 192.168.0.5:3 192.168.0.3:2 192.168.0.2:1 | | 16 | 192.168.0.2:1 192.168.0.4:1 192.168.0.5:1 | +----------+--------------------------------------------+ 4 rows in set (0.050 sec) ``` ### 2. sum pv ``` CREATE AGGREGATE FUNCTION rpc_sum(bigint) RETURNS bigint PROPERTIES ( "TYPE"="RPC", "OBJECT_FILE"="127.0.0.1:9700", "update_fn"="rpc_sum_update", "merge_fn"="rpc_sum_merge", "finalize_fn"="rpc_sum_finalize" ); MySQL [test_db]> select citycode, rpc_sum(pv) from table2 group by citycode; +----------+---------------+ | citycode | rpc_sum(`pv`) | +----------+---------------+ | 15 | 564 | | 11 | 195 | | 12 | 612 | | 16 | 237 | +----------+---------------+ 4 rows in set (0.067 sec) MySQL [test_db]> select rpc_sum(pv) from table2; +---------------+ | rpc_sum(`pv`) | +---------------+ | 1608 | +---------------+ 1 row in set (0.030 sec) ``` ### 3. avg pv ``` CREATE AGGREGATE FUNCTION rpc_avg(int) RETURNS double PROPERTIES ( "TYPE"="RPC", "OBJECT_FILE"="127.0.0.1:9000", "update_fn"="rpc_avg_update", "merge_fn"="rpc_avg_merge", "finalize_fn"="rpc_avg_finalize" ); MySQL [test_db]> select citycode, rpc_avg(pv) from table2 group by citycode; +----------+---------------+ | citycode | rpc_avg(`pv`) | +----------+---------------+ | 15 | 188 | | 11 | 65 | | 12 | 102 | | 16 | 79 | +----------+---------------+ 4 rows in set (0.039 sec) MySQL [test_db]> select rpc_avg(pv) from table2; +---------------+ | rpc_avg(`pv`) | +---------------+ | 107.2 | +---------------+ 1 row in set (0.028 sec) ``` ## Checklist(Required) 1. Does it affect the original behavior: - [ ] Yes - [ ] No - [ ] I don't know 2. Has unit tests been added: - [ ] Yes - [ ] No - [ ] No Need 3. Has document been added or modified: - [ ] Yes - [ ] No - [ ] No Need 4. Does it need to update dependencies: - [ ] Yes - [ ] No 5. Are there any changes that cannot be rolled back: - [ ] Yes (If Yes, please explain WHY) - [ ] No ## Further comments If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org