spaces-X opened a new issue #7782: URL: https://github.com/apache/incubator-doris/issues/7782
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.

### Description

In Doris 0.15 and earlier, quantile values are computed from the detailed rows of a duplicate-model table, so query latency is poor on large data sets.

I propose enabling `quantile pre-aggregation` to reduce query latency. ClickHouse already implements this, as follows.

```
SELECT quantileState(number) AS st -- st is a quantileState generated from 0~9
FROM numbers(10)

Query id: cbde1c1b-e20a-430d-b34a-67c9833be6af

┌─st─────────────────────────────┐
│ 6364136223846793005 0 123459 │
└────────────────────────────────┘

1 rows in set. Elapsed: 0.002 sec.

-------------------------------------------------------

SELECT quantileMerge(0.8)(st) -- use the quantileMerge function to calculate the quantile
FROM
(
    SELECT quantileState(number) AS st
    FROM numbers(10)
)

Query id: 1c25beb5-f6c5-4f32-a6ce-7bbd6d0429ef

┌─quantileMerge(0.8)(st)─┐
│ 7.2 │
└────────────────────────┘
```

Referring to the existing **HLL and bitmap** implementations, the **intermediate state** of the quantile function can be **stored** as a serialized TDigest during the stream-load step. The changes are roughly as follows.

1. Add a new column type named `quantilestate` and the corresponding aggregate functions `quantile_union` and `quantile_cal`.
   - quantile_union: add a value into a quantilestate
   - quantile_cal(float: percentage): calculate the quantile at the given percentage from a quantilestate
   - to_quantile(float: value): convert a value to a quantilestate
2. Support `QuantileState` in the query and load steps.
3.
   Refactor `PercentileApproxState` and `TDigest`.

### Use case

A create-table statement would look like:

```
CREATE TABLE `QuantileState_Test` (
  `keys` bigint(20) NULL COMMENT "keys",
  `quantile_value` quantilestate quantile_union NOT NULL COMMENT "quantile value"
) ENGINE=OLAP
AGGREGATE KEY(`brand_id`, `dt`, `poi_type`)
COMMENT "bitmap load test#OWNER#lihuigang"
PARTITION BY RANGE(`dt`)
( xxx )
DISTRIBUTED BY HASH(`keys`) BUCKETS 3
PROPERTIES ( xxx );
```

A stream load command would look like:

```
curl --location-trusted -u root -H "columns: k1, k2, v1=to_quantilestate(v1)" -T testData http://host:port/api/testDb/testTbl/_stream_load
```

### Related issues

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------

For additional commands, e-mail: commits-h...@doris.apache.org
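To illustrate the semantics proposed in the Description (convert each value to a state, union states at load time, compute the quantile at query time), here is a minimal Python sketch. It is not Doris code: the class `QuantileState` and the function `to_quantile_state` are hypothetical names, and the state simply keeps every sample in sorted order instead of the bounded-size serialized TDigest that the proposal calls for. The interpolation rule is an assumption chosen so that the result lines up with the ClickHouse example above.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import List


@dataclass
class QuantileState:
    """Hypothetical stand-in for a `quantilestate` value.

    Assumption: stores all samples exactly; the proposal would store a
    serialized TDigest sketch instead.
    """
    samples: List[float] = field(default_factory=list)


def to_quantile_state(value: float) -> QuantileState:
    """Wrap a single raw value in a state (the load-time conversion)."""
    return QuantileState([float(value)])


def quantile_union(a: QuantileState, b: QuantileState) -> QuantileState:
    """Merge two states; this is what pre-aggregation would do per key."""
    return QuantileState(sorted(a.samples + b.samples))


def quantile_cal(state: QuantileState, percentage: float) -> float:
    """Compute a quantile from a state using linear interpolation."""
    if not state.samples:
        raise ValueError("cannot compute a quantile of an empty state")
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be in [0, 1]")
    ordered = sorted(state.samples)
    pos = percentage * (len(ordered) - 1)   # fractional rank
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


# Two load batches are aggregated separately, then merged at query time.
batch_a = reduce(quantile_union, (to_quantile_state(v) for v in range(0, 5)))
batch_b = reduce(quantile_union, (to_quantile_state(v) for v in range(5, 10)))
merged = quantile_union(batch_a, batch_b)
p80 = quantile_cal(merged, 0.8)
print(p80)
```

For the values 0 to 9 the 0.8 quantile comes out at about 7.2, the same figure as `quantileMerge(0.8)` in the ClickHouse example. Queries only ever touch the merged state, never the raw rows, which is where the latency saving would come from. A TDigest-backed state would trade exactness for a fixed memory footprint.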