spaces-X opened a new issue #7782:
URL: https://github.com/apache/incubator-doris/issues/7782


   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Description
   
   In Doris 0.15 and earlier versions, quantile values are calculated from the detailed rows of a duplicate-model table, so query latency is poor on large data sets.
   
   I propose adding `quantile pre-aggregation` to reduce query latency. ClickHouse already implements this, as follows.
   
   ```
   SELECT quantileState(number) AS st  -- st is a quantileState generated by 0~9
   FROM numbers(10)
   
   Query id: cbde1c1b-e20a-430d-b34a-67c9833be6af
   
   ┌─st─────────────────────────────┐
   │
   6364136223846793005 0 123459     │
   └────────────────────────────────┘
   
   1 rows in set. Elapsed: 0.002 sec.
   
   -------------------------------------------------------
   
   SELECT quantileMerge(0.8)(st) -- use quantileMerge function to calculate quantile
   FROM
   (
       SELECT quantileState(number) AS st
       FROM numbers(10)
   )
   
   Query id: 1c25beb5-f6c5-4f32-a6ce-7bbd6d0429ef
   
   ┌─quantileMerge(0.8)(st)─┐
   │                    7.2 │
   └────────────────────────┘
   ```
   
   
   Following the existing **HLL and bitmap** implementations, the **intermediate state** of the quantile function can be **stored** as a serialized TDigest during the stream-load step.
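
To illustrate storing an intermediate state as bytes and merging it later, here is a minimal Python sketch. It is not Doris's or TDigest's actual format: the `(mean, weight)` centroid list, the byte layout, and the helper names `serialize`, `deserialize`, and `merge` are all assumptions made for illustration, and a real TDigest would also compress centroids when merging.

```python
import struct

def serialize(centroids):
    """Pack (mean, weight) pairs as a count followed by little-endian doubles.
    Hypothetical layout, chosen only for this sketch."""
    out = struct.pack("<I", len(centroids))
    for mean, weight in centroids:
        out += struct.pack("<dd", mean, weight)
    return out

def deserialize(blob):
    """Inverse of serialize: read the count, then each 16-byte pair."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return [struct.unpack_from("<dd", blob, 4 + 16 * i) for i in range(count)]

def merge(a, b):
    """Union two states: add up the weights of centroids with equal means.
    A real TDigest would additionally compress nearby centroids."""
    weights = {}
    for mean, weight in a + b:
        weights[mean] = weights.get(mean, 0.0) + weight
    return sorted(weights.items())

# Two load batches produce independent states that are stored as bytes.
batch1 = serialize([(1.0, 1.0), (2.0, 1.0)])
batch2 = serialize([(2.0, 1.0), (3.0, 1.0)])

# At query or compaction time the stored states are read back and merged.
merged = merge(deserialize(batch1), deserialize(batch2))
print(merged)
```

Because the merged state has the same shape as its inputs, it can be serialized again and merged with further batches without going back to the detailed rows.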
   
   The changes are roughly as follows.
   
   
   1. A new column type `quantilestate` and the corresponding aggregate functions `quantile_union` and `quantile_cal` would be added, together with the conversion function `to_quantile`.
      - quantile_union: adds a value to a quantilestate
      - quantile_cal(float: percentage): calculates the quantile at the given percentage from a quantilestate
      - to_quantile(float: value): converts a value to a quantilestate
   
   2. Support `QuantileState` in the query and load steps.
      
   3. Refactor `PercentileApproxState` and `TDigest`.
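
The intended semantics of the three proposed functions can be sketched in Python as follows. This is only an illustration under stated assumptions: the state here is an exact sorted list rather than a TDigest, `quantile_union` is modeled as merging two states, and `quantile_cal` uses linear interpolation between neighboring values, which is an assumption chosen so that the result lines up with the ClickHouse example above.

```python
import bisect

def to_quantile(value):
    """Convert a single value into a one-element state (assumed semantics)."""
    return [float(value)]

def quantile_union(state, other):
    """Merge another state into a copy of `state`, keeping values sorted."""
    merged = list(state)
    for v in other:
        bisect.insort(merged, v)
    return merged

def quantile_cal(state, percentage):
    """Return the quantile at `percentage` (0..1) by linear interpolation."""
    if not state:
        return None
    pos = percentage * (len(state) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(state) - 1)
    return state[lo] + (pos - lo) * (state[hi] - state[lo])

# Build a state from the values 0..9, as in the ClickHouse example.
state = []
for n in range(10):
    state = quantile_union(state, to_quantile(n))

result = quantile_cal(state, 0.8)
print(result)  # close to 7.2 for this input
```

In an aggregate-key table, `quantile_union` would play the role of the column's aggregation function, so rows with the same key collapse into one state at load time and `quantile_cal` only has to read that state at query time.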
   
   ### Use case
   
   A `CREATE TABLE` statement would look like:
   ```
   CREATE TABLE `QuantileState_Test` (
     `keys` bigint(20) NULL COMMENT "keys",
     `quantile_value` quantilestate quantile_union NOT NULL COMMENT "quantile value"
   ) ENGINE=OLAP
   AGGREGATE KEY(`keys`)
   COMMENT "bitmap load test#OWNER#lihuigang"
   PARTITION BY RANGE(`dt`) (
      xxx
   )
   DISTRIBUTED BY HASH(`keys`) BUCKETS 3
   PROPERTIES (
      xxx
   );
   ```
   
   A stream load command would look like:
   
   ```
   curl --location-trusted -u root \
       -H "columns: k1, k2, v1=to_quantilestate(v1)" \
       -T testData \
       http://host:port/api/testDb/testTbl/_stream_load
   ```
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


