Doris Python support

2022-11-24 Thread 王云飞
Hello! Does Doris have an official Python library? Thanks.


Re: Questions about Apache Doris materialized views and replicas

2022-11-24 Thread Mingyu Chen
Currently, a materialized view in Doris is the same as a Rollup: it is 
built on a single table (the base table), and its partitioning and bucketing 
must be the same as the base table's.
So a rollup cannot aggregate data across different partitions or buckets. 
This is a limitation of the current implementation. We are developing a new 
materialized view feature that lets a materialized view have its own 
partitioning and bucketing.
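
To make the per-bucket constraint concrete, here is a toy Python model (not Doris code). Rows are assigned to buckets by a stand-in hash, each bucket keeps its own pre-aggregated rollup, and a query merges the per-bucket partial sums. The table layout, the column names, and the `bucket_of` function are illustrative assumptions.

```python
from collections import defaultdict


def bucket_of(user_id: int, num_buckets: int) -> int:
    # Stand-in for the table's hash distribution (assumption: simple modulo).
    return user_id % num_buckets


def build_rollup(rows, num_buckets):
    """Pre-aggregate SUM(cost) GROUP BY city separately inside each bucket.

    rows: iterable of (user_id, city, cost). The rollup uses the same
    bucketing as the base rows, so no bucket ever sees another bucket's data.
    """
    rollup = [defaultdict(int) for _ in range(num_buckets)]
    for user_id, city, cost in rows:
        rollup[bucket_of(user_id, num_buckets)][city] += cost
    return rollup


def query_sum_by_city(rollup):
    # Cross-bucket aggregation happens at query time, by merging partial sums.
    total = defaultdict(int)
    for bucket in rollup:
        for city, partial in bucket.items():
            total[city] += partial
    return dict(total)


rows = [(1, "Beijing", 10), (2, "Beijing", 5), (3, "Shanghai", 7), (4, "Beijing", 1)]
rollup = build_rollup(rows, num_buckets=2)
print(query_sum_by_city(rollup))
```

In this model "Beijing" appears in both buckets as a partial sum, which is the situation the reply describes: the rollup itself never combines data from different buckets.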







--

此致!Best Regards
陈明雨 Mingyu Chen

Email:
morning...@apache.org





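At 2022-11-22 17:04:13, "fl" wrote:
>Hi, Apache Doris dev team:
>
>
>While using the materialized view feature, I ran into the following 
>question about how it works, so I am asking here.
>
>
>When a materialized view is created on a table that is configured with a 
>certain number of replicas, how are the tablet replicas, which are 
>distributed across different nodes, handled during the materialized view's 
>aggregation?
>
>
>That is, each tablet has multiple replicas. If aggregation is done per 
>partition, and each node aggregates (e.g. sum) all tablets of that partition 
>located on that node, then the partition sum() on a single node must be 
>smaller than the correct value (some tablets are on other nodes), while the 
>combined sum() over multiple nodes must be larger than the correct value 
>(the same tablet necessarily exists on several nodes and is aggregated more 
>than once).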


Re: Doris Python support

2022-11-24 Thread Mingyu Chen
You can use any standard MySQL Python library to connect to Doris, because 
Doris is compatible with the MySQL connection protocol.
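
A minimal sketch of what this could look like, assuming the third-party PyMySQL package (any MySQL client library should work similarly) and the FE's default MySQL-protocol query port 9030. The host, user, and database names are placeholders, and both helper functions are hypothetical, not part of any Doris library.

```python
def doris_connect_kwargs(host, user, password="", database=None, port=9030):
    """Build keyword arguments for a MySQL client's connect() call.

    Assumption: 9030 is the FE query port (the Doris default); change it if
    your cluster is configured differently.
    """
    kwargs = {"host": host, "port": port, "user": user, "password": password}
    if database:
        kwargs["database"] = database
    return kwargs


def run_query(sql, **connect_kwargs):
    """Run one statement against Doris over the MySQL protocol."""
    import pymysql  # third-party: pip install pymysql

    conn = pymysql.connect(**connect_kwargs)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()


# Usage against a reachable FE (placeholder host and database):
#   rows = run_query("SHOW TABLES",
#                    **doris_connect_kwargs("fe-host", "root", database="demo"))
```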



--

此致!Best Regards
陈明雨 Mingyu Chen

Email:
morning...@apache.org





At 2022-11-24 19:35:13, "王云飞" wrote:
>Hello! Does Doris have an official Python library? Thanks.


Re:[Discuss][DSIP] Create Resource Queue in Doris to Support Asynchronous Job Submission

2022-11-24 Thread Mingyu Chen
Hi GaoXin:
I think you are talking about two things, although they are related.
One is the "Resource Queue", which is used to define different resource 
groups for different workloads.
The other is "Submit SQL Job", which is used to submit arbitrary SQL (select, 
insert, etc.) asynchronously.
And a "SQL job" can be submitted to a "Resource Queue".


I am more interested in "Submit SQL Job". I would like to make Doris 
"All-in-SQL", which means all kinds of jobs can be described by a SQL 
statement.
For example, a "Broker Load" statement can be a "submit sql job(insert into 
dest_table select * from hdfs(file1));",
or an "Export" job can be a "submit sql job(insert into s3(bucket1) select * 
from source_table);".
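
As a rough illustration of the "All-in-SQL" idea, the hypothetical Python helper below assembles a statement in the `SUBMIT SQL JOB` syntax proposed in this thread. That syntax is a proposal under discussion, not an existing Doris statement, and `hdfs(file1)` is the placeholder from the example above.

```python
def build_submit_sql_job(sql_stmt, label=None, wait_timeout_ms=-1, query_timeout_ms=-1):
    """Assemble a statement in the *proposed* SUBMIT SQL JOB syntax.

    Hypothetical helper: the statement shape follows the proposal in this
    thread and is not something a released Doris version is known to accept.
    """
    parts = ["SUBMIT SQL JOB"]
    if label:
        parts.append(f"WITH LABEL {label}")
    # The inner statement is wrapped in parentheses without its trailing ';'.
    parts.append("(" + sql_stmt.strip().rstrip(";") + ")")
    parts.append(
        f'PROPERTIES("wait_timeout_ms" = "{wait_timeout_ms}", '
        f'"query_timeout_ms" = "{query_timeout_ms}")'
    )
    return " ".join(parts) + ";"


load_job = build_submit_sql_job(
    "insert into dest_table select * from hdfs(file1);", label="load_1"
)
print(load_job)
```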


I am looking forward to this feature. Could you please write a DSIP for it?
If you provide your wiki account, I can grant you write permission.







--

此致!Best Regards
陈明雨 Mingyu Chen

Email:
morning...@apache.org





At 2022-11-23 20:54:41, "高鑫"  wrote:
>Hi,
>
>
>Doris users report that when the database load reaches a certain level, 
>queries compete for CPU and memory resources, which lowers overall query 
>performance. We therefore need resource queues to limit the concurrency of 
>large queries and keep the performance of small queries stable. Support for 
>resource queues is motivated by the following points:
>
>Large queries or jobs preempt cluster resources, so small queries cannot 
>complete quickly.
>
>The submission of large queries cannot be limited. Many large queries running 
>in parallel lead to problems such as BE OOM or full preemption of cluster 
>resources.
>
>Resource Queue: users can specify, according to their own business needs, the 
>number of concurrent queries that the database can run and the number of 
>queries that can be queued.
>This ensures that the expected system resources are available when a query 
>executes, so that the expected query performance is obtained.
>
>
>
>
>Related Work
>
>Creating resource queues for queries is common in database products. 
>In Aliyun AnalyticDB, creating a resource queue specifies the number of 
>concurrent queries that the database can run, the memory size that each query 
>can use, and the CPU resources that can be used.
>
>In Tencent Cloud's PostgreSQL-based cloud data warehouse, a single complex 
>query may consume too many resources and affect other users' queries or 
>computations. When it is necessary to limit the system resources consumed by 
>a single user or query statement, Tencent Cloud uses resource queues.
>
>
>
>Create Resource Queue
>A resource queue stores two types of information:
>
>Queue configuration: describes the resource limits that apply to this queue, 
>such as concurrency, CPU, memory, scan rows, etc.
>
>Matching policy: after a query (such as select/insert) is submitted, it is 
>matched to a queue according to the job information. Matching rules can be 
>based on user name, IP, database name and table name.
>
>
>SQL syntax:
>
>// create resource queue
>CREATE RESOURCE QUEUE queue_name
>[WITH RESOURCE (
>    "max_concurrency" = "1",  // Limit the number of queries running simultaneously in the queue
>    "max_queue_size" = "10"   // Limit the number of queries queued in the queue
>)]
>[WITH MATCHING POLICY (
>    "user" = "rd_group*",     // Match the prefix of user name
>    "ip" = "192.10.1.*"       // Match the prefix of IP
>)];
>
>// drop resource queue
>DROP RESOURCE QUEUE queue_name;
>
>// show resource queues
>SHOW RESOURCE QUEUES;
>
>// show specified queue: queueId, type, pendingNum, runningNum, queueConfig, matchingPolicy
>SHOW RESOURCE QUEUE queue_name;
>
>Submit Asynchronous Job:
>
>SUBMIT SQL JOB [WITH LABEL label_name]( sql_stmt )
>[PROPERTIES(
>    "wait_timeout_ms" = "-1",   // Max time of waiting in the queue, -1 means waiting all the time
>    "query_timeout_ms" = "-1"   // Max time of running, -1 means consistent with the system
>)]
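
The queue semantics proposed above (a concurrency limit, a bounded waiting queue, and prefix matching on user and IP) can be sketched as a small Python model. This is only an illustration of the proposal, not Doris code: the `ResourceQueue` class, the use of `fnmatch` for the `*` patterns, and rejecting submissions when the waiting queue is full are all assumptions.

```python
from collections import deque
from fnmatch import fnmatchcase


class ResourceQueue:
    """Toy model of the proposed resource queue (hypothetical, in-memory)."""

    def __init__(self, name, max_concurrency, max_queue_size,
                 user_pattern="*", ip_pattern="*"):
        self.name = name
        self.max_concurrency = max_concurrency
        self.max_queue_size = max_queue_size
        self.user_pattern = user_pattern
        self.ip_pattern = ip_pattern
        self.running = set()
        self.pending = deque()

    def matches(self, user, ip):
        # Matching policy: both the user and the IP pattern must match.
        return (fnmatchcase(user, self.user_pattern)
                and fnmatchcase(ip, self.ip_pattern))

    def submit(self, query_id):
        # Run immediately if a slot is free, otherwise wait in the queue.
        if len(self.running) < self.max_concurrency:
            self.running.add(query_id)
            return "RUNNING"
        if len(self.pending) < self.max_queue_size:
            self.pending.append(query_id)
            return "PENDING"
        return "REJECTED"  # assumption: a full waiting queue rejects the query

    def finish(self, query_id):
        # Free the slot and promote the oldest pending query, if any.
        self.running.discard(query_id)
        if self.pending and len(self.running) < self.max_concurrency:
            promoted = self.pending.popleft()
            self.running.add(promoted)
            return promoted
        return None


queue = ResourceQueue("rd_queue", max_concurrency=1, max_queue_size=10,
                      user_pattern="rd_group*", ip_pattern="192.10.1.*")
print(queue.matches("rd_group_a", "192.10.1.7"), queue.submit("q1"), queue.submit("q2"))
```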


Re:reply:[Discuss][DSIP] Create Resource Queue in Doris to Support Asynchronous Job Submission

2022-11-24 Thread Mingyu Chen
Done


https://cwiki.apache.org/confluence/display/DORIS/DSIP-025%3A+Support+Resource+Queue
https://cwiki.apache.org/confluence/display/DORIS/DSIP-026%3A+Support+Submit+SQL+Job







--

此致!Best Regards
陈明雨 Mingyu Chen

Email:
morning...@apache.org




At 2022-11-24 21:20:23, "高鑫" wrote:

Hi Mingyu Chen:


OK, I will describe these two features, "Resource Queue" and "Submit SQL 
Job", in detail in the DSIP.


My wiki account is helloxiyue.


Best Regards
Gaoxin






Re:[DISCUSS] Flink Doris Connector 1.3.0 Release

2022-11-24 Thread Mingyu Chen
Good to see that.
I have pinned the release notes of 1.1.1, 1.2.1 and 1.3.0 [1].


[1] https://github.com/apache/doris-flink-connector/issues




--

此致!Best Regards
陈明雨 Mingyu Chen

Email:
morning...@apache.org





At 2022-11-22 16:38:04, "wudi" <676366...@qq.com.INVALID> wrote:
>Dear all,
>
>We are ready to release a version of flink-doris-connector that supports Flink 
>1.16. The branch release-1.3.0 [1] has been created, and the release notes 
>are in [2].
>
>
>At the same time, the corresponding versions 1.1.1 [3] and 1.2.1 [4] for 
>Flink 1.14 and Flink 1.15 are also being released. Their release notes are 
>in [5][6].
>
>[1] https://github.com/apache/doris-flink-connector/tree/release-1.3.0
>[2] https://github.com/apache/doris-flink-connector/issues/85
>[3] https://github.com/apache/doris-flink-connector/tree/release-1.1.1
>[4] https://github.com/apache/doris-flink-connector/tree/release-1.2.1
>[5] https://github.com/apache/doris-flink-connector/issues/84
>[6] https://github.com/apache/doris-flink-connector/issues/83