[GitHub] [incubator-doris] yangzhg merged pull request #4234: Update support batch delete storage design document

2020-08-18 Thread GitBox


yangzhg merged pull request #4234:
URL: https://github.com/apache/incubator-doris/pull/4234


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org



[incubator-doris] branch master updated: Update support batch delete storage design document (#4234)

2020-08-18 Thread yangzhg
This is an automated email from the ASF dual-hosted git repository.

yangzhg pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 8a3eaee  Update support batch delete storage design document  (#4234)
8a3eaee is described below

commit 8a3eaeecf12628d72342ed9d6e62da90092de7e6
Author: ZhangYu0123 <67053339+zhangyu0...@users.noreply.github.com>
AuthorDate: Tue Aug 18 15:37:14 2020 +0800

Update support batch delete storage design document  (#4234)

* Update delete index design document
---
 docs/en/internal/doris_storage_optimization.md| 78 ---
 docs/zh-CN/internal/doris_storage_optimization.md | 38 +--
 2 files changed, 60 insertions(+), 56 deletions(-)

diff --git a/docs/en/internal/doris_storage_optimization.md b/docs/en/internal/doris_storage_optimization.md
index 529b85d..6ceccad 100644
--- a/docs/en/internal/doris_storage_optimization.md
+++ b/docs/en/internal/doris_storage_optimization.md
@@ -36,7 +36,7 @@ Documents include:
 - The file starts with an 8-byte magic code to identify the file format and version
 - Data Region: Used to store data information for each column, where the data is loaded on demand by pages.
 - Index Region: Doris stores the index data of each column in Index Region, where the data is loaded according to column granularity, so the data information of the following column is stored separately.
-- Footer information
+- Footer
- FileFooterPB: metadata describing the file
- 4-byte checksum of the FileFooterPB content
- 4-byte FileFooterPB message length, used for reading FileFooterPB
@@ -116,27 +116,29 @@ We generate a sparse index of short key every N rows (configurable) with the con
 The format design supports the subsequent expansion of other index information, such as bitmap index, spatial index, etc. It only needs to write the required data to the existing column data, and add the corresponding metadata fields to FileFooterPB.
 
 ### Metadata Definition ###
-FileFooterPB is defined as:
+SegmentFooterPB is defined as:
 
 ```
 message ColumnPB {
-optional uint32 column_id = 1; // column id is used here instead of column name because renaming columns is planned
-optional string type = 2; // column type
-optional string aggregation = 3; // whether aggregated
-optional uint32 length = 4; // length
-optional bool is_key = 5; // whether it is a primary key column
-optional string default_value = 6; // default value
-optional uint32 precision = 9 [default = 27]; // precision
-optional uint32 frac = 10 [default = 9];
-optional bool is_nullable = 11 [default=false]; // whether it is nullable
-optional bool is_bf_column = 15 [default=false]; // whether it has a bf dictionary
-   optional bool is_bitmap_column = 16 [default=false]; // whether it has a bitmap index
+required int32 unique_id = 1;   // The column id is used here, and the column name is not used
+optional string name = 2;   // Column name; when name equals __DORIS_DELETE_SIGN__, this column is a hidden delete column
+required string type = 3;   // Column type
+optional bool is_key = 4;   // Whether column is a primary key column
+optional string aggregation = 5;// Aggregate type
+optional bool is_nullable = 6;  // Whether column is allowed to be null
+optional bytes default_value = 7;   // Default value
+optional int32 precision = 8;   // Precision of column
+optional int32 frac = 9;
+optional int32 length = 10; // Length of column
+optional int32 index_length = 11;   // Length of column index
+optional bool is_bf_column = 12;// Whether column has bloom filter index
+optional bool has_bitmap_index = 15 [default=false];  // Whether column has bitmap index
 }
 
-// page offset
+// page offset
 message PagePointerPB {
-   required uint64 offset; // offset of the page in the file
-   required uint32 length; // size of the page
+   required uint64 offset; // offset of page in segment file
+   required uint32 length; // length of page
 }
 
 message MetadataPairPB {
@@ -145,36 +147,36 @@ message MetadataPairPB {
 }
 
 message ColumnMetaPB {
-   optional ColumnMessage encoding; // encoding method
+   optional ColumnMessage encoding; // Encoding of column
 
-   optional PagePointerPB dict_page // dictionary page
-   repeated PagePointerPB bloom_filter_pages; // bloom filter dictionary information
-   optional PagePointerPB ordinal_index_page; // ordinal (row number) index data
-   optional PagePointerPB page_zone_map_page; // page-level statistics index data
+   optional PagePointerPB dict_page // Dictionary page
+   repeated PagePointerPB bloom_filter_pages; // Bloom filter pages
+   optional PagePointerPB ordinal_index_page; // Ordinal index page
+   optional PagePointerPB page_zone_map_page; // Page-level statistics index data
 
-   optional PagePointerPB bitmap_index_page; // bitmap index data
+   optional PagePointerPB bitmap_index_page; // Bitmap index page
 
-   optional uint64 data_footprint; // size of the index in the column
-   opt
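The footer description in this diff lists three trailing pieces: the FileFooterPB message, a 4-byte checksum of it, and a 4-byte message length used to find it. A minimal C++ sketch of locating the footer from the end of a segment file, under stated assumptions: those three fields are the last bytes of the file in that order (the real format may place more bytes after them), integers are little-endian, and the checksum is only returned, not verified, because its algorithm is not given here. `read_footer_trailer` and `FooterSlice` are hypothetical names, not Doris APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Decode a 4-byte unsigned integer stored little-endian (assumed byte order).
static uint32_t decode_fixed32_le(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical result type: the raw footer bytes plus the checksum stored after them.
struct FooterSlice {
    std::string footer_pb;     // serialized footer message, not parsed here
    uint32_t stored_checksum;  // returned as-is; not verified in this sketch
};

// Assumed trailer layout at the very end of the file:
//   [footer message][4-byte checksum][4-byte footer length]
std::optional<FooterSlice> read_footer_trailer(const std::vector<uint8_t>& file) {
    constexpr size_t kTrailerSize = 8;  // checksum + length
    if (file.size() < kTrailerSize) {
        return std::nullopt;  // too short to contain the trailer
    }
    const uint8_t* end = file.data() + file.size();
    const uint32_t footer_len = decode_fixed32_le(end - 4);
    const uint32_t checksum = decode_fixed32_le(end - 8);
    if (footer_len > file.size() - kTrailerSize) {
        return std::nullopt;  // length field points before the start of the file
    }
    const uint8_t* footer_begin = end - kTrailerSize - footer_len;
    return FooterSlice{std::string(reinterpret_cast<const char*>(footer_begin), footer_len),
                       checksum};
}
```

Reading from the end is what makes the length field useful: a reader can fetch a fixed-size tail first, then exactly the footer bytes, without scanning the data or index regions.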

[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization

2020-08-18 Thread GitBox


yangzhg commented on a change in pull request #4212:
URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r471988812



##
File path: be/src/common/config.h
##
@@ -265,6 +265,27 @@ namespace config {
 CONF_mInt64(base_compaction_interval_seconds_since_last_operation, "86400");
 CONF_mInt32(base_compaction_write_mbytes_per_sec, "5");
 
+// config the cumulative compaction policy
+// Valid configs: num_based, size_based
+// num_based policy, the original version of cumulative compaction, cumulative version compaction once.
+// size_based policy, an optimized version of cumulative compaction, targeting the use cases requiring
+// lower write amplification, trading off read amplification and space amplification.
+CONF_String(cumulative_compaction_policy, "num_based");
+
+// In size_based policy, if the total disk size of the output rowset of cumulative compaction exceeds this config size,
+// this rowset will be given to base compaction. The unit is MB.
+CONF_mInt64(cumulative_compaction_size_based_promotion_size_mbytes, "1024");
+// In size_based policy, if the total disk size of the output rowset of cumulative compaction exceeds this ratio of
+// the base rowset's total disk size, this rowset will be given to base compaction. The value must be between
+// 0 and 1.
+CONF_mDouble(cumulative_compaction_size_based_promotion_ratio, "0.05");
+// In size_based policy, the smallest size of rowset promotion. When the rowset is smaller than this config, this
+// rowset will not be given to base compaction. The unit is MB.
+CONF_mInt64(cumulative_compaction_size_based_promotion_min_size_mbytes, "64");
+// The lower bound size to do cumulative compaction. When the total disk size of candidate rowsets is less than
+// this size, size_based policy also does cumulative compaction. The unit is MB.
+CONF_mInt64(cumulative_compaction_size_based_compaction_lower_bound_size_mbytes, "64");

Review comment:
   These config names are too long; can we make them shorter?
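The three size_based promotion configs in this diff interact. One plausible reading of their comments is that the promotion threshold is the base rowset size times `promotion_ratio`, capped by `promotion_size_mbytes` and never below `promotion_min_size_mbytes`. The C++ sketch below encodes only that reading; `calc_promotion_size_mb` and `should_promote` are hypothetical names, and the exact rule in the PR may differ (for example, `>` versus `>=`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper encoding one reading of the size_based promotion configs.
// All sizes are in MB.
int64_t calc_promotion_size_mb(int64_t base_rowset_size_mb, double promotion_ratio,
                               int64_t promotion_size_mb, int64_t promotion_min_size_mb) {
    // Threshold proportional to the base rowset size.
    int64_t by_ratio = static_cast<int64_t>(base_rowset_size_mb * promotion_ratio);
    // Cap at promotion_size_mb, then floor at promotion_min_size_mb
    // (the floor wins if the two configs conflict).
    return std::max(std::min(by_ratio, promotion_size_mb), promotion_min_size_mb);
}

// An output rowset is handed to base compaction once it reaches the threshold.
bool should_promote(int64_t output_rowset_size_mb, int64_t promotion_threshold_mb) {
    return output_rowset_size_mb >= promotion_threshold_mb;
}
```

With the defaults (1024, 0.05, 64), a 10,000 MB base rowset gives a 500 MB threshold, a 100,000 MB base is capped at 1024 MB, and a 100 MB base is floored at 64 MB.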








[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization

2020-08-18 Thread GitBox


yangzhg commented on a change in pull request #4212:
URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r471990440



##
File path: be/src/olap/cumulative_compaction_policy.h
##
@@ -0,0 +1,263 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H
+#define DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H
+
+#include 
+
+#include "olap/utils.h"
+#include "olap/tablet.h"
+#include "olap/tablet_meta.h"
+#include "olap/rowset/rowset_meta.h"
+#include "olap/rowset/rowset.h"
+
+namespace doris {
+
+class Tablet;
+
+/// This CompactionPolicyType enum is used to represent the type of compaction policy.
+/// Now it has two values, CUMULATIVE_NUM_BASED_POLICY and CUMULATIVE_SIZE_BASED_POLICY.
+/// CUMULATIVE_NUM_BASED_POLICY means current compaction policy implemented by num based policy.
+/// CUMULATIVE_SIZE_BASED_POLICY means current compaction policy implemented by size_based policy.
+enum CompactionPolicyType {

Review comment:
   I think `Policy` and `Type` are redundant in this name.








[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization

2020-08-18 Thread GitBox


yangzhg commented on a change in pull request #4212:
URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r471991211



##
File path: be/src/olap/cumulative_compaction_policy.h
##
@@ -0,0 +1,263 @@
+
+#ifndef DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H
+#define DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H
+
+#include 
+
+#include "olap/utils.h"
+#include "olap/tablet.h"
+#include "olap/tablet_meta.h"
+#include "olap/rowset/rowset_meta.h"
+#include "olap/rowset/rowset.h"
+
+namespace doris {
+
+class Tablet;
+
+/// This CompactionPolicyType enum is used to represent the type of compaction policy.
+/// Now it has two values, CUMULATIVE_NUM_BASED_POLICY and CUMULATIVE_SIZE_BASED_POLICY.
+/// CUMULATIVE_NUM_BASED_POLICY means current compaction policy implemented by num based policy.
+/// CUMULATIVE_SIZE_BASED_POLICY means current compaction policy implemented by size_based policy.
+enum CompactionPolicyType {
+CUMULATIVE_NUM_BASED_POLICY = 0,

Review comment:
   You can use `NUM_BASED` directly: `CUMULATIVE` is already in the class name and `POLICY` is in the enum name.
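The review suggests dropping prefixes and suffixes that the surrounding names already carry. A C++ sketch of that naming, with hypothetical names (`CompactionPolicy`, `parse_policy`): a scoped `enum class` keeps enumerators such as `NUM_BASED` out of `namespace doris`, which is what makes the short names safe. The parser upper-cases the config value, as this PR does with `boost::to_upper`, but uses only the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical naming following the review: a scoped enum keeps its enumerators
// out of the surrounding namespace, so short names such as NUM_BASED are safe.
enum class CompactionPolicy { NUM_BASED = 0, SIZE_BASED = 1 };

// Map a config value such as "num_based" or "SIZE_BASED" to the enum,
// ignoring case; returns std::nullopt for unknown values.
std::optional<CompactionPolicy> parse_policy(std::string value) {
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    if (value == "NUM_BASED") return CompactionPolicy::NUM_BASED;
    if (value == "SIZE_BASED") return CompactionPolicy::SIZE_BASED;
    return std::nullopt;
}
```

Call sites then read as `CompactionPolicy::SIZE_BASED`, so the scope name supplies the context that the long prefixes used to carry.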








[GitHub] [incubator-doris] EmmyMiao87 edited a comment on issue #4370: Release Nodes 0.13.0

2020-08-18 Thread GitBox


EmmyMiao87 edited a comment on issue #4370:
URL: https://github.com/apache/incubator-doris/issues/4370#issuecomment-674835876


   # Credits
   
   @ZhangYu0123   
   @wfjcmcb   
   @Fullstop000   
   @sduzh 
   @Stalary   
   @worker24h 
   @chaoyli   
   @vagetablechicken  
   @jmk1011   
   @funyeah   
   @wutiangan 
   @gengjun-git   
   @xinghuayu007  
   @EmmyMiao87
   @songenjie 
   @acelyc111 
   @yangzhg   
   @Seaven
   @hexian55  
   @ChenXiaofei   
   @WingsGo   
   @kangpinghuang 
   @wangbo
   @weizuo93  
   @sdgshawn  
   @skyduy
   @wyb   
   @gaodayue  
   @HappenLee 
   @kangkaisen
   @wuyunfeng 
   @HangyuanLiu   
   @xy720 
   @liutang123
   @caiconghui
   @liyuance  
   @spaces-X  
   @hffariel  
   @decster   
   @blackfox1983  
   @Astralidea
   @morningman
   @hf200012  
   @xbyang18  
   @Youngwb   
   @imay  
   @marising
   @caoyang10






[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization

2020-08-18 Thread GitBox


yangzhg commented on a change in pull request #4212:
URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r471992521



##
File path: be/src/olap/cumulative_compaction_policy.h
##
@@ -0,0 +1,263 @@
+
+#ifndef DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H
+#define DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H
+
+#include 
+
+#include "olap/utils.h"
+#include "olap/tablet.h"
+#include "olap/tablet_meta.h"
+#include "olap/rowset/rowset_meta.h"
+#include "olap/rowset/rowset.h"
+
+namespace doris {
+
+class Tablet;
+
+/// This CompactionPolicyType enum is used to represent the type of compaction policy.
+/// Now it has two values, CUMULATIVE_NUM_BASED_POLICY and CUMULATIVE_SIZE_BASED_POLICY.
+/// CUMULATIVE_NUM_BASED_POLICY means current compaction policy implemented by num based policy.
+/// CUMULATIVE_SIZE_BASED_POLICY means current compaction policy implemented by size_based policy.
+enum CompactionPolicyType {
+CUMULATIVE_NUM_BASED_POLICY = 0,
+CUMULATIVE_SIZE_BASED_POLICY = 1,
+};
+
+const static std::string CUMULATIVE_NUM_BASED_POLICY_TYPE = "NUM_BASED";
+const static std::string CUMULATIVE_SIZE_BASED_POLICY_TYPE = "SIZE_BASED";

Review comment:
   same problem as above








[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization

2020-08-18 Thread GitBox


yangzhg commented on a change in pull request #4212:
URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r471993152



##
File path: be/src/olap/cumulative_compaction_policy.h
##
@@ -0,0 +1,263 @@
+
+#ifndef DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H
+#define DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H
+
+#include 
+
+#include "olap/utils.h"
+#include "olap/tablet.h"
+#include "olap/tablet_meta.h"
+#include "olap/rowset/rowset_meta.h"
+#include "olap/rowset/rowset.h"
+
+namespace doris {
+
+class Tablet;
+
+/// This CompactionPolicyType enum is used to represent the type of compaction policy.
+/// Now it has two values, CUMULATIVE_NUM_BASED_POLICY and CUMULATIVE_SIZE_BASED_POLICY.
+/// CUMULATIVE_NUM_BASED_POLICY means current compaction policy implemented by num based policy.
+/// CUMULATIVE_SIZE_BASED_POLICY means current compaction policy implemented by size_based policy.
+enum CompactionPolicyType {
+CUMULATIVE_NUM_BASED_POLICY = 0,
+CUMULATIVE_SIZE_BASED_POLICY = 1,
+};
+
+const static std::string CUMULATIVE_NUM_BASED_POLICY_TYPE = "NUM_BASED";
+const static std::string CUMULATIVE_SIZE_BASED_POLICY_TYPE = "SIZE_BASED";
+/// This class CumulativeCompactionPolicy is the base class of cumulative compaction policy.

Review comment:
   Why use `///`?








[GitHub] [incubator-doris] xy720 opened a new pull request #4383: [SparkLoad]Use the yarn command to get status and kill the application

2020-08-18 Thread GitBox


xy720 opened a new pull request #4383:
URL: https://github.com/apache/incubator-doris/pull/4383


   ## Proposed changes
   
   #4346 #4203 
   This CL uses the following yarn command to kill an application running on YARN or to get its status.
   
   ```
   yarn --config confdir application <-kill | -status> 
   ```
   
   To do:
   1. Make the yarn command executable in Spark Load.
   2. Write the Spark resource into config files and update them before running the command.
   3. Parse the output of the command.
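Step 3 of the to-do list is parsing the command's output. A minimal C++ sketch, assuming the status report is printed as `Key : Value` lines (for example a `State` field); the exact output format of `yarn application -status` depends on the Hadoop version and should be checked before relying on it. `parse_status_report` is a hypothetical helper, written in C++ only for consistency with the other sketches in this archive.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Trim leading and trailing spaces, tabs, and carriage returns.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical parser: assumes each report line looks like "Key : Value".
// Lines without the " : " separator are skipped.
std::map<std::string, std::string> parse_status_report(const std::string& output) {
    std::map<std::string, std::string> fields;
    std::istringstream in(output);
    std::string line;
    while (std::getline(in, line)) {
        size_t sep = line.find(" : ");
        if (sep == std::string::npos) continue;
        fields[trim(line.substr(0, sep))] = trim(line.substr(sep + 3));
    }
    return fields;
}
```

Splitting on `" : "` rather than a bare `:` keeps values such as `host:port` URLs intact.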
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [x] Bugfix (non-breaking change which fixes an issue)
   - [x] New feature (non-breaking change which adds functionality)
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code._
   
   - [x] I have created an issue on (Fix #ISSUE), and have described the bug/feature there in detail
   - [ ] Compiling and unit tests pass locally with my changes
   - [ ] I have added tests that prove my fix is effective or that my feature works
   - [ ] If this change needs a document change, I have updated the document
   - [ ] Any dependent changes have been merged
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at d...@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...
   






[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization

2020-08-18 Thread GitBox


yangzhg commented on a change in pull request #4212:
URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r471994484



##
File path: be/src/olap/cumulative_compaction_policy.h
##
@@ -0,0 +1,263 @@
+
+#ifndef DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H
+#define DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H
+
+#include 
+
+#include "olap/utils.h"
+#include "olap/tablet.h"
+#include "olap/tablet_meta.h"
+#include "olap/rowset/rowset_meta.h"
+#include "olap/rowset/rowset.h"
+
+namespace doris {
+
+class Tablet;
+
+/// This CompactionPolicyType enum is used to represent the type of compaction policy.
+/// Now it has two values, CUMULATIVE_NUM_BASED_POLICY and CUMULATIVE_SIZE_BASED_POLICY.
+/// CUMULATIVE_NUM_BASED_POLICY means current compaction policy implemented by num based policy.
+/// CUMULATIVE_SIZE_BASED_POLICY means current compaction policy implemented by size_based policy.
+enum CompactionPolicyType {
+CUMULATIVE_NUM_BASED_POLICY = 0,
+CUMULATIVE_SIZE_BASED_POLICY = 1,
+};
+
+const static std::string CUMULATIVE_NUM_BASED_POLICY_TYPE = "NUM_BASED";
+const static std::string CUMULATIVE_SIZE_BASED_POLICY_TYPE = "SIZE_BASED";
+/// This class CumulativeCompactionPolicy is the base class of cumulative compaction policy.
+/// It defines the policy to do cumulative compaction. It has different derived classes, which implement
+/// concrete cumulative compaction algorithms. The policy is configured by config::cumulative_compaction_policy.
+/// The policy functions are the main steps to do cumulative compaction. For example, how to pick candidate
+/// rowsets from tablet using current policy, how to calculate the cumulative point and how to calculate
+/// the tablet cumulative compaction score and so on.
+class CumulativeCompactionPolicy {
+
+public:
+/// Constructor function of CumulativeCompactionPolicy,
+/// it needs tablet pointer to access tablet method.
+/// param tablet, the shared pointer of tablet
+CumulativeCompactionPolicy(std::shared_ptr tablet) : _tablet(tablet){}
+
+/// Destructor function of CumulativeCompactionPolicy.
+virtual ~CumulativeCompactionPolicy() {}
+
+/// Calculate the cumulative compaction score of the tablet. This function uses rowsets meta and current
+/// cumulative point to calculate the score of tablet. The score depends on the concrete algorithm of policy.
+/// In general, the score represents the number of segments to be merged by cumulative compaction. The higher
+/// a tablet's score, the earlier it gets cumulative compaction.
+/// param all_rowsets, all rowsets in tablet.
+/// param current_cumulative_point, current cumulative point value.
+/// return score, the result score after calculation.
+virtual void calc_cumulative_compaction_score(
+const std::vector& all_rowsets, int64_t current_cumulative_point,
+uint32_t* score) = 0;
+
+/// This function implements the policy which represents how to pick the candidate rowsets for compaction.
+/// This base class gives a unified implementation. Its derived classes can also override this function.
+/// param skip_window_sec, skip the rowsets whose create time plus skip_window_sec is greater than now.
+/// param rs_version_map, mapping from version to rowset
+/// param cumulative_point, current cumulative point of tablet
+/// return candidate_rowsets, the container of candidate rowsets
+virtual void pick_candicate_rowsets(
+int64_t skip_window_sec,
+const std::unordered_map& rs_version_map,
+int64_t cumulative_point, std::vector* candidate_rowsets);
+
+/// Pick input rowsets from candidate rowsets for compaction. This function is a pure virtual function.
+/// Its implementation depends on the concrete compaction policy.
+/// param candidate_rowsets, the candidate_rowsets vector container to pick input rowsets
+/// return input_rowsets, the vector container as return
+/// return last_delete_version, if there is a delete rowset, record the delete version from input_rowsets
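The score comment in this header says that, in general, the score is the number of segments that cumulative compaction would merge. A minimal C++ sketch of that general idea only, not of either concrete policy: sum the segment counts of rowsets at or after the cumulative point. `RowsetMetaLite` is a hypothetical stand-in for Doris's rowset metadata, and the function returns the score instead of writing through a `uint32_t*` out-parameter as the declared interface does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for rowset metadata.
struct RowsetMetaLite {
    int64_t start_version;
    int64_t end_version;
    uint32_t num_segments;
};

// Sketch: score = number of segments in rowsets at or after the cumulative point,
// i.e. the segments a cumulative compaction would merge. Rowsets before the
// point belong to the base and do not contribute.
uint32_t calc_cumulative_compaction_score(const std::vector<RowsetMetaLite>& all_rowsets,
                                          int64_t current_cumulative_point) {
    uint32_t score = 0;
    for (const auto& rs : all_rowsets) {
        if (rs.start_version >= current_cumulative_point) {
            score += rs.num_segments;
        }
    }
    return score;
}
```

Counting segments rather than rowsets makes a single rowset with many segments rank as urgently as many small rowsets, which matches the comment's "the higher the score, the earlier it compacts" intent.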

[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization

2020-08-18 Thread GitBox


yangzhg commented on a change in pull request #4212:
URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r471996935



##
File path: be/src/olap/olap_server.cpp
##
@@ -277,6 +281,27 @@ void* StorageEngine::_disk_stat_monitor_thread_callback(void* arg) {
 return nullptr;
 }
 
+void StorageEngine::_check_cumulative_compaction_config() {
+
+std::string cumulative_compaction_type = config::cumulative_compaction_policy;
+boost::to_upper(cumulative_compaction_type);
+
+// if size_based policy is used, check size_based policy configs
+if (cumulative_compaction_type == CUMULATIVE_SIZE_BASED_POLICY_TYPE) {
+int64_t size_based_promotion_size =
+config::cumulative_compaction_size_based_promotion_size_mbytes;
+int64_t size_based_promotion_min_size =
+config::cumulative_compaction_size_based_promotion_min_size_mbytes;
+int64_t size_based_compaction_lower_bound_size =
+config::cumulative_compaction_size_based_compaction_lower_bound_size_mbytes;
+
+// check that size_based_promotion_size is greater than or equal to size_based_promotion_min_size
+CHECK(size_based_promotion_size >= size_based_promotion_min_size);

Review comment:
   It would be better to set the value to the min size instead of using CHECK here.
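The review suggests correcting the value instead of failing a `CHECK`, which (with glog) aborts the process. A minimal C++ sketch of that suggestion, with a hypothetical `SizeBasedConfig` struct standing in for the two `config::` globals: an inconsistent promotion size is raised to the min size and a warning is printed.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for the two size_based configs (values in MB).
struct SizeBasedConfig {
    int64_t promotion_size_mbytes;
    int64_t promotion_min_size_mbytes;
};

// Instead of CHECK-failing, raise an inconsistent promotion size to the min size
// and warn; the process keeps running with a usable configuration.
void sanitize_size_based_config(SizeBasedConfig* cfg) {
    if (cfg->promotion_size_mbytes < cfg->promotion_min_size_mbytes) {
        std::cerr << "WARNING: promotion_size_mbytes (" << cfg->promotion_size_mbytes
                  << ") is smaller than promotion_min_size_mbytes ("
                  << cfg->promotion_min_size_mbytes << "); using the min size instead\n";
        cfg->promotion_size_mbytes = cfg->promotion_min_size_mbytes;
    }
}
```

In BE code the warning would more naturally go through `LOG(WARNING)`; `std::cerr` keeps the sketch free of a logging dependency.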








[incubator-doris] branch master updated (8a3eaee -> 38a2a7a)

2020-08-18 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git.


from 8a3eaee  Update support batch delete storage design document  (#4234)
 add 38a2a7a  [Bug] Fix bug that modification of global variable can not be persisted. (#4324)

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/doris/analysis/SetVar.java |  23 ++-
 .../java/org/apache/doris/catalog/Catalog.java |   5 +
 .../org/apache/doris/common/FeMetaVersion.java |   4 +-
 .../org/apache/doris/common/util/TimeUtils.java|   6 +-
 .../org/apache/doris/journal/JournalEntity.java|   6 +
 .../java/org/apache/doris/persist/EditLog.java |   9 ++
 .../apache/doris/persist/GlobalVarPersistInfo.java | 141 +++
 .../org/apache/doris/persist/OperationType.java|   4 +-
 .../java/org/apache/doris/qe/GlobalVariable.java   |  17 +++
 .../main/java/org/apache/doris/qe/VariableMgr.java | 154 +
 ...InfoTest.java => GlobalVarPersistInfoTest.java} |  35 +++--
 .../java/org/apache/doris/qe/VariableMgrTest.java  |  66 +
 12 files changed, 363 insertions(+), 107 deletions(-)
 create mode 100644 fe/fe-core/src/main/java/org/apache/doris/persist/GlobalVarPersistInfo.java
 copy fe/fe-core/src/test/java/org/apache/doris/persist/{AlterViewInfoTest.java => GlobalVarPersistInfoTest.java} (62%)





[GitHub] [incubator-doris] morningman closed issue #4323: [Bug] Modification of global variables is not correctly persisted.

2020-08-18 Thread GitBox


morningman closed issue #4323:
URL: https://github.com/apache/incubator-doris/issues/4323


   






[GitHub] [incubator-doris] morningman merged pull request #4324: [Bug] Fix bug that modification of global variable can not be persisted.

2020-08-18 Thread GitBox


morningman merged pull request #4324:
URL: https://github.com/apache/incubator-doris/pull/4324


   






[GitHub] [incubator-doris] morningman closed issue #4344: [Bug]BE crash when doing LOADING phase of spark load

2020-08-18 Thread GitBox


morningman closed issue #4344:
URL: https://github.com/apache/incubator-doris/issues/4344


   






[GitHub] [incubator-doris] morningman merged pull request #4345: [Bug][MemTracker] Cleanup the mem tracker's constructor to avoid wrong usage

2020-08-18 Thread GitBox


morningman merged pull request #4345:
URL: https://github.com/apache/incubator-doris/pull/4345


   






[incubator-doris] branch master updated: [Bug][MemTracker] Cleanup the mem tracker's constructor to avoid wrong usage (#4345)

2020-08-18 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
 new e251080  [Bug][MemTracker] Cleanup the mem tracker's constructor to avoid wrong usage (#4345)
e251080 is described below

commit e25108097d349be877789ad82cf2568da37a9007
Author: Mingyu Chen 
AuthorDate: Tue Aug 18 16:54:55 2020 +0800

[Bug][MemTracker] Cleanup the mem tracker's constructor to avoid wrong usage (#4345)

After PR #4135, a mem tracker that has a parent should be created by 'CreateTracker',
so I removed the other, unused constructors.

This also fixes the bug described in #4344.
---
 be/src/exec/parquet_scanner.cpp |  1 -
 be/src/exec/parquet_scanner.h   |  1 -
 be/src/olap/push_handler.cpp|  2 +-
 be/src/olap/push_handler.h  |  2 +-
 be/src/runtime/mem_tracker.cpp  | 58 ++---
 be/src/runtime/mem_tracker.h| 42 ++---
 6 files changed, 40 insertions(+), 66 deletions(-)

diff --git a/be/src/exec/parquet_scanner.cpp b/be/src/exec/parquet_scanner.cpp
index d2e69e9..2db36f3 100644
--- a/be/src/exec/parquet_scanner.cpp
+++ b/be/src/exec/parquet_scanner.cpp
@@ -18,7 +18,6 @@
 #include "exec/parquet_scanner.h"
 #include "runtime/descriptors.h"
 #include "runtime/exec_env.h"
-#include "runtime/mem_tracker.h"
 #include "runtime/raw_value.h"
 #include "runtime/stream_load/load_stream_mgr.h"
 #include "runtime/stream_load/stream_load_pipe.h"
diff --git a/be/src/exec/parquet_scanner.h b/be/src/exec/parquet_scanner.h
index a052e65..09d92ff 100644
--- a/be/src/exec/parquet_scanner.h
+++ b/be/src/exec/parquet_scanner.h
@@ -42,7 +42,6 @@ class ExprContext;
 class TupleDescriptor;
 class TupleRow;
 class RowDescriptor;
-class MemTracker;
 class RuntimeProfile;
 class StreamLoadPipe;
 
diff --git a/be/src/olap/push_handler.cpp b/be/src/olap/push_handler.cpp
index fa5c6bd..a5e9b1c 100644
--- a/be/src/olap/push_handler.cpp
+++ b/be/src/olap/push_handler.cpp
@@ -946,7 +946,7 @@ OLAPStatus PushBrokerReader::init(const Schema* schema,
 }
 _runtime_profile = _runtime_state->runtime_profile();
 _runtime_profile->set_name("PushBrokerReader");
-_mem_tracker.reset(new MemTracker(_runtime_profile, -1, _runtime_profile->name(), _runtime_state->instance_mem_tracker()));
+_mem_tracker = MemTracker::CreateTracker(-1, "PushBrokerReader", _runtime_state->instance_mem_tracker());
 _mem_pool.reset(new MemPool(_mem_tracker.get()));
 _counter.reset(new ScannerCounter());
 
diff --git a/be/src/olap/push_handler.h b/be/src/olap/push_handler.h
index 181905d..3a3a319 100644
--- a/be/src/olap/push_handler.h
+++ b/be/src/olap/push_handler.h
@@ -248,7 +248,7 @@ private:
 const Schema* _schema;
 std::unique_ptr<RuntimeState> _runtime_state;
 RuntimeProfile* _runtime_profile;
-std::unique_ptr<MemTracker> _mem_tracker;
+std::shared_ptr<MemTracker> _mem_tracker;
 std::unique_ptr<MemPool> _mem_pool;
 std::unique_ptr<ScannerCounter> _counter;
 std::unique_ptr _scanner;
diff --git a/be/src/runtime/mem_tracker.cpp b/be/src/runtime/mem_tracker.cpp
index 5e3c90b..f52befd 100644
--- a/be/src/runtime/mem_tracker.cpp
+++ b/be/src/runtime/mem_tracker.cpp
@@ -70,7 +70,7 @@ static std::shared_ptr<MemTracker> root_tracker;
 static GoogleOnceType root_tracker_once = GOOGLE_ONCE_INIT;
 
 void MemTracker::CreateRootTracker() {
-  root_tracker.reset(new MemTracker(-1, "root", std::shared_ptr<MemTracker>()));
+  root_tracker.reset(new MemTracker(-1, "root"));
   root_tracker->Init();
 }
 
@@ -85,7 +85,7 @@ std::shared_ptr<MemTracker> MemTracker::CreateTracker(
   } else {
   real_parent = GetRootTracker();
   }
-  shared_ptr<MemTracker> tracker(new MemTracker(byte_limit, label, real_parent, log_usage_if_zero));
+  shared_ptr<MemTracker> tracker(new MemTracker(nullptr, byte_limit, label, real_parent, log_usage_if_zero));
   real_parent->AddChildTracker(tracker);
   tracker->Init();
 
@@ -102,56 +102,36 @@ std::shared_ptr<MemTracker> MemTracker::CreateTracker(
   } else {
   real_parent = GetRootTracker();
   }
-  shared_ptr<MemTracker> tracker(new MemTracker(profile, byte_limit, label, real_parent));
+  shared_ptr<MemTracker> tracker(new MemTracker(profile, byte_limit, label, real_parent, true));
   real_parent->AddChildTracker(tracker);
   tracker->Init();
 
   return tracker;
 }
 
+MemTracker::MemTracker(int64_t byte_limit, const std::string& label) :
+MemTracker(nullptr, byte_limit, label, std::shared_ptr<MemTracker>(), true) {
+}
+
 MemTracker::MemTracker(
+RuntimeProfile* profile,
 int64_t byte_limit, const string& label, const std::shared_ptr<MemTracker>& parent, bool log_usage_if_zero)
   : limit_(byte_limit),
 soft_limit_(CalcSoftLimit(byte_limit)),
 label_(label),
 parent_(parent),
-consumption_(std::make_shared(TUnit::BYTES)),
 consumption_metric_(nullptr),
 log_usage_if_zero_(log_usage_if_zero),
 num_gcs_metric_(nullptr),
 bytes_freed_by_last_gc_metric_(nullptr),
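The commit message says that a tracker with a parent must now be created through `CreateTracker`, which falls back to the root tracker when no parent is given. The following is a minimal C++ sketch of that factory pattern, not Doris's `MemTracker`: the class `SimpleMemTracker` and all of its members are hypothetical, the constructor is private so the factory is the only entry point, children are held as `weak_ptr` (an assumption), and consumption is charged up the parent chain. Limit enforcement and thread safety are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified tracker used only to illustrate the factory pattern.
class SimpleMemTracker {
public:
    // The only way to create a tracker that has a parent.
    // A null parent means "attach to the root tracker".
    static std::shared_ptr<SimpleMemTracker> CreateTracker(
            int64_t byte_limit, const std::string& label,
            std::shared_ptr<SimpleMemTracker> parent = nullptr) {
        std::shared_ptr<SimpleMemTracker> real_parent =
                parent ? std::move(parent) : GetRootTracker();
        std::shared_ptr<SimpleMemTracker> tracker(
                new SimpleMemTracker(byte_limit, label, real_parent));
        real_parent->children_.push_back(tracker);  // stored as weak_ptr
        return tracker;
    }

    // Process-wide root; a function-local static is initialized exactly once.
    static std::shared_ptr<SimpleMemTracker> GetRootTracker() {
        static std::shared_ptr<SimpleMemTracker> root(
                new SimpleMemTracker(-1, "root", nullptr));
        return root;
    }

    // Charge bytes to this tracker and every ancestor (not thread-safe).
    void Consume(int64_t bytes) {
        for (SimpleMemTracker* t = this; t != nullptr; t = t->parent_.get()) {
            t->consumption_ += bytes;
        }
    }

    int64_t consumption() const { return consumption_; }
    int64_t limit() const { return limit_; }
    const std::string& label() const { return label_; }
    std::size_t num_children() const { return children_.size(); }

private:
    // Private: callers cannot bypass CreateTracker() to attach a parent.
    SimpleMemTracker(int64_t byte_limit, std::string label,
                     std::shared_ptr<SimpleMemTracker> parent)
            : limit_(byte_limit), label_(std::move(label)), parent_(std::move(parent)) {}

    int64_t limit_;
    std::string label_;
    std::shared_ptr<SimpleMemTracker> parent_;               // child keeps parent alive
    std::vector<std::weak_ptr<SimpleMemTracker>> children_;  // parent does not keep children alive
    int64_t consumption_ = 0;
};
```

Making the constructor private turns "trackers with a parent go through `CreateTracker`" into a rule the compiler enforces, which is the kind of wrong usage the commit title refers to.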

[GitHub] [incubator-doris] morningman merged pull request #4327: [Metrics] Support tablet level metrics

2020-08-18 Thread GitBox


morningman merged pull request #4327:
URL: https://github.com/apache/incubator-doris/pull/4327


   






[incubator-doris] branch master updated (e251080 -> 56260a6)

2020-08-18 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git.


from e251080  [Bug][MemTracker] Cleanup the mem tracker's constructor to avoid wrong usage (#4345)
 add 56260a6  [Metrics] Support tablet level metrics (#4327)

No new revisions were added by this update.

Summary of changes:
 be/src/exec/olap_scanner.cpp   |  3 +++
 be/src/http/action/metrics_action.cpp  |  5 +++--
 be/src/http/action/stream_load.cpp |  2 +-
 be/src/olap/base_tablet.cpp| 16 +++-
 be/src/olap/base_tablet.h  |  7 +++
 be/src/olap/data_dir.cpp   |  2 +-
 be/src/olap/delta_writer.cpp   |  6 +-
 be/src/olap/memtable_flush_executor.h  |  2 +-
 be/src/olap/tablet.cpp |  6 ++
 be/src/olap/tablet.h   |  4 
 be/src/util/metrics.cpp| 14 ++
 be/src/util/metrics.h  | 18 +-
 be/test/util/new_metrics_test.cpp  |  4 ++--
 .../operation/monitor-metrics/be-metrics.md|  9 +++--
 .../operation/monitor-metrics/be-metrics.md|  9 +++--
 15 files changed, 85 insertions(+), 22 deletions(-)





[GitHub] [incubator-doris] morningman opened a new issue #4384: [Bug][SparkLoad] Spark load will create rowset with incorrect rowset type

2020-08-18 Thread GitBox


morningman opened a new issue #4384:
URL: https://github.com/apache/incubator-doris/issues/4384


   **Describe the bug**
   1. Create a table with the SegmentV2 storage format.
   2. Load data with spark load.
   3. The rowset with version [2-2] has storage format SegmentV1, but it is expected to be SegmentV2.
   
   **Why**
   
   The push handler sets the rowset writer's rowset type only from the BE config `default_rowset_type`,
   without checking the tablet's `prefer_rowset_type`.
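The **Why** section implies the selection rule the push handler should follow. A minimal C++ sketch, assuming that a tablet's preference for SegmentV2 takes priority and that the BE-wide `default_rowset_type` is only a fallback; the enum and function names are hypothetical stand-ins (Doris refers to the SegmentV1 and SegmentV2 rowset types as alpha and beta rowsets):

```cpp
#include <cassert>

// Hypothetical stand-ins: ALPHA = SegmentV1 rowset, BETA = SegmentV2 rowset.
enum class RowsetType { ALPHA_ROWSET, BETA_ROWSET };

// Choose the rowset type for data written by the push handler (e.g. spark load).
// Assumption: the tablet's preference for BETA wins; otherwise use the BE-wide default.
RowsetType choose_push_rowset_type(RowsetType tablet_preferred, RowsetType be_default) {
    if (tablet_preferred == RowsetType::BETA_ROWSET) {
        return RowsetType::BETA_ROWSET;
    }
    return be_default;
}
```

Whether the actual fix falls back to the BE default in exactly this way is an assumption; the issue only states that the tablet's preference must be consulted.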






[GitHub] [incubator-doris] morningman opened a new issue #4385: [Bug] tablet type in tablet meta is wrong

2020-08-18 Thread GitBox


morningman opened a new issue #4385:
URL: https://github.com/apache/incubator-doris/issues/4385


   **Describe the bug**
   The `tablet_type` in the tablet meta should be `TABLET_TYPE_DISK`,
   but it is set to `TABLET_TYPE_MEMORY`.






[GitHub] [incubator-doris] morningman opened a new issue #4386: [SegmentV2] Change the default storage format to SegmentV2

2020-08-18 Thread GitBox


morningman opened a new issue #4386:
URL: https://github.com/apache/incubator-doris/issues/4386


   **Is your feature request related to a problem? Please describe.**
   
   Since Segment V2 has been released for a long time, we should make it the default storage format for newly created tables.






[GitHub] [incubator-doris] morningman opened a new pull request #4387: [SegmentV2] Change the default storage format to SegmentV2

2020-08-18 Thread GitBox


morningman opened a new pull request #4387:
URL: https://github.com/apache/incubator-doris/pull/4387


   ## Proposed changes
   
   Since Segment V2 has been released for a long time, we should make it the default storage format for newly created tables.
   
   This CL mainly changes:
   1. For all newly created tables, the default storage format is Segment V2.
   2. For all existing tablets, the storage format remains unchanged.
   3. Fix the bugs described in #4384 and #4385.
   
   ## Types of changes
   
   
   - [x] Bugfix (non-breaking change which fixes an issue)
   - [x] Breaking change (fix or feature that would cause existing functionality to not work as expected)
   - [x] Documentation Update (if none of the other choices apply)
   
   ## Checklist
   
   - [x] I have created an issue (Fix #4386) and described the bug/feature there in detail
   - [x] Compiling and unit tests pass locally with my changes
   - [x] I have added tests that prove my fix is effective or that my feature works
   - [x] If this change needs a document change, I have updated the document
   - [x] Any dependent changes have been merged
   
   ## Further comments
   
   We should provide a friendlier way to check the progress of the conversion to Segment V2.
   






[GitHub] [incubator-doris] HangyuanLiu opened a new pull request #4388: Add OLAP_ERR_DATE_QUALITY_ERR error status to display schema change failure

2020-08-18 Thread GitBox


HangyuanLiu opened a new pull request #4388:
URL: https://github.com/apache/incubator-doris/pull/4388


   ## Proposed changes
   
   When the historical data of a materialized view is converted, the conversion may fail because of data-quality problems.
   This PR adds an error status code, `OLAP_ERR_DATE_QUALITY_ERR`, to indicate whether a data problem caused the failure.
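To illustrate why a dedicated status code helps, here is a minimal C++ sketch in which a conversion step returns a distinct code when the input data itself is bad, so the caller can tell a data problem apart from an internal failure. The enum, the function, and the use of a NULL in a NOT NULL column as the example data problem are all hypothetical; only the idea of a separate data-quality code comes from this PR.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical status codes; only the idea of a separate data-quality code comes from the PR.
enum class ConvertStatus { OK, DATA_QUALITY_ERR, INTERNAL_ERR };

// Copy a nullable source column into a NOT NULL target column.
// A NULL value is a problem with the data, so it gets its own code
// instead of a generic failure.
ConvertStatus convert_not_null(const std::vector<std::optional<int>>& src,
                               std::vector<int>* dst) {
    if (dst == nullptr) {
        return ConvertStatus::INTERNAL_ERR;  // caller bug, not a data problem
    }
    dst->clear();
    for (const std::optional<int>& value : src) {
        if (!value.has_value()) {
            return ConvertStatus::DATA_QUALITY_ERR;
        }
        dst->push_back(*value);
    }
    return ConvertStatus::OK;
}
```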
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [ ] Bugfix (non-breaking change which fixes an issue)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
   - [ ] Documentation Update (if none of the other choices apply)
   - [ ] Code refactor (Modify the code structure, format the code, etc...)
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code._
   
   - [ ] I have created an issue (Fix #ISSUE) and described the bug/feature there in detail
   - [ ] Compiling and unit tests pass locally with my changes
   - [ ] I have added tests that prove my fix is effective or that my feature works
   - [ ] If this change needs a document change, I have updated the document
   - [ ] Any dependent changes have been merged
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at d...@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...
   






[GitHub] [incubator-doris] imay commented on pull request #4366: Optimise coding bit operation in BE

2020-08-18 Thread GitBox


imay commented on pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#issuecomment-675370532


   > > Not intending to interfere; I'm just curious how much improvement this PR achieves. Are there any benchmarks?
   > 
   > On my development machine, I executed encode_varint64 1 billion times and averaged over 5 runs: the `v | B` version takes 95095 ms on average,
   > and the `(v & (B - 1))` version takes 96103 ms. That is an improvement of about 0.5% to 1%. encode_varint64 is called frequently in many places, such as bitmap_value and page_pointer encoding.
   
   It seems too slow to execute 5 billion encode operations in about 95 s.
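The two benchmarked variants differ only in how each output byte is formed. A minimal C++ sketch, assuming a LevelDB-style `encode_varint64` (7 payload bits per byte, with `B = 128` as the continuation bit) and assuming the second variant is `(v & (B - 1)) | B`; the function names are hypothetical, not Doris's API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Continuation bit: set on every byte except the last one.
static constexpr std::uint64_t B = 128;

// Variant 1: v | B, relying on the uint8_t cast to drop bits above bit 7.
std::vector<std::uint8_t> encode_varint64_or(std::uint64_t v) {
    std::vector<std::uint8_t> out;
    while (v >= B) {
        out.push_back(static_cast<std::uint8_t>(v | B));
        v >>= 7;  // move on to the next 7 payload bits
    }
    out.push_back(static_cast<std::uint8_t>(v));  // last byte, continuation bit clear
    return out;
}

// Variant 2: mask the low 7 bits explicitly, then set the continuation bit.
std::vector<std::uint8_t> encode_varint64_mask(std::uint64_t v) {
    std::vector<std::uint8_t> out;
    while (v >= B) {
        out.push_back(static_cast<std::uint8_t>((v & (B - 1)) | B));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
    return out;
}
```

Because the cast to `uint8_t` discards everything above bit 7, both variants emit identical bytes; any speed difference can only come from the instructions the compiler generates, which fits the small gap measured in the thread.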






[GitHub] [incubator-doris] HappenLee opened a new pull request #4389: [ODBC SCAN NODE] 2/4 Add Thrift Interface and Meta of ODBC_Scan_Node

2020-08-18 Thread GitBox


HappenLee opened a new pull request #4389:
URL: https://github.com/apache/incubator-doris/pull/4389


   Issue: #4376
   
   ## Proposed changes
   
   Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request. If it fixes a bug or resolves a feature request, be sure to link to that issue.
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [ ] Bugfix (non-breaking change which fixes an issue)
   - [x] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
   - [ ] Documentation Update (if none of the other choices apply)
   - [ ] Code refactor (Modify the code structure, format the code, etc...)
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code._
   
   - [x] I have created an issue (Fix #ISSUE) and described the bug/feature there in detail
   - [x] Compiling and unit tests pass locally with my changes
   - [x] I have added tests that prove my fix is effective or that my feature works
   - [x] If this change needs a document change, I have updated the document
   - [x] Any dependent changes have been merged
   
   






[GitHub] [incubator-doris] EmmyMiao87 commented on pull request #4253: Support more materialized view syntax

2020-08-18 Thread GitBox


EmmyMiao87 commented on pull request #4253:
URL: https://github.com/apache/incubator-doris/pull/4253#issuecomment-675379372


   `For compatibility with the old rollup logic, the syntax "DROP MATERIALIZED VIEW [ IF EXISTS ] [db_name].< mv_name > FROM [db].[table]" has been added`
   Both FROM and ON should be supported.






[GitHub] [incubator-doris] EmmyMiao87 commented on pull request #4253: Support more materialized view syntax

2020-08-18 Thread GitBox


EmmyMiao87 commented on pull request #4253:
URL: https://github.com/apache/incubator-doris/pull/4253#issuecomment-675380536


   The syntax of `SHOW`, `ALTER`, `DROP` should be consistent






[GitHub] [incubator-doris] shuaijinchao opened a new issue #4390: [Document] increase mailing list subscription method.

2020-08-18 Thread GitBox


shuaijinchao opened a new issue #4390:
URL: https://github.com/apache/incubator-doris/issues/4390


   `Email` is an important communication channel in `Apache` projects. I think instructions for subscribing to the developer mailing list should be added to the README so that more people can see them and take part in project discussions.






[GitHub] [incubator-doris] shuaijinchao opened a new pull request #4391: doc: increase mailing list subscription method.

2020-08-18 Thread GitBox


shuaijinchao opened a new pull request #4391:
URL: https://github.com/apache/incubator-doris/pull/4391


   ## Proposed changes
   
   FIX #4390
   
   ## Types of changes
   
   - [x] Documentation Update (add the mailing list subscription method)
   
   ## Checklist
   
   - [x] I have created an issue (#4390) and described the bug/feature there in detail
   - [x] If this change needs a document change, I have updated the document.
   






[GitHub] [incubator-doris] morningman commented on a change in pull request #4378: FIX: fix dynamic partition replicationNum error

2020-08-18 Thread GitBox


morningman commented on a change in pull request #4378:
URL: https://github.com/apache/incubator-doris/pull/4378#discussion_r472084694



##
File path: fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java
##
@@ -3973,43 +3974,55 @@ public static void getDdlStmt(Table table, List 
createTableStmt, List bfColumnNames = olapTable.getCopiedBfColumns();
 if (bfColumnNames != null) {
-sb.append(",\n\"").append(PropertyAnalyzer.PROPERTIES_BF_COLUMNS).append("\" = \"");
-sb.append(Joiner.on(", ").join(olapTable.getCopiedBfColumns())).append("\"");
+appendProperties(sb, PropertyAnalyzer.PROPERTIES_BF_COLUMNS, Joiner.on(", ").join(olapTable.getCopiedBfColumns()));
 }
 
 if (separatePartition) {
 // version info
-sb.append(",\n\"").append(PropertyAnalyzer.PROPERTIES_VERSION_INFO).append("\" = \"");
-Partition partition = null;
+Partition partition;
 if (olapTable.getPartitionInfo().getType() == PartitionType.UNPARTITIONED) {
 partition = olapTable.getPartition(olapTable.getName());
 } else {
 Preconditions.checkState(partitionId.size() == 1);
 partition = olapTable.getPartition(partitionId.get(0));
 }
-sb.append(Joiner.on(",").join(partition.getVisibleVersion(), partition.getVisibleVersionHash()))
-.append("\"");
+appendProperties(sb, PropertyAnalyzer.PROPERTIES_VERSION_INFO, Joiner.on(",").join(partition.getVisibleVersion(), partition.getVisibleVersionHash()));
 }
 
 // colocateTable
 String colocateTable = olapTable.getColocateGroup();
 if (colocateTable != null) {
-sb.append(",\n\"").append(PropertyAnalyzer.PROPERTIES_COLOCATE_WITH).append("\" = \"");
-sb.append(colocateTable).append("\"");
+appendProperties(sb, PropertyAnalyzer.PROPERTIES_COLOCATE_WITH, colocateTable);
 }
 
 // dynamic partition
 if (olapTable.dynamicPartitionExists()) {
-sb.append(olapTable.getTableProperty().getDynamicPartitionProperty().toString());
+DynamicPartitionProperty dynamicPartitionProperty = olapTable.getTableProperty().getDynamicPartitionProperty();
+appendProperties(sb, DynamicPartitionProperty.ENABLE, dynamicPartitionProperty.getEnable());

Review comment:
   This is not a good implementation. Could you move these `appendProperties` calls into a method of `DynamicPartitionProperty`? That way, if we add more properties in the future, we only need to modify one place.
   You can pass the table's default replication number to that method.
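The reviewer's suggestion can be sketched as follows. The real class is Java in the FE; this sketch uses C++ to match the other examples here, and the class layout, the `getProperties` method name, the `-1` meaning "not set" for the replication number, and the `dynamic_partition.*` key strings are assumptions to check against `DynamicPartitionProperty`. The point is that every property is rendered in one method, using the same `,\n"key" = "value"` layout as `appendProperties`, with the table's default replication number passed in.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical C++ stand-in for the FE class DynamicPartitionProperty.
class DynamicPartitionProperty {
public:
    static constexpr int kNotSet = -1;  // assumed marker for "no replication_num given"

    DynamicPartitionProperty(bool enable, std::string time_unit, int start, int end,
                             std::string prefix, int buckets, int replication_num = kNotSet)
        : enable_(enable), time_unit_(std::move(time_unit)), start_(start), end_(end),
          prefix_(std::move(prefix)), buckets_(buckets), replication_num_(replication_num) {}

    // Render all dynamic partition properties in one place, so adding a property
    // only touches this method. The table's default replication number is used
    // when the property itself does not set one.
    std::string getProperties(int default_replication_num) const {
        std::ostringstream sb;
        append(sb, "dynamic_partition.enable", enable_ ? "true" : "false");
        append(sb, "dynamic_partition.time_unit", time_unit_);
        append(sb, "dynamic_partition.start", std::to_string(start_));
        append(sb, "dynamic_partition.end", std::to_string(end_));
        append(sb, "dynamic_partition.prefix", prefix_);
        append(sb, "dynamic_partition.buckets", std::to_string(buckets_));
        int replication = replication_num_ == kNotSet ? default_replication_num : replication_num_;
        append(sb, "dynamic_partition.replication_num", std::to_string(replication));
        return sb.str();
    }

private:
    // Same layout as appendProperties in the diff: ,\n"key" = "value"
    static void append(std::ostringstream& sb, const std::string& key, const std::string& value) {
        sb << ",\n\"" << key << "\" = \"" << value << "\"";
    }

    bool enable_;
    std::string time_unit_;
    int start_;
    int end_;
    std::string prefix_;
    int buckets_;
    int replication_num_;
};
```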








[GitHub] [incubator-doris] ZhangYu0123 commented on pull request #4366: Optimise coding bit operation in BE

2020-08-18 Thread GitBox


ZhangYu0123 commented on pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#issuecomment-675405126


   > > > Not intending to interfere; I'm just curious how much improvement this PR achieves. Are there any benchmarks?
   > > 
   > > 
   > > On my development machine, I executed encode_varint64 1 billion times and averaged over 5 runs: the `v | B` version takes 95095 ms on average,
   > > and the `(v & (B - 1))` version takes 96103 ms. That is an improvement of about 0.5% to 1%. encode_varint64 is called frequently in many places, such as bitmap_value and page_pointer encoding.
   > 
   > It seems too slow to execute 5 billion encode operations in about 95 s.
   
   5 billion operations cost 5 × 95 s, because 95 s is the average for 1 billion calls. Compression is time-consuming: encode_varint64 is mainly used to compress small integers into a variable-length encoding instead of storing them as a fixed-size int64_t. It is a trade-off between time and space.
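The space side of that trade-off can be made concrete. A minimal C++ sketch (a hypothetical helper, assuming the same 7-bits-per-byte varint layout) that computes how many bytes a value needs: values below 128 take 1 byte and values below 16384 take 2, compared with 8 bytes for a fixed `int64_t`, while the largest 64-bit values take 10.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes a LevelDB-style varint needs for v: one byte per 7 payload bits.
std::size_t varint_length(std::uint64_t v) {
    std::size_t len = 1;
    while (v >= 128) {
        v >>= 7;
        ++len;
    }
    return len;
}
```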






[GitHub] [incubator-doris] stalary commented on a change in pull request #4378: FIX: fix dynamic partition replicationNum error

2020-08-18 Thread GitBox


stalary commented on a change in pull request #4378:
URL: https://github.com/apache/incubator-doris/pull/4378#discussion_r472088500



##
File path: fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java
##
@@ -3973,43 +3974,55 @@ public static void getDdlStmt(Table table, List 
createTableStmt, List bfColumnNames = olapTable.getCopiedBfColumns();
 if (bfColumnNames != null) {
-sb.append(",\n\"").append(PropertyAnalyzer.PROPERTIES_BF_COLUMNS).append("\" = \"");
-sb.append(Joiner.on(", ").join(olapTable.getCopiedBfColumns())).append("\"");
+appendProperties(sb, PropertyAnalyzer.PROPERTIES_BF_COLUMNS, Joiner.on(", ").join(olapTable.getCopiedBfColumns()));
 }
 
 if (separatePartition) {
 // version info
-sb.append(",\n\"").append(PropertyAnalyzer.PROPERTIES_VERSION_INFO).append("\" = \"");
-Partition partition = null;
+Partition partition;
 if (olapTable.getPartitionInfo().getType() == PartitionType.UNPARTITIONED) {
 partition = olapTable.getPartition(olapTable.getName());
 } else {
 Preconditions.checkState(partitionId.size() == 1);
 partition = olapTable.getPartition(partitionId.get(0));
 }
-sb.append(Joiner.on(",").join(partition.getVisibleVersion(), partition.getVisibleVersionHash()))
-.append("\"");
+appendProperties(sb, PropertyAnalyzer.PROPERTIES_VERSION_INFO, Joiner.on(",").join(partition.getVisibleVersion(), partition.getVisibleVersionHash()));
 }
 
 // colocateTable
 String colocateTable = olapTable.getColocateGroup();
 if (colocateTable != null) {
-sb.append(",\n\"").append(PropertyAnalyzer.PROPERTIES_COLOCATE_WITH).append("\" = \"");
-sb.append(colocateTable).append("\"");
+appendProperties(sb, PropertyAnalyzer.PROPERTIES_COLOCATE_WITH, colocateTable);
 }
 
 // dynamic partition
 if (olapTable.dynamicPartitionExists()) {
-sb.append(olapTable.getTableProperty().getDynamicPartitionProperty().toString());
+DynamicPartitionProperty dynamicPartitionProperty = olapTable.getTableProperty().getDynamicPartitionProperty();
+appendProperties(sb, DynamicPartitionProperty.ENABLE, dynamicPartitionProperty.getEnable());

Review comment:
   Okay, I will modify it later.








[GitHub] [incubator-doris] morningman commented on a change in pull request #4383: [SparkLoad]Use the yarn command to get status and kill the application

2020-08-18 Thread GitBox


morningman commented on a change in pull request #4383:
URL: https://github.com/apache/incubator-doris/pull/4383#discussion_r472090745



##
File path: fe/fe-core/src/main/java/org/apache/doris/load/loadv2/YarnApplicationReport.java
##
@@ -0,0 +1,121 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.load.loadv2;
+
+import org.apache.doris.common.LoadException;
+import com.google.common.base.Preconditions;
+import com.google.common.base.Splitter;
+import com.google.common.collect.Maps;
+
+import org.apache.hadoop.yarn.api.records.ApplicationReport;
+import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
+import org.apache.hadoop.yarn.api.records.YarnApplicationState;
+import org.apache.hadoop.yarn.api.records.impl.pb.ApplicationReportPBImpl;
+import org.apache.hadoop.yarn.util.ConverterUtils;
+
+import java.text.NumberFormat;
+import java.text.ParseException;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Covert output string of command `yarn application -status` to application report.
+ * Input sample:
+ * ---
+ * Application Report :
+ * Application-Id : application_1573630236805_6763648
+ * Application-Name : doris_label_test
+ * Application-Type : SPARK-2.4.1
+ * User : test
+ * Queue : test-queue
+ * Start-Time : 1597654469958
+ * Finish-Time : 1597654801939
+ * Progress : 100%
+ * State : FINISHED
+ * Final-State : SUCCEEDED
+ * Tracking-URL : 127.0.0.1:8004/history/application_1573630236805_6763648/1
+ * RPC Port : 40236
+ * AM Host : host-name
+ * --
+ *
+ * Output:
+ * ApplicationReport
+ */
+public class YarnApplicationReport {
+private static final String APPLICATION_ID = "Application-Id";
+private static final String APPLICATION_TYPE = "Application-Type";
+private static final String APPLICATION_NAME = "Application-Name";
+private static final String USER = "User";
+private static final String QUEUE = "Queue";
+private static final String START_TIME = "Start-Time";
+private static final String FINISH_TIME = "Finish-Time";
+private static final String PROGRESS = "Progress";
+private static final String STATE = "State";
+private static final String FINAL_STATE = "Final-State";
+private static final String TRACKING_URL = "Tracking-URL";
+private static final String RPC_PORT = "RPC Port";
+private static final String AM_HOST = "AM Host";
+private static final String DIAGNOSTICS = "Diagnostics";
+
+private ApplicationReport report;
+
+public YarnApplicationReport(String output) throws LoadException {
+this.report = new ApplicationReportPBImpl();
+parseFromOutput(output);
+}
+
+public ApplicationReport getReport() {
+return report;
+}
+
+private void parseFromOutput(String output) throws LoadException {
+Map<String, String> reportMap = Maps.newHashMap();
+List<String> lines = Splitter.onPattern("\n").trimResults().splitToList(output);
+// Application-Id : application_1573630236805_6763648 ==> (Application-Id, application_1573630236805_6763648)
+for (String line : lines) {
+List<String> entry = Splitter.onPattern(":").limit(2).trimResults().splitToList(line);
+Preconditions.checkState(entry.size() <= 2);

Review comment:
   Preconditions.checkState(entry.size() <= 2, line);
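A minimal C++ sketch of the parsing step under review, for illustration only (the real code is Java and uses Guava's `Splitter`). It splits each line on the first `:` only, like `limit(2)`, so values such as `Tracking-URL : 127.0.0.1:8004/...` keep their inner colons, and it includes the offending line in the error message, as the reviewer asks for `checkState`. Unlike the original, this sketch treats a non-empty line without any colon as malformed; that rule and the function names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Trim spaces, tabs and carriage returns from both ends.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Parse "Key : Value" lines. Only the first ':' separates key from value,
// so "Tracking-URL : 127.0.0.1:8004/..." keeps its port.
std::map<std::string, std::string> parse_yarn_report(const std::string& output) {
    std::map<std::string, std::string> report;
    std::istringstream in(output);
    std::string raw;
    while (std::getline(in, raw)) {
        std::string line = trim(raw);
        if (line.empty()) continue;
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) {
            // Put the offending line in the message, as suggested in the review.
            throw std::runtime_error("malformed report line: " + line);
        }
        report[trim(line.substr(0, colon))] = trim(line.substr(colon + 1));
    }
    return report;
}
```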








[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization

2020-08-18 Thread GitBox


yangzhg commented on a change in pull request #4212:
URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r472006650



##
File path: be/src/olap/cumulative_compaction_policy.h
##
@@ -0,0 +1,263 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H
+#define DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H
+
+#include 
+
+#include "olap/utils.h"
+#include "olap/tablet.h"
+#include "olap/tablet_meta.h"
+#include "olap/rowset/rowset_meta.h"
+#include "olap/rowset/rowset.h"
+
+namespace doris {
+
+class Tablet;
+
+/// This CompactionPolicyType enum is used to represent the type of compaction policy.
+/// Now it has two values, CUMULATIVE_NUM_BASED_POLICY and CUMULATIVE_SIZE_BASED_POLICY.
+/// CUMULATIVE_NUM_BASED_POLICY means the current compaction policy is implemented by the num_based policy.
+/// CUMULATIVE_SIZE_BASED_POLICY means the current compaction policy is implemented by the size_based policy.
+enum CompactionPolicyType {
+CUMULATIVE_NUM_BASED_POLICY = 0,
+CUMULATIVE_SIZE_BASED_POLICY = 1,
+};
+
+const static std::string CUMULATIVE_NUM_BASED_POLICY_TYPE = "NUM_BASED";
+const static std::string CUMULATIVE_SIZE_BASED_POLICY_TYPE = "SIZE_BASED";
+/// This class CumulativeCompactionPolicy is the base class of cumulative compaction policy.
+/// It defines the policy to do cumulative compaction. It has different derived classes, which implement
+/// concrete cumulative compaction algorithms. The policy is configured by config::cumulative_compaction_policy.
+/// The policy functions are the main steps to do cumulative compaction, for example, how to pick candidate
+/// rowsets from the tablet using the current policy, how to calculate the cumulative point, how to calculate
+/// the tablet cumulative compaction score, and so on.
+class CumulativeCompactionPolicy {
+
+public:
+/// Constructor function of CumulativeCompactionPolicy;
+/// it needs the tablet pointer to access tablet methods.
+/// param tablet, the shared pointer of tablet
+CumulativeCompactionPolicy(std::shared_ptr tablet) : _tablet(tablet){}
+
+/// Destructor function of CumulativeCompactionPolicy.
+virtual ~CumulativeCompactionPolicy() {}
+
+/// Calculate the cumulative compaction score of the tablet. This function uses the rowsets meta and the current
+/// cumulative point to calculate the score of the tablet. The score depends on the concrete algorithm of the policy.
+/// In general, the score represents the number of segments to do cumulative compaction in total rowsets. The higher
+/// the score a tablet gets, the earlier it can do cumulative compaction.
+/// param all_rowsets, all rowsets in tablet.
+/// param current_cumulative_point, current cumulative point value.
+/// return score, the calculated score.
+virtual void calc_cumulative_compaction_score(
+const std::vector& all_rowsets, int64_t current_cumulative_point,
+uint32_t* score) = 0;
+
+/// This function implements the policy which represents how to pick the candidate rowsets for compaction.
+/// This base class gives a unified implementation. Its derived classes can also override this function.
+/// param skip_window_sec, rowsets whose creation time plus skip_window_sec is greater than now are skipped.
+/// param rs_version_map, mapping from version to rowset
+/// param cumulative_point, current cumulative point of tablet
+/// return candidate_rowsets, the container of candidate rowsets
+virtual void pick_candicate_rowsets(
+int64_t skip_window_sec,
+const std::unordered_map& rs_version_map,
+int64_t cumulative_point, std::vector* candidate_rowsets);
+
+/// Pick input rowsets from candidate rowsets for compaction. This function is a pure virtual function.
+/// Its implementation depends on the concrete compaction policy.
+/// param candidate_rowsets, the candidate_rowsets vector container to pick input rowsets
+/// return input_rowsets, the vector container as return
+/// return last_delete_version, if there is a delete rowset, record the delete version from input_rowsets
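For readers of the header above, a rough Java sketch of the two hooks it documents. It uses a hypothetical, simplified Rowset type instead of Doris's RowsetMeta; it picks rowsets at or after the cumulative point whose creation time plus skip_window_sec is already in the past, ordered by start version, and scores a tablet by the number of segments at or after the cumulative point. The exact comparisons and score formula of the committed C++ code are assumptions here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CumulativePolicySketch {
    // Hypothetical, simplified rowset metadata (not Doris's RowsetMeta).
    static final class Rowset {
        final long startVersion;
        final long endVersion;
        final long creationTimeSec;
        final int numSegments;

        Rowset(long startVersion, long endVersion, long creationTimeSec, int numSegments) {
            this.startVersion = startVersion;
            this.endVersion = endVersion;
            this.creationTimeSec = creationTimeSec;
            this.numSegments = numSegments;
        }
    }

    // Candidates: rowsets at or after the cumulative point that are older than the
    // skip window, sorted by start version.
    static List<Rowset> pickCandidateRowsets(List<Rowset> all, long cumulativePoint,
                                             long skipWindowSec, long nowSec) {
        List<Rowset> candidates = new ArrayList<>();
        for (Rowset rs : all) {
            if (rs.startVersion >= cumulativePoint && rs.creationTimeSec + skipWindowSec < nowSec) {
                candidates.add(rs);
            }
        }
        candidates.sort(Comparator.comparingLong(rs -> rs.startVersion));
        return candidates;
    }

    // Illustrative score: total segment count of rowsets at or after the cumulative point.
    static int cumulativeCompactionScore(List<Rowset> all, long cumulativePoint) {
        int score = 0;
        for (Rowset rs : all) {
            if (rs.startVersion >= cumulativePoint) {
                score += rs.numSegments;
            }
        }
        return score;
    }
}
```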

[GitHub] [incubator-doris] EmmyMiao87 commented on a change in pull request #4253: Support more materialized view syntax

2020-08-18 Thread GitBox


EmmyMiao87 commented on a change in pull request #4253:
URL: https://github.com/apache/incubator-doris/pull/4253#discussion_r472123863



##
File path: fe/fe-core/src/main/java/org/apache/doris/analysis/DropMaterializedViewStmt.java
##
@@ -38,38 +43,91 @@
  */
 public class DropMaterializedViewStmt extends DdlStmt {
 
-private String mvName;
-private TableName tableName;
 private boolean ifExists;
+private final TableName dbMvName;
+private final TableName dbTblName;
 
-public DropMaterializedViewStmt(boolean ifExists, String mvName, TableName tableName) {
-this.mvName = mvName;
-this.tableName = tableName;
+public DropMaterializedViewStmt(boolean ifExists, TableName dbMvName, TableName dbTblName) {
 this.ifExists = ifExists;
+this.dbMvName = dbMvName;
+this.dbTblName = dbTblName;
+}
+
+public boolean isSetIfExists() {
+return ifExists;
 }
 
 public String getMvName() {
-return mvName;
+return dbMvName.getTbl();
 }
 
-public TableName getTableName() {
-return tableName;
+public String getTblName() {
+if (dbTblName != null) {
+return dbTblName.getTbl();
+} else {
+return null;
+}
 }
 
-public boolean isIfExists() {
-return ifExists;
+public String getDbName() {
+if (dbTblName != null) {
+return dbTblName.getDb();
+} else {
+return dbMvName.getDb();
+}
 }
 
 @Override
 public void analyze(Analyzer analyzer) throws UserException {
-if (Strings.isNullOrEmpty(mvName)) {
-throw new AnalysisException("The materialized name could not be empty or null.");
+if (dbTblName != null && !Strings.isNullOrEmpty(dbMvName.getDb())) {
+throw new AnalysisException("Syntax drop materialized view [mv-name] from db.name must specify database name explicitly in `from`");

Review comment:
   If the dbTableName is different from dbMvName.getDb(), Doris will throw an exception.

##
File path: fe/fe-core/src/main/java/org/apache/doris/analysis/DropMaterializedViewStmt.java
##
@@ -38,38 +43,91 @@
  */
 public class DropMaterializedViewStmt extends DdlStmt {
 
-private String mvName;
-private TableName tableName;
 private boolean ifExists;
+private final TableName dbMvName;
+private final TableName dbTblName;
 
-public DropMaterializedViewStmt(boolean ifExists, String mvName, TableName tableName) {
-this.mvName = mvName;
-this.tableName = tableName;
+public DropMaterializedViewStmt(boolean ifExists, TableName dbMvName, TableName dbTblName) {
 this.ifExists = ifExists;
+this.dbMvName = dbMvName;
+this.dbTblName = dbTblName;
+}
+
+public boolean isSetIfExists() {
+return ifExists;
 }
 
 public String getMvName() {
-return mvName;
+return dbMvName.getTbl();
 }
 
-public TableName getTableName() {
-return tableName;
+public String getTblName() {
+if (dbTblName != null) {
+return dbTblName.getTbl();
+} else {
+return null;
+}
 }
 
-public boolean isIfExists() {
-return ifExists;
+public String getDbName() {
+if (dbTblName != null) {
+return dbTblName.getDb();
+} else {
+return dbMvName.getDb();
+}
 }
 
 @Override
 public void analyze(Analyzer analyzer) throws UserException {
-if (Strings.isNullOrEmpty(mvName)) {
-throw new AnalysisException("The materialized name could not be empty or null.");
+if (dbTblName != null && !Strings.isNullOrEmpty(dbMvName.getDb())) {
+throw new AnalysisException("Syntax drop materialized view [mv-name] from db.name must specify database name explicitly in `from`");
+}
+if (dbTblName != null) {
+if (!Strings.isNullOrEmpty(dbMvName.getDb())) {
+throw new AnalysisException("If the database appears after the from statement, " +

Review comment:
   What's the difference between here and above?








[GitHub] [incubator-doris] morningman opened a new pull request #4392: [Bug] Remove RECOVER_TABLET worker pool to make ASAN compile happy

2020-08-18 Thread GitBox


morningman opened a new pull request #4392:
URL: https://github.com/apache/incubator-doris/pull/4392


   In PR #4255, I forgot to remove some code.






[GitHub] [incubator-doris] marising commented on a change in pull request #4330: [Feature][Cache] Sql cache and partition cache #2581

2020-08-18 Thread GitBox


marising commented on a change in pull request #4330:
URL: https://github.com/apache/incubator-doris/pull/4330#discussion_r472246168



##
File path: fe/fe-core/src/main/java/org/apache/doris/qe/cache/PartitionCache.java
##
@@ -0,0 +1,215 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.qe.cache;
+
+import com.google.common.collect.Lists;
+import org.apache.doris.analysis.CompoundPredicate;
+import org.apache.doris.analysis.Expr;
+import org.apache.doris.analysis.InlineViewRef;
+import org.apache.doris.analysis.QueryStmt;
+import org.apache.doris.analysis.SelectStmt;
+import org.apache.doris.analysis.TableRef;
+import org.apache.doris.catalog.Column;
+import org.apache.doris.catalog.OlapTable;
+import org.apache.doris.catalog.RangePartitionInfo;
+import org.apache.doris.common.Status;
+import org.apache.doris.common.util.DebugUtil;
+import org.apache.doris.metric.MetricRepo;
+import org.apache.doris.qe.RowBatch;
+import org.apache.doris.thrift.TUniqueId;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+import java.util.List;
+
+public class PartitionCache extends Cache {
+private static final Logger LOG = LogManager.getLogger(PartitionCache.class);
+private SelectStmt nokeyStmt;

Review comment:
   After rewriting, this is the select statement without the partition key.








[GitHub] [incubator-doris] marising commented on a change in pull request #4330: [Feature][Cache] Sql cache and partition cache #2581

2020-08-18 Thread GitBox


marising commented on a change in pull request #4330:
URL: https://github.com/apache/incubator-doris/pull/4330#discussion_r472259742



##
File path: fe/fe-core/src/main/java/org/apache/doris/qe/cache/CacheAnalyzer.java
##
@@ -0,0 +1,450 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.qe.cache;
+
+import org.apache.doris.analysis.AggregateInfo;
+import org.apache.doris.analysis.BinaryPredicate;
+import org.apache.doris.analysis.CastExpr;
+import org.apache.doris.analysis.CompoundPredicate;
+import org.apache.doris.analysis.Expr;
+import org.apache.doris.analysis.InlineViewRef;
+import org.apache.doris.analysis.QueryStmt;
+import org.apache.doris.analysis.SelectStmt;
+import org.apache.doris.analysis.SlotRef;
+import org.apache.doris.analysis.StatementBase;
+import org.apache.doris.analysis.TableRef;
+import org.apache.doris.catalog.OlapTable;
+import org.apache.doris.catalog.RangePartitionInfo;
+import org.apache.doris.catalog.PartitionType;
+import org.apache.doris.catalog.Partition;
+import org.apache.doris.catalog.Column;
+import org.apache.doris.common.util.DebugUtil;
+import org.apache.doris.metric.MetricRepo;
+import org.apache.doris.planner.OlapScanNode;
+import org.apache.doris.planner.Planner;
+import org.apache.doris.planner.ScanNode;
+import org.apache.doris.qe.ConnectContext;
+import org.apache.doris.qe.RowBatch;
+import org.apache.doris.common.Config;
+import org.apache.doris.common.Status;
+
+import com.google.common.collect.Lists;
+import org.apache.doris.thrift.TUniqueId;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+/**
+ * Analyze which caching mode an SQL statement is suitable for
+ * 1. T + 1 updates (data refreshed once a day) are suitable for Sql mode
+ * 2. Tables partitioned by date whose current-day data is updated in near real time are suitable for Partition mode
+ */
+public class CacheAnalyzer {
+private static final Logger LOG = LogManager.getLogger(CacheAnalyzer.class);
+
+/**
+ * NoNeed : disable config or variable, not query, not scan table etc.
+ */
+public enum CacheMode {
+NoNeed,
+None,
+TTL,
+Sql,
+Partition
+}
+
+private ConnectContext context;
+private boolean enableSqlCache = false;
+private boolean enablePartitionCache = false;
+private TUniqueId queryId;
+private CacheMode cacheMode;
+private CacheTable latestTable;
+private StatementBase parsedStmt;
+private SelectStmt selectStmt;
+private List scanNodes;
+private OlapTable olapTable;
+private RangePartitionInfo partitionInfo;
+private Column partColumn;
+private CompoundPredicate partitionPredicate;
+private Cache cache;
+
+public Cache getCache() {
+return cache;
+}
+
+public CacheAnalyzer(ConnectContext context, StatementBase parsedStmt, Planner planner) {
+this.context = context;
+this.queryId = context.queryId();
+this.parsedStmt = parsedStmt;
+scanNodes = planner.getScanNodes();
+latestTable = new CacheTable();
+checkCacheConfig();
+}
+
+//for unit test
+public CacheAnalyzer(ConnectContext context, StatementBase parsedStmt, List scanNodes) {
+this.context = context;
+this.parsedStmt = parsedStmt;
+this.scanNodes = scanNodes;
+checkCacheConfig();
+}
+
+private void checkCacheConfig() {
+if (Config.cache_enable_sql_mode) {
+if (context.getSessionVariable().isEnableSqlCache()) {

Review comment:
   My understanding is that getSessionVariable() can obtain both session variables and global variables, and that session variables take priority over global variables. Is that correct?
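The question concerns variable scoping. Under a MySQL-style model (an assumption here, not a statement about Doris internals), each connection starts with a copy of the global values and a session-level SET changes only that copy, so the session value is what getSessionVariable() would effectively return. The sketch below models that with hypothetical types and combines it with the two-switch check from checkCacheConfig(); the variable key "enable_sql_cache" is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

class VariableScopeSketch {
    // Hypothetical per-connection variables: a snapshot of the globals taken at connect time.
    static final class Session {
        private final Map<String, Boolean> vars;

        Session(Map<String, Boolean> globals) {
            this.vars = new HashMap<>(globals);
        }

        // Session-level SET: changes only this connection's copy, not the globals.
        void set(String name, boolean value) {
            vars.put(name, value);
        }

        // Unset variables default to false in this sketch.
        boolean get(String name) {
            return vars.getOrDefault(name, false);
        }
    }

    // Same shape as checkCacheConfig(): the FE config switch and the session variable must both be on.
    static boolean sqlCacheEnabled(boolean feConfigSwitch, Session session) {
        return feConfigSwitch && session.get("enable_sql_cache");
    }
}
```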






[GitHub] [incubator-doris] marising commented on a change in pull request #4330: [Feature][Cache] Sql cache and partition cache #2581

2020-08-18 Thread GitBox


marising commented on a change in pull request #4330:
URL: https://github.com/apache/incubator-doris/pull/4330#discussion_r472266869



##
File path: fe/fe-core/src/main/java/org/apache/doris/qe/StmtExecutor.java
##
@@ -575,6 +583,78 @@ private void handleSetStmt() {
 context.getState().setOk();
 }
 
+private void sendChannel(MysqlChannel channel, List cacheValues, boolean hitAll)

Review comment:
   This indicates whether all queried partitions are hit, so would isHitAll be a better name?
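To illustrate the naming suggestion, a small hypothetical sketch: partition keys that the cache cannot serve are collected as missing, and isHitAll is true only when nothing is missing. The long yyyyMMdd keys and both method names are assumptions, not the PR's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class PartitionHitSketch {
    // Partition keys (e.g. yyyyMMdd as a long) not found in the cache; these would still be queried from BE.
    static List<Long> missingKeys(List<Long> queriedKeys, Set<Long> cachedKeys) {
        List<Long> missing = new ArrayList<>();
        for (Long key : queriedKeys) {
            if (!cachedKeys.contains(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    // "isHitAll": every queried partition was served from the cache.
    static boolean isHitAll(List<Long> queriedKeys, Set<Long> cachedKeys) {
        return missingKeys(queriedKeys, cachedKeys).isEmpty();
    }
}
```

A boolean named with an `is` prefix reads naturally at the call site, e.g. `if (isHitAll(keys, cached))`.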








[GitHub] [incubator-doris] marising commented on a change in pull request #4330: [Feature][Cache] Sql cache and partition cache #2581

2020-08-18 Thread GitBox


marising commented on a change in pull request #4330:
URL: https://github.com/apache/incubator-doris/pull/4330#discussion_r472277065



##
File path: fe/fe-core/src/main/java/org/apache/doris/qe/cache/PartitionRange.java
##
@@ -0,0 +1,596 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.qe.cache;
+
+import org.apache.doris.analysis.CompoundPredicate;
+import org.apache.doris.analysis.BinaryPredicate;
+import org.apache.doris.analysis.DateLiteral;
+import org.apache.doris.analysis.InPredicate;
+import org.apache.doris.analysis.PartitionValue;
+import org.apache.doris.analysis.Expr;
+import org.apache.doris.analysis.LiteralExpr;
+import org.apache.doris.analysis.IntLiteral;
+import org.apache.doris.catalog.OlapTable;
+import org.apache.doris.catalog.PrimitiveType;
+import org.apache.doris.catalog.RangePartitionInfo;
+import org.apache.doris.catalog.Column;
+import org.apache.doris.catalog.Partition;
+import org.apache.doris.catalog.PartitionKey;
+import org.apache.doris.catalog.Type;
+import org.apache.doris.common.Config;
+import org.apache.doris.planner.PartitionColumnFilter;
+
+import org.apache.doris.common.AnalysisException;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Range;
+
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+import java.text.SimpleDateFormat;
+import java.util.Date;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Convert the range of the partition to the list
+ * all partition by day/week/month split to day list
+ */
+public class PartitionRange {
+private static final Logger LOG = LogManager.getLogger(PartitionRange.class);
+
+public class PartitionSingle {
+private Partition partition;
+private PartitionKey partitionKey;
+private long partitionId;
+private PartitionKeyType cacheKey;
+private boolean fromCache;
+private boolean tooNew;
+
+public Partition getPartition() {
+return partition;
+}
+
+public void setPartition(Partition partition) {
+this.partition = partition;
+}
+
+public PartitionKey getPartitionKey() {
+return partitionKey;
+}
+
+public void setPartitionKey(PartitionKey key) {
+this.partitionKey = key;
+}
+
+public long getPartitionId() {
+return partitionId;
+}
+
+public void setPartitionId(long partitionId) {
+this.partitionId = partitionId;
+}
+
+public PartitionKeyType getCacheKey() {
+return cacheKey;
+}
+
+public void setCacheKey(PartitionKeyType cacheKey) {
+this.cacheKey.clone(cacheKey);
+}
+
+public boolean isFromCache() {
+return fromCache;
+}
+
+public void setFromCache(boolean fromCache) {
+this.fromCache = fromCache;
+}
+
+public boolean isTooNew() {
+return tooNew;
+}
+
+public void setTooNew(boolean tooNew) {
+this.tooNew = tooNew;
+}
+
+public PartitionSingle() {
+this.partitionId = 0;
+this.cacheKey = new PartitionKeyType();
+this.fromCache = false;
+this.tooNew = false;
+}
+
+public void Debug() {
+if (partition != null) {
+LOG.info("partition id {}, cacheKey {}, version {}, time {}, fromCache {}, tooNew {} ",
+partitionId, cacheKey.realValue(),
+partition.getVisibleVersion(), partition.getVisibleVersionTime(),
+fromCache, tooNew);
+} else {
+LOG.info("partition id {}, cacheKey {}, fromCache {}, tooNew {} ", partitionId,
+cacheKey.realValue(), fromCache, tooNew);
+}
+}
+}
+
+public enum KeyType {
+DEFAULT,
+LONG,
+DATE,
+DATETIME,
+TIME
+}
+
+public static class PartitionKeyType {
+private SimpleDateFormat df8 = new SimpleDateFormat("MMdd");
+private SimpleDateFormat 
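The PartitionRange class comment describes splitting day, week, or month partitions into a day list. A minimal java.time sketch of that step, assuming half-open [begin, end) ranges (as with VALUES LESS THAN partitions) and long yyyyMMdd keys; the class, method, and key format are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

class DayListSketch {
    // Day key format, e.g. 20200818.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Expands the half-open range [begin, end) into one key per day, so a week
    // partition becomes 7 keys and a month partition 28 to 31 keys.
    static List<Long> splitToDays(LocalDate begin, LocalDate end) {
        List<Long> days = new ArrayList<>();
        for (LocalDate d = begin; d.isBefore(end); d = d.plusDays(1)) {
            days.add(Long.parseLong(d.format(DAY)));
        }
        return days;
    }
}
```

For example, the week from 2020-08-01 up to 2020-08-08 yields the seven keys 20200801 through 20200807.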

[GitHub] [incubator-doris] morningman merged pull request #4212: Compaction rules optimization

2020-08-18 Thread GitBox


morningman merged pull request #4212:
URL: https://github.com/apache/incubator-doris/pull/4212


   






[incubator-doris] branch master updated: [Compaction]Compaction rules optimization (#4212)

2020-08-18 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
 new dc3ed1c  [Compaction]Compaction rules optimization (#4212)
dc3ed1c is described below

commit dc3ed1c525e08f9bd8acfb01b3507c6c7d230164
Author: ZhangYu0123 <67053339+zhangyu0...@users.noreply.github.com>
AuthorDate: Wed Aug 19 09:34:13 2020 +0800

[Compaction]Compaction rules optimization (#4212)

Compaction rules optimization. For the detailed problem description and design, see #4164.
This PR contains two changes:
(1) make the cumulative compaction policy configurable, and implement the original policy;
(2) implement the universal policy, the optimized version described in #4164.
---
 be/src/common/config.h |   21 +
 be/src/olap/CMakeLists.txt |1 +
 be/src/olap/cumulative_compaction.cpp  |   63 +-
 be/src/olap/cumulative_compaction.h|4 +-
 be/src/olap/cumulative_compaction_policy.cpp   |  468 +
 be/src/olap/cumulative_compaction_policy.h |  263 +
 be/src/olap/olap_server.cpp|   34 +-
 be/src/olap/rowset/rowset_meta.h   |4 +
 be/src/olap/storage_engine.h   |2 +
 be/src/olap/tablet.cpp |  116 +--
 be/src/olap/tablet.h   |   23 +-
 be/src/olap/version_graph.cpp  |3 +-
 be/test/olap/cumulative_compaction_policy_test.cpp | 1022 
 docs/en/administrator-guide/config/be_config.md|   42 +-
 docs/zh-CN/administrator-guide/config/be_config.md |   40 +
 15 files changed, 1976 insertions(+), 130 deletions(-)

diff --git a/be/src/common/config.h b/be/src/common/config.h
index 145d9b3..08151ad 100644
--- a/be/src/common/config.h
+++ b/be/src/common/config.h
@@ -268,6 +268,27 @@ namespace config {
 CONF_mInt64(base_compaction_interval_seconds_since_last_operation, "86400");
 CONF_mInt32(base_compaction_write_mbytes_per_sec, "5");
 
+// Configure the cumulative compaction policy.
+// Valid values: num_based, size_based
+// num_based policy: the original version of cumulative compaction, which compacts the cumulative versions once.
+// size_based policy: an optimized version of cumulative compaction, targeting use cases that require
+// lower write amplification, trading off read amplification and space amplification.
+CONF_String(cumulative_compaction_policy, "num_based");
+
+// In size_based policy, if the total disk size of the cumulative compaction output rowset exceeds this size,
+// the rowset is handed over to base compaction. The unit is MB.
+CONF_mInt64(cumulative_size_based_promotion_size_mbytes, "1024");
+// In size_based policy, if the total disk size of the cumulative compaction output rowset exceeds this ratio of
+// the base rowset's total disk size, the rowset is handed over to base compaction. The value must be between
+// 0 and 1.
+CONF_mDouble(cumulative_size_based_promotion_ratio, "0.05");
+// In size_based policy, the minimum size for rowset promotion. A rowset smaller than this size is not
+// handed over to base compaction. The unit is MB.
+CONF_mInt64(cumulative_size_based_promotion_min_size_mbytes, "64");
+// The lower bound size for cumulative compaction. When the total disk size of candidate rowsets is less than
+// this size, the size_based policy also performs cumulative compaction. The unit is MB.
+CONF_mInt64(cumulative_size_based_compaction_lower_size_mbytes, "64");
+
 // cumulative compaction policy: max delta file's size unit:B
 CONF_mInt32(cumulative_compaction_check_interval_seconds, "10");
 CONF_mInt64(min_cumulative_compaction_num_singleton_deltas, "5");
diff --git a/be/src/olap/CMakeLists.txt b/be/src/olap/CMakeLists.txt
index 884c045..13c11a0 100644
--- a/be/src/olap/CMakeLists.txt
+++ b/be/src/olap/CMakeLists.txt
@@ -37,6 +37,7 @@ add_library(Olap STATIC
 comparison_predicate.cpp
 compress.cpp
 cumulative_compaction.cpp
+cumulative_compaction_policy.cpp
 delete_handler.cpp
 delta_writer.cpp
 file_helper.cpp
diff --git a/be/src/olap/cumulative_compaction.cpp b/be/src/olap/cumulative_compaction.cpp
index a5f1358..c6bf9f8 100755
--- a/be/src/olap/cumulative_compaction.cpp
+++ b/be/src/olap/cumulative_compaction.cpp
@@ -27,7 +27,7 @@ CumulativeCompaction::CumulativeCompaction(TabletSharedPtr tablet, const std::st
 : Compaction(tablet, label, parent_tracker),
   
   _cumulative_rowset_size_threshold(config::cumulative_compaction_budgeted_bytes) {}
 
-CumulativeCompaction::~CumulativeCompaction() { }
+CumulativeCompaction::~CumulativeCompaction() {}
 
 OLAPStatus CumulativeCompaction::compact() {
 if (!_table
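The three cumulative_size_based_promotion_* options in this commit can be read as a clamp: a cumulative output rowset is handed to base compaction once it reaches ratio × base size, but the threshold is never below the minimum size and never above the maximum. That reading is an assumption consistent with the config comments, not the committed policy code, and the names below are hypothetical. With the defaults (0.05, 64 MB, 1024 MB), a 10 GB base rowset gives a 512 MB threshold, a 500 MB base gives 25 MB, which is raised to 64 MB, and a 100 GB base is capped at 1024 MB.

```java
class PromotionSketch {
    static final long MB = 1024L * 1024L;

    // threshold = clamp(ratio * baseSize, minSize, maxSize); one possible reading of
    // cumulative_size_based_promotion_{ratio,min_size_mbytes,size_mbytes}.
    static long promotionThresholdBytes(long baseRowsetBytes, double ratio,
                                        long minSizeMb, long maxSizeMb) {
        long byRatio = (long) (baseRowsetBytes * ratio);
        return Math.min(Math.max(byRatio, minSizeMb * MB), maxSizeMb * MB);
    }

    // An output rowset at or above the threshold would be promoted to base compaction.
    static boolean shouldPromote(long outputRowsetBytes, long thresholdBytes) {
        return outputRowsetBytes >= thresholdBytes;
    }
}
```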

[GitHub] [incubator-doris] HangyuanLiu commented on a change in pull request #4253: Support more materialized view syntax

2020-08-18 Thread GitBox


HangyuanLiu commented on a change in pull request #4253:
URL: https://github.com/apache/incubator-doris/pull/4253#discussion_r472590847



##
File path: fe/fe-core/src/main/java/org/apache/doris/analysis/DropMaterializedViewStmt.java
##
@@ -38,38 +43,91 @@
  */
 public class DropMaterializedViewStmt extends DdlStmt {
 
-private String mvName;
-private TableName tableName;
 private boolean ifExists;
+private final TableName dbMvName;
+private final TableName dbTblName;
 
-public DropMaterializedViewStmt(boolean ifExists, String mvName, TableName tableName) {
-this.mvName = mvName;
-this.tableName = tableName;
+public DropMaterializedViewStmt(boolean ifExists, TableName dbMvName, TableName dbTblName) {
 this.ifExists = ifExists;
+this.dbMvName = dbMvName;
+this.dbTblName = dbTblName;
+}
+
+public boolean isSetIfExists() {
+return ifExists;
 }
 
 public String getMvName() {
-return mvName;
+return dbMvName.getTbl();
 }
 
-public TableName getTableName() {
-return tableName;
+public String getTblName() {
+if (dbTblName != null) {
+return dbTblName.getTbl();
+} else {
+return null;
+}
 }
 
-public boolean isIfExists() {
-return ifExists;
+public String getDbName() {
+if (dbTblName != null) {
+return dbTblName.getDb();
+} else {
+return dbMvName.getDb();
+}
 }
 
 @Override
 public void analyze(Analyzer analyzer) throws UserException {
-if (Strings.isNullOrEmpty(mvName)) {
-throw new AnalysisException("The materialized name could not be empty or null.");
+if (dbTblName != null && !Strings.isNullOrEmpty(dbMvName.getDb())) {
+throw new AnalysisException("Syntax drop materialized view [mv-name] from db.name must specify database name explicitly in `from`");

Review comment:
   If the db is set in `from`, it must not be set in the view name.
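The rule discussed in this thread can be sketched as follows. The QualifiedName type is a hypothetical stand-in for Doris's TableName, and IllegalArgumentException stands in for AnalysisException: when a base table is given, the database may only be written there, and the database is resolved from the table first, falling back to the view name, as getDbName() does in the diff.

```java
class DropMvSketch {
    // Hypothetical stand-in for a possibly db-qualified name; db is null or empty when omitted.
    static final class QualifiedName {
        final String db;
        final String name;

        QualifiedName(String db, String name) {
            this.db = db;
            this.name = name;
        }
    }

    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    // If a base table is given, the database must not also prefix the view name.
    static void validate(QualifiedName mv, QualifiedName tbl) {
        if (tbl != null && !isNullOrEmpty(mv.db)) {
            throw new IllegalArgumentException(
                    "specify the database on the table only, not on the materialized view name");
        }
    }

    // Mirrors getDbName(): the table's db if a table is given, otherwise the view's db.
    static String dbName(QualifiedName mv, QualifiedName tbl) {
        return tbl != null ? tbl.db : mv.db;
    }
}
```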








[incubator-doris] branch master updated: Support udaf_orthogonal_bitmap (#4198)

2020-08-18 Thread lingmiao
This is an automated email from the ASF dual-hosted git repository.

lingmiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
 new f924282  Support udaf_orthogonal_bitmap (#4198)
f924282 is described below

commit f92428248f91a191294bc2a8a2deb1c209250acc
Author: zhbinbin <16679214+zhbin...@users.noreply.github.com>
AuthorDate: Wed Aug 19 10:29:13 2020 +0800

Support udaf_orthogonal_bitmap (#4198)

The original Doris bitmap aggregation functions perform poorly when computing intersections and unions of bitmaps with cardinalities of more than one billion. There are two reasons for this. First, when the bitmap cardinality is large and the data size exceeds 1 GB, network and disk I/O time increases. Second, all the sink data of the back-end BE instances is transferred to the top node for intersection and union calculation, which leads to the pressu [...]

My solution is to create a fixed-schema table based on the Doris fragmentation rule and to hash-fragment the ID range of the bitmap, that is, to cut the ID range vertically into small cubes. The resulting bitmap blocks become smaller and are evenly distributed across all BE instances. On top of this schema table, several new high-performance UDAF aggregation functions are developed. All scan nodes participate in the intersection and union calculation, and the top node only summarizes the results.

The design goal is that the base number of bitmap is more than 10 billion, 
and the response time of cross union set calculation of 100 dimensional 
granularity is within 5 s.

There are three udaf functions in this commit: 
orthogonal_bitmap_intersect_count, orthogonal_bitmap_union_count, 
orthogonal_bitmap_intersect.
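A minimal Java sketch of why the per-shard computation described above is
correct, assuming IDs are split into disjoint shards by `id % SHARDS` (a
stand-in for the Doris hash bucketing; the real UDAFs in this commit are C++).
`OrthogonalBitmapSketch`, `SHARDS` and the method names are hypothetical.
Because every ID lands in the same single shard in every bitmap, per-shard
intersection or union counts add up to the global count, so the top node only
needs to sum them.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class OrthogonalBitmapSketch {
    // Hypothetical shard count standing in for the table's bucket number.
    static final int SHARDS = 4;

    // Split non-negative ids into SHARDS disjoint bitmaps by id % SHARDS.
    // (Each BitSet is indexed by the raw id; fine for a sketch, not memory-optimal.)
    static List<BitSet> shard(int[] ids) {
        List<BitSet> shards = new ArrayList<>();
        for (int i = 0; i < SHARDS; i++) {
            shards.add(new BitSet());
        }
        for (int id : ids) {
            shards.get(id % SHARDS).set(id);
        }
        return shards;
    }

    // Each "scan node" combines only its own shard and reports a count;
    // the "top node" merely sums the partial counts.
    static long combineCount(int[] a, int[] b, boolean intersect) {
        List<BitSet> sa = shard(a);
        List<BitSet> sb = shard(b);
        long total = 0;
        for (int i = 0; i < SHARDS; i++) {
            BitSet local = (BitSet) sa.get(i).clone();
            if (intersect) {
                local.and(sb.get(i));
            } else {
                local.or(sb.get(i));
            }
            total += local.cardinality();
        }
        return total;
    }

    static long intersectCount(int[] a, int[] b) {
        return combineCount(a, b, true);
    }

    static long unionCount(int[] a, int[] b) {
        return combineCount(a, b, false);
    }

    public static void main(String[] args) {
        int[] usersA = {1, 2, 3, 5, 8, 13, 21};
        int[] usersB = {2, 3, 5, 7, 11, 13};
        System.out.println(intersectCount(usersA, usersB)); // 4
        System.out.println(unionCount(usersA, usersB));     // 9
    }
}
```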
---
 contrib/udf/CMakeLists.txt |1 +
 .../udf/src/udaf_orthogonal_bitmap/CMakeLists.txt  |   92 ++
 .../udf/src/udaf_orthogonal_bitmap/bitmap_value.h  | 1326 
 .../orthogonal_bitmap_function.cpp |  492 
 .../orthogonal_bitmap_function.h   |   62 +
 .../udf/src/udaf_orthogonal_bitmap/string_value.h  |  175 +++
 docs/.vuepress/sidebar/en.js   |4 +-
 docs/.vuepress/sidebar/zh-CN.js|4 +-
 .../udf/contrib/udaf-orthogonal-bitmap-manual.md   |  249 
 .../udf/contrib/udaf-orthogonal-bitmap-manual.md   |  238 
 10 files changed, 2641 insertions(+), 2 deletions(-)

diff --git a/contrib/udf/CMakeLists.txt b/contrib/udf/CMakeLists.txt
index e0feef1..8554516 100644
--- a/contrib/udf/CMakeLists.txt
+++ b/contrib/udf/CMakeLists.txt
@@ -72,5 +72,6 @@ set_target_properties(udf PROPERTIES IMPORTED_LOCATION $ENV{DORIS_HOME}/output/u
 
 # Add the subdirector of new UDF in here
 add_subdirectory(${SRC_DIR}/udf_samples)
+add_subdirectory(${SRC_DIR}/udaf_orthogonal_bitmap)
 
 install(DIRECTORY DESTINATION ${OUTPUT_DIR})
diff --git a/contrib/udf/src/udaf_orthogonal_bitmap/CMakeLists.txt b/contrib/udf/src/udaf_orthogonal_bitmap/CMakeLists.txt
new file mode 100644
index 000..5741509
--- /dev/null
+++ b/contrib/udf/src/udaf_orthogonal_bitmap/CMakeLists.txt
@@ -0,0 +1,92 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# where to put generated libraries
+set(LIBRARY_OUTPUT_PATH "${BUILD_DIR}/src/udaf_orthogonal_bitmap")
+
+# where to put generated binaries
+set(EXECUTABLE_OUTPUT_PATH "${BUILD_DIR}/src/udaf_orthogonal_bitmap")
+
+
+# set CMAKE_BUILD_TARGET_ARCH
+# use `lscpu | grep 'Architecture' | awk '{print $2}'` only support system which language is en_US.UTF-8
+execute_process(COMMAND bash "-c" "uname -m"
+OUTPUT_VARIABLE
+CMAKE_BUILD_TARGET_ARCH
+OUTPUT_STRIP_TRAILING_WHITESPACE)
+message(STATUS "Build target arch is ${CMAKE_BUILD_TARGET_ARCH}")
+
+# Set dirs
+set(SRC_DIR "$ENV{DORIS_HOME}/be/src/")
+set(THIRDPARTY_DIR "$ENV{DORIS_THIRDPARTY}/installed/")
+
+# Set include dirs
+include_directories(./)
+include_directories(${THIRDPARTY_DIR}/include/)
+
+# message
+message(STATUS "base dir is ${B

[GitHub] [incubator-doris] EmmyMiao87 merged pull request #4198: Add bitmap longitudinal cutting udaf

2020-08-18 Thread GitBox


EmmyMiao87 merged pull request #4198:
URL: https://github.com/apache/incubator-doris/pull/4198


   






[GitHub] [incubator-doris] marising commented on issue #4370: Release Nodes 0.13.0

2020-08-18 Thread GitBox


marising commented on issue #4370:
URL: 
https://github.com/apache/incubator-doris/issues/4370#issuecomment-675815496


   Please merge the feature:
   
   [Feature][Cache] Doris caches query results based on partition #2581
   
   LiHaibo 2020-8-19
   
   At 2020-08-17 19:52:47, "EmmyMiao87"  wrote:
   
   Credits
   
   @ZhangYu0123
   @wfjcmcb
   @Fullstop000
   @sduzh
   @stalary
   @worker24h
   @chaoyli
   @vagetablechicken
   @jmk1011
   @funyeah
   @wutiangan
   @gengjun-git
   @xinghuayu007
   @EmmyMiao87
   @songenjie
   @acelyc111
   @yangzhg
   @Seaven
   @hexian55
   @ChenXiaoFei
   @WingsGo
   @kangpinghuang
   @wangbo
   @weizuo93
   @sdgshawn
   @skyduy
   @wyb
   @gaodayue
   @HappenLee
   @kangkaisen
   @wuyunfeng
   @HangyuanLiu
   @xy720
   @liutang123
   @caiconghui
   @liyuance
   @spaces-X
   @hffariel
   @decster
   @blackfox1983
   @Astralidea
   @morningman
   @hf200012
   @xbyang18
   @Youngwb
   @imay
   @marising
   






[GitHub] [incubator-doris] marising commented on a change in pull request #4330: [Feature][Cache] Sql cache and partition cache #2581

2020-08-18 Thread GitBox


marising commented on a change in pull request #4330:
URL: https://github.com/apache/incubator-doris/pull/4330#discussion_r472609245



##
File path: fe/fe-core/src/main/java/org/apache/doris/qe/StmtExecutor.java
##
@@ -575,6 +583,78 @@ private void handleSetStmt() {
 context.getState().setOk();
 }
 
+private void sendChannel(MysqlChannel channel, List<CacheBeProxy.CacheValue> cacheValues, boolean hitAll)
+throws Exception {
+RowBatch batch = null;
+for (CacheBeProxy.CacheValue value : cacheValues) {
+batch = value.getRowBatch();
+for (ByteBuffer row : batch.getBatch().getRows()) {
+channel.sendOnePacket(row);
+}
+context.updateReturnRows(batch.getBatch().getRows().size());
+}
+if (hitAll) {
+if (batch != null) {
+statisticsForAuditLog = batch.getQueryStatistics();
+}
+context.getState().setEof();
+return;
+}
+}
+
+private boolean handleCacheStmt(CacheAnalyzer cacheAnalyzer,MysqlChannel channel) throws Exception {
+RowBatch batch = null;
+CacheBeProxy.FetchCacheResult cacheResult = cacheAnalyzer.getCacheData();
+CacheMode mode = cacheAnalyzer.getCacheMode();
+if (cacheResult != null) {
+isCached = true;
+if (cacheAnalyzer.getHitRange() == Cache.HitRange.Full) {
+sendChannel(channel, cacheResult.getValueList(), true);
+return true;
+}
+//rewrite sql
+if (mode == CacheMode.Partition) {
+if (cacheAnalyzer.getHitRange() == Cache.HitRange.Left) {
+sendChannel(channel, cacheResult.getValueList(), false);
+}
+SelectStmt newSelectStmt = cacheAnalyzer.getRewriteStmt();
+newSelectStmt.reset();
+analyzer = new Analyzer(context.getCatalog(), context);
+newSelectStmt.analyze(analyzer);
+planner = new Planner();
+planner.plan(newSelectStmt, analyzer, context.getSessionVariable().toThrift());
+}
+}
+
+coord = new Coordinator(context, analyzer, planner);
+QeProcessorImpl.INSTANCE.registerQuery(context.queryId(),
+new QeProcessorImpl.QueryInfo(context, originStmt.originStmt, coord));
+coord.exec();
+
+while (true) {
+batch = coord.getNext();
+if (batch.getBatch() != null) {
+cacheAnalyzer.copyRowBatch(batch);
+for (ByteBuffer row : batch.getBatch().getRows()) {
+channel.sendOnePacket(row);
+}
+context.updateReturnRows(batch.getBatch().getRows().size());
+}
+if (batch.isEos()) {
+break;
+}
+}
+
+if (cacheResult != null && cacheAnalyzer.getHitRange() == Cache.HitRange.Right) {
+sendChannel(channel, cacheResult.getValueList(), false);
+}
+
+cacheAnalyzer.updateCache();

Review comment:
   The updateCache method determines whether the background cache needs to be updated:
   ```java
   public void updateCache() {
       if (cacheMode == CacheMode.None || cacheMode == CacheMode.NoNeed) {
           return;
       }
       cache.updateCache();
   }
   ```
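   To illustrate the Full/Left/Right hit ranges handled in `handleCacheStmt` above, here is a hedged Java sketch of how a partition-cache hit could be classified. It assumes the queried range is split into one cache key per day (as the `PartitionRange` javadoc in this PR describes) and that only a contiguous cached prefix (Left: cached rows are sent first) or suffix (Right: cached rows are sent last) is reused. `PartitionCacheSketch`, its `HitRange` enum and all methods are hypothetical stand-ins, not Doris classes.

   ```java
   import java.time.LocalDate;
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Set;

   public class PartitionCacheSketch {
       // Hypothetical stand-in for Cache.HitRange (only the cases used here).
       enum HitRange { NONE, LEFT, RIGHT, FULL }

       // One cache key per day in [from, to].
       static List<LocalDate> days(LocalDate from, LocalDate to) {
           List<LocalDate> out = new ArrayList<>();
           for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
               out.add(d);
           }
           return out;
       }

       // Length of the cached run at the start of the day list.
       static int cachedPrefix(List<LocalDate> days, Set<LocalDate> cached) {
           int i = 0;
           while (i < days.size() && cached.contains(days.get(i))) i++;
           return i;
       }

       // Length of the cached run at the end of the day list.
       static int cachedSuffix(List<LocalDate> days, Set<LocalDate> cached) {
           int i = 0;
           while (i < days.size() && cached.contains(days.get(days.size() - 1 - i))) i++;
           return i;
       }

       // Simplification: prefer a cached prefix, else a cached suffix.
       static HitRange classify(List<LocalDate> days, Set<LocalDate> cached) {
           int prefix = cachedPrefix(days, cached);
           if (prefix == days.size()) return HitRange.FULL;
           if (prefix > 0) return HitRange.LEFT;
           if (cachedSuffix(days, cached) > 0) return HitRange.RIGHT;
           return HitRange.NONE;
       }

       // Days the rewritten query still has to scan.
       static List<LocalDate> toQuery(List<LocalDate> days, Set<LocalDate> cached) {
           switch (classify(days, cached)) {
               case FULL: return List.of();
               case LEFT: return days.subList(cachedPrefix(days, cached), days.size());
               case RIGHT: return days.subList(0, days.size() - cachedSuffix(days, cached));
               default: return days;
           }
       }

       public static void main(String[] args) {
           List<LocalDate> q = days(LocalDate.of(2020, 8, 15), LocalDate.of(2020, 8, 18));
           Set<LocalDate> cached = Set.of(LocalDate.of(2020, 8, 15), LocalDate.of(2020, 8, 16));
           System.out.println(classify(q, cached)); // LEFT
           System.out.println(toQuery(q, cached));  // [2020-08-17, 2020-08-18]
       }
   }
   ```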








[GitHub] [incubator-doris] marising commented on a change in pull request #4330: [Feature][Cache] Sql cache and partition cache #2581

2020-08-18 Thread GitBox


marising commented on a change in pull request #4330:
URL: https://github.com/apache/incubator-doris/pull/4330#discussion_r472611199



##
File path: fe/fe-core/src/main/java/org/apache/doris/qe/cache/PartitionRange.java
##
@@ -0,0 +1,596 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.qe.cache;
+
+import org.apache.doris.analysis.CompoundPredicate;
+import org.apache.doris.analysis.BinaryPredicate;
+import org.apache.doris.analysis.DateLiteral;
+import org.apache.doris.analysis.InPredicate;
+import org.apache.doris.analysis.PartitionValue;
+import org.apache.doris.analysis.Expr;
+import org.apache.doris.analysis.LiteralExpr;
+import org.apache.doris.analysis.IntLiteral;
+import org.apache.doris.catalog.OlapTable;
+import org.apache.doris.catalog.PrimitiveType;
+import org.apache.doris.catalog.RangePartitionInfo;
+import org.apache.doris.catalog.Column;
+import org.apache.doris.catalog.Partition;
+import org.apache.doris.catalog.PartitionKey;
+import org.apache.doris.catalog.Type;
+import org.apache.doris.common.Config;
+import org.apache.doris.planner.PartitionColumnFilter;
+
+import org.apache.doris.common.AnalysisException;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Range;
+
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+import java.text.SimpleDateFormat;
+import java.util.Date;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Convert the range of the partition to the list
+ * all partition by day/week/month split to day list
+ */
+public class PartitionRange {
+private static final Logger LOG = LogManager.getLogger(PartitionRange.class);
+
+public class PartitionSingle {
+private Partition partition;
+private PartitionKey partitionKey;
+private long partitionId;
+private PartitionKeyType cacheKey;
+private boolean fromCache;
+private boolean tooNew;
+
+public Partition getPartition() {
+return partition;
+}
+
+public void setPartition(Partition partition) {
+this.partition = partition;
+}
+
+public PartitionKey getPartitionKey() {
+return partitionKey;
+}
+
+public void setPartitionKey(PartitionKey key) {
+this.partitionKey = key;
+}
+
+public long getPartitionId() {
+return partitionId;
+}
+
+public void setPartitionId(long partitionId) {
+this.partitionId = partitionId;
+}
+
+public PartitionKeyType getCacheKey() {
+return cacheKey;
+}
+
+public void setCacheKey(PartitionKeyType cacheKey) {
+this.cacheKey.clone(cacheKey);
+}
+
+public boolean isFromCache() {
+return fromCache;
+}
+
+public void setFromCache(boolean fromCache) {
+this.fromCache = fromCache;
+}
+
+public boolean isTooNew() {
+return tooNew;
+}
+
+public void setTooNew(boolean tooNew) {
+this.tooNew = tooNew;
+}
+
+public PartitionSingle() {
+this.partitionId = 0;
+this.cacheKey = new PartitionKeyType();
+this.fromCache = false;
+this.tooNew = false;
+}
+
+public void Debug() {
+if (partition != null) {
+LOG.info("partition id {}, cacheKey {}, version {}, time {}, fromCache {}, tooNew {} ",
+partitionId, cacheKey.realValue(),
+partition.getVisibleVersion(), partition.getVisibleVersionTime(),
+fromCache, tooNew);
+} else {
+LOG.info("partition id {}, cacheKey {}, fromCache {}, tooNew {} ", partitionId,
+cacheKey.realValue(), fromCache, tooNew);
+}
+}
+}
+
+public enum KeyType {
+DEFAULT,
+LONG,
+DATE,
+DATETIME,
+TIME
+}
+
+public static class PartitionKeyType {
+private SimpleDateFormat df8 = new SimpleDateFormat("MMdd");
+private SimpleDateFormat 

[GitHub] [incubator-doris] stalary closed pull request #4378: FIX: fix dynamic partition replicationNum error

2020-08-18 Thread GitBox


stalary closed pull request #4378:
URL: https://github.com/apache/incubator-doris/pull/4378


   






[GitHub] [incubator-doris] stalary commented on pull request #4393: FIX: fix dynamic partition replicationNum error

2020-08-18 Thread GitBox


stalary commented on pull request #4393:
URL: https://github.com/apache/incubator-doris/pull/4393#issuecomment-675820778


   The previous branch did not work properly, so I recreated the PR. @morningman






[GitHub] [incubator-doris] stalary opened a new pull request #4393: FIX: fix dynamic partition replicationNum error

2020-08-18 Thread GitBox


stalary opened a new pull request #4393:
URL: https://github.com/apache/incubator-doris/pull/4393


   ## Proposed changes
   `dynamic_partition.replication_num` defaults to `replication_num`, but `SHOW CREATE TABLE` shows -1.
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [x] Bugfix (non-breaking change which fixes an issue)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
   - [ ] Documentation Update (if none of the other choices apply)
   - [ ] Code refactor (Modify the code structure, format the code, etc...)
   
   ## Further comments
   
   Replace DynamicPartitionProperty's toString with getProperties.
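   A hedged Java sketch of the intended behavior, assuming that -1 is an internal "not set" sentinel meaning "inherit the table's `replication_num`" (the PR text does not say this explicitly). `DynamicPartitionReplicationSketch`, `NOT_SET` and this `getProperties` signature are hypothetical and are not the real `DynamicPartitionProperty` API; the point is only that the sentinel is resolved before the properties are printed.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   public class DynamicPartitionReplicationSketch {
       // Assumed sentinel: -1 means "not set explicitly, inherit the table's replication_num".
       static final int NOT_SET = -1;

       // Builds the property map to print, resolving the sentinel first so that
       // SHOW CREATE TABLE would show the effective value instead of -1.
       static Map<String, String> getProperties(int dynamicReplicationNum, int tableReplicationNum) {
           int effective = (dynamicReplicationNum == NOT_SET) ? tableReplicationNum : dynamicReplicationNum;
           Map<String, String> props = new HashMap<>();
           props.put("dynamic_partition.replication_num", String.valueOf(effective));
           return props;
       }

       public static void main(String[] args) {
           System.out.println(getProperties(NOT_SET, 3)); // {dynamic_partition.replication_num=3}
           System.out.println(getProperties(2, 3));       // {dynamic_partition.replication_num=2}
       }
   }
   ```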
   






[GitHub] [incubator-doris] EmmyMiao87 commented on a change in pull request #4375: Fix errors when alter materialized view which based on dup table

2020-08-18 Thread GitBox


EmmyMiao87 commented on a change in pull request #4375:
URL: https://github.com/apache/incubator-doris/pull/4375#discussion_r472626637



##
File path: 
fe/fe-core/src/main/java/org/apache/doris/alter/SchemaChangeHandler.java
##
@@ -556,6 +556,10 @@ private void addColumnInternal(OlapTable olapTable, Column 
newColumn, ColumnPosi
 throw new DdlException("Can not assign aggregation method on 
column in Duplicate data model table: " + newColName);
 }
 if (!newColumn.isKey()) {

Review comment:
   I didn't understand what you mean...








[GitHub] [incubator-doris] HappenLee opened a new issue #4394: [Proposal] Support Bucket Shuffle Join for Doris

2020-08-18 Thread GitBox


HappenLee opened a new issue #4394:
URL: https://github.com/apache/incubator-doris/issues/4394


   ## Motivation
   At present, Doris supports three types of join: **shuffle join**, **broadcast join**, and **colocate join**.
   Except for colocate join, the other join types cause a lot of network consumption.
   
   For example, for a SQL query A JOIN B, the network costs are:
   * **broadcast join**: if table A is divided into three parts, the network cost is `3B`.
   * **shuffle join**: the network cost is `A + B`.
   
   This network consumption not only slows down queries but also causes extra memory consumption during the join.
   
   Each Doris table has distribution info. If the join expression hits the distribution info, we should use it to reduce network consumption.
   
   ## What is bucket shuffle join
   
![image.png](https://upload-images.jianshu.io/upload_images/8552201-c383fe84aeee13bc.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
   
   Just like Hive's bucket map join, the picture shows how it works. If there is a SQL query A JOIN B and the join expression hits the distribution info of A, bucket shuffle join only needs to distribute table B, sending its data to the matching part of table A. So the network cost is always `B`.
   
   So, compared with the original join types, bucket shuffle join clearly has a lower network cost:
   
   `B < min(3B, A + B)`
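   To make the comparison concrete, here is a small Java sketch of the cost model above, generalized from three parts to `partsOfA` instances of A (the proposal itself uses three). Sizes are hypothetical and in arbitrary units; the class and method names are illustrative only.

   ```java
   public class JoinNetworkCost {
       // Broadcast join: B is copied to each instance that holds a part of A.
       static long broadcast(long sizeB, int partsOfA) {
           return partsOfA * sizeB;
       }

       // Shuffle join: both sides are re-hashed across the network.
       static long shuffle(long sizeA, long sizeB) {
           return sizeA + sizeB;
       }

       // Bucket shuffle join: A stays in place, only B is sent to A's buckets.
       static long bucketShuffle(long sizeB) {
           return sizeB;
       }

       public static void main(String[] args) {
           long sizeA = 1_000_000L;
           long sizeB = 10_000L;
           System.out.println(broadcast(sizeB, 3));      // 30000
           System.out.println(shuffle(sizeA, sizeB));    // 1010000
           System.out.println(bucketShuffle(sizeB));     // 10000
       }
   }
   ```

   As long as A is non-empty and is split into more than one part, `B` is strictly smaller than both alternatives.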
   
   
   ### It can bring us the following benefits:
   
   
   1. Bucket shuffle join reduces network cost and improves performance for some joins, especially when buckets are pruned.
   
   2. It does not strongly rely on the colocate mechanism, so it is transparent to users. It places no mandatory requirement on data distribution, so it will not lead to data skew.
   
   3. It provides more query optimization space for join reorder.
   
   ## POC of Bucket Shuffle Join
   
   I have implemented a simple bucket shuffle join in Doris and tested its performance.
   
   We chose TPC-DS query 57. The query has 6 join operations, and 4 of them can use bucket shuffle join.
   
   | | Original Doris | Bucket shuffle join |
   | :---: | :---: | :---: |
   | Time cost | 27.7 s | 16.4 s |
   
   
   It seems to work as well as we expected. I'll do more experiments to verify its performance in the future.
   
   
   ## Implementation
   
   1. Add a new partition type to the Thrift types.
   
   2. FE plans and detects queries that can use bucket shuffle join, and sends the data distribution info to BE.
   
   3. BE uses the proper hash function to send each row to the proper BE instance.
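   Step 3 can be pictured with the following minimal Java sketch. It assumes table A's buckets are numbered `0..n-1` and that FE has supplied a bucket-to-instance map; `Long.hashCode` is only a stand-in, since a real implementation must reuse exactly the hash function that distributed A into buckets. `BucketShuffleRouting`, `bucketOf` and `route` are hypothetical names.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class BucketShuffleRouting {
       // Stand-in hash: must match the function used to bucket table A.
       static int bucketOf(long joinKey, int numBuckets) {
           return Math.floorMod(Long.hashCode(joinKey), numBuckets);
       }

       // bucketToInstance[i] = index of the BE instance that scans bucket i of A.
       // Only rows of B move; each goes to the instance owning its bucket of A.
       static List<List<Long>> route(long[] keysOfB, int[] bucketToInstance, int numInstances) {
           List<List<Long>> perInstance = new ArrayList<>();
           for (int i = 0; i < numInstances; i++) {
               perInstance.add(new ArrayList<>());
           }
           for (long key : keysOfB) {
               int bucket = bucketOf(key, bucketToInstance.length);
               perInstance.get(bucketToInstance[bucket]).add(key);
           }
           return perInstance;
       }

       public static void main(String[] args) {
           // Table A: 4 buckets spread over 2 instances.
           int[] bucketToInstance = {0, 1, 0, 1};
           long[] keysOfB = {1, 2, 3, 4, 5, 6};
           System.out.println(route(keysOfB, bucketToInstance, 2)); // [[2, 4, 6], [1, 3, 5]]
       }
   }
   ```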
   


