[GitHub] [incubator-doris] yangzhg merged pull request #4234: Update support batch delete storage design document
yangzhg merged pull request #4234: URL: https://github.com/apache/incubator-doris/pull/4234
[incubator-doris] branch master updated: Update support batch delete storage design document (#4234)
This is an automated email from the ASF dual-hosted git repository. yangzhg pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-doris.git The following commit(s) were added to refs/heads/master by this push: new 8a3eaee Update support batch delete storage design document (#4234) 8a3eaee is described below commit 8a3eaeecf12628d72342ed9d6e62da90092de7e6 Author: ZhangYu0123 <67053339+zhangyu0...@users.noreply.github.com> AuthorDate: Tue Aug 18 15:37:14 2020 +0800 Update support batch delete storage design document (#4234) * Update delete index design document --- docs/en/internal/doris_storage_optimization.md| 78 --- docs/zh-CN/internal/doris_storage_optimization.md | 38 +-- 2 files changed, 60 insertions(+), 56 deletions(-) diff --git a/docs/en/internal/doris_storage_optimization.md b/docs/en/internal/doris_storage_optimization.md index 529b85d..6ceccad 100644 --- a/docs/en/internal/doris_storage_optimization.md +++ b/docs/en/internal/doris_storage_optimization.md @@ -36,7 +36,7 @@ Documents include: - The file starts with an 8-byte magic code to identify the file format and version - Data Region: Used to store data information for each column, where the data is loaded on demand by pages. - Index Region: Doris stores the index data of each column in Index Region, where the data is loaded according to column granularity, so the data information of the following column is stored separately. -- Footer信息 +- Footer - FileFooterPB: Metadata Information for Definition Files - Chesum of 4 bytes of footer Pb content - Four bytes FileFooterPB message length for reading FileFooterPB @@ -116,27 +116,29 @@ We generate a sparse index of short key every N rows (configurable) with the con The format design supports the subsequent expansion of other index information, such as bitmap index, spatial index, etc. It only needs to write the required data to the existing column data, and add the corresponding metadata fields to FileFooterPB. 
### Metadata Definition ### -FileFooterPB is defined as: +SegmentFooterPB is defined as: ``` message ColumnPB { -optional uint32 column_id = 1; // 这里使用column id,不使用column name是因为计划支持修改列名 -optional string type = 2; // 列类型 -optional string aggregation = 3; // 是否聚合 -optional uint32 length = 4; // 长度 -optional bool is_key = 5; // 是否是主键列 -optional string default_value = 6; // 默认值 -optional uint32 precision = 9 [default = 27]; // 精度 -optional uint32 frac = 10 [default = 9]; -optional bool is_nullable = 11 [default=false]; // 是否有null -optional bool is_bf_column = 15 [default=false]; // 是否有bf词典 - optional bool is_bitmap_column = 16 [default=false]; // 是否有bitmap索引 +required int32 unique_id = 1; // The column id is used here, and the column name is not used +optional string name = 2; // Column name, when name equals __DORIS_DELETE_SIGN__, this column is a hidden delete column +required string type = 3; // Column type +optional bool is_key = 4; // Whether column is a primary key column +optional string aggregation = 5;// Aggregate type +optional bool is_nullable = 6; // Whether column is allowed to assgin null +optional bytes default_value = 7; // Defalut value +optional int32 precision = 8; // Precision of column +optional int32 frac = 9; +optional int32 length = 10; // Length of column +optional int32 index_length = 11; // Length of column index +optional bool is_bf_column = 12;// Whether column has bloom filter index +optional bool has_bitmap_index = 15 [default=false]; // Whether column has bitmap index } -// page偏移 +// page offset message PagePointerPB { - required uint64 offset; // page在文件中的偏移 - required uint32 length; // page的大小 + required uint64 offset; // offset of page in segment file + required uint32 length; // length of page } message MetadataPairPB { @@ -145,36 +147,36 @@ message MetadataPairPB { } message ColumnMetaPB { - optional ColumnMessage encoding; // 编码方式 + optional ColumnMessage encoding; // Encoding of column - optional PagePointerPB dict_page // 词典page - repeated PagePointerPB bloom_filter_pages; // bloom filter词典信息 - optional PagePointerPB ordinal_index_page; // 行号索引数据 - optional PagePointerPB page_zone_map_page; // page级别统计信息索引数据 + optional PagePointerPB dict_page // Dictionary page + repeated PagePointerPB bloom_filter_pages; // Bloom filter pages + optional PagePointerPB ordinal_index_page; // Ordinal index page + optional PagePointerPB page_zone_map_page; // Page level of statistics index data - optional PagePointerPB bitmap_index_page; // bitmap索引数据 + optional PagePointerPB bitmap_index_page; // Bitmap index page - optional uint64 data_footprint; // 列中索引的大小 - opt
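The committed document calls out that the hidden batch-delete column is identified purely by the reserved name `__DORIS_DELETE_SIGN__` in `ColumnPB`. As a rough illustration of that convention (a plain struct stands in for the generated protobuf class here; this is a sketch, not the actual BE reader code):

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Stand-in for the generated ColumnPB protobuf message (illustration only).
struct ColumnPB {
    int32_t unique_id;
    std::string name;  // "__DORIS_DELETE_SIGN__" marks the hidden delete column
    std::string type;
};

// Returns the index of the hidden delete-sign column, if the segment has one.
std::optional<size_t> find_delete_sign_column(const std::vector<ColumnPB>& columns) {
    static const std::string kDeleteSign = "__DORIS_DELETE_SIGN__";
    for (size_t i = 0; i < columns.size(); ++i) {
        if (columns[i].name == kDeleteSign) {
            return i;
        }
    }
    return std::nullopt;
}
```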
[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization
yangzhg commented on a change in pull request #4212: URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r471988812 ## File path: be/src/common/config.h ## @@ -265,6 +265,27 @@ namespace config { CONF_mInt64(base_compaction_interval_seconds_since_last_operation, "86400"); CONF_mInt32(base_compaction_write_mbytes_per_sec, "5"); +// config the cumulative compaction policy +// Valid configs: num_base, size_based +// num_based policy, the original version of cumulative compaction, cumulative version compaction once. +// size_based policy, a optimization version of cumulative compaction, targeting the use cases requiring +// lower write amplification, trading off read amplification and space amplification. +CONF_String(cumulative_compaction_policy, "num_based"); + +// In size_based policy, output rowset of cumulative compaction total disk size exceed this config size, +// this rowset will be given to base compaction, unit is m byte. +CONF_mInt64(cumulative_compaction_size_based_promotion_size_mbytes, "1024"); +// In size_based policy, output rowset of cumulative compaction total disk size exceed this config ratio of +// base rowset's total disk size, this rowset will be given to base compaction. The value must be between +// 0 and 1. +CONF_mDouble(cumulative_compaction_size_based_promotion_ratio, "0.05"); +// In size_based policy, the smallest size of rowset promotion. When the rowset is less than this config, this +// rowset will be not given to base compaction. The unit is m byte. +CONF_mInt64(cumulative_compaction_size_based_promotion_min_size_mbytes, "64"); +// The lower bound size to do cumulative compaction. When total disk size of candidate rowsets is less than +// this size, size_based policy also does cumulative compaction. The unit is m byte. + CONF_mInt64(cumulative_compaction_size_based_compaction_lower_bound_size_mbytes, "64"); Review comment: Those configs are too long; can we make them shorter?
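For readers skimming the thread, here is a rough sketch of how the promotion-related knobs above appear to interact, based only on the comments in the quoted diff. It is not the actual policy code from the PR; the function name and the exact decision order are assumptions.

```cpp
#include <cstdint>

// Default config values quoted above (units: MB), restated for the sketch.
constexpr int64_t kPromotionSizeMB    = 1024;  // cumulative_compaction_size_based_promotion_size_mbytes
constexpr double  kPromotionRatio     = 0.05;  // cumulative_compaction_size_based_promotion_ratio
constexpr int64_t kPromotionMinSizeMB = 64;    // cumulative_compaction_size_based_promotion_min_size_mbytes

// Decide whether the output rowset of a cumulative compaction should be handed
// over ("promoted") to base compaction, following the comments in the diff.
bool should_promote_to_base(int64_t output_rowset_mb, int64_t base_rowset_mb) {
    if (output_rowset_mb < kPromotionMinSizeMB) {
        return false;  // too small to be worth promoting
    }
    if (output_rowset_mb > kPromotionSizeMB) {
        return true;   // exceeds the absolute promotion threshold
    }
    // exceeds the configured fraction of the base rowset's total disk size
    return output_rowset_mb > static_cast<int64_t>(base_rowset_mb * kPromotionRatio);
}
```

The fourth knob, cumulative_compaction_size_based_compaction_lower_bound_size_mbytes, is separate: per its comment it bounds when cumulative compaction is attempted at all rather than when its output is promoted.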
[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization
yangzhg commented on a change in pull request #4212: URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r471990440 ## File path: be/src/olap/cumulative_compaction_policy.h ## @@ -0,0 +1,263 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#ifndef DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H +#define DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H + +#include + +#include "olap/utils.h" +#include "olap/tablet.h" +#include "olap/tablet_meta.h" +#include "olap/rowset/rowset_meta.h" +#include "olap/rowset/rowset.h" + +namespace doris { + +class Tablet; + +/// This CompactionPolicyType enum is used to represent the type of compaction policy. +/// Now it has two values, CUMULATIVE_NUM_BASED_POLICY and CUMULATIVE_SIZE_BASED_POLICY. +/// CUMULATIVE_NUM_BASED_POLICY means current compaction policy implemented by num based policy. +/// CUMULATIVE_SIZE_BASED_POLICY means current comapction policy implemented by size_based policy. +enum CompactionPolicyType { Review comment: I think Policy and Type is duplicated in meaning This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization
yangzhg commented on a change in pull request #4212: URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r471991211 ## File path: be/src/olap/cumulative_compaction_policy.h ## @@ -0,0 +1,263 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#ifndef DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H +#define DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H + +#include + +#include "olap/utils.h" +#include "olap/tablet.h" +#include "olap/tablet_meta.h" +#include "olap/rowset/rowset_meta.h" +#include "olap/rowset/rowset.h" + +namespace doris { + +class Tablet; + +/// This CompactionPolicyType enum is used to represent the type of compaction policy. +/// Now it has two values, CUMULATIVE_NUM_BASED_POLICY and CUMULATIVE_SIZE_BASED_POLICY. +/// CUMULATIVE_NUM_BASED_POLICY means current compaction policy implemented by num based policy. +/// CUMULATIVE_SIZE_BASED_POLICY means current comapction policy implemented by size_based policy. +enum CompactionPolicyType { +CUMULATIVE_NUM_BASED_POLICY = 0, Review comment: you can use `NUM_BASED` directly, `CUMULATIVE` is in the class name, `POLICY` is in the enum name This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
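For illustration, the reviewer's suggestion could end up looking roughly like this (hypothetical naming; the final choice was left to the PR author):

```cpp
// "CUMULATIVE" already lives in the surrounding class name and "POLICY" in the
// enum name, so the enumerators themselves can stay short.
enum CumulativeCompactionPolicyType {
    NUM_BASED = 0,
    SIZE_BASED = 1,
};
```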
[GitHub] [incubator-doris] EmmyMiao87 edited a comment on issue #4370: Release Nodes 0.13.0
EmmyMiao87 edited a comment on issue #4370: URL: https://github.com/apache/incubator-doris/issues/4370#issuecomment-674835876 # Credits @ZhangYu0123 @wfjcmcb @Fullstop000 @sduzh @Stalary @worker24h @chaoyli @vagetablechicken @jmk1011 @funyeah @wutiangan @gengjun-git @xinghuayu007 @EmmyMiao87 @songenjie @acelyc111 @yangzhg @Seaven @hexian55 @ChenXiaofei @WingsGo @kangpinghuang @wangbo @weizuo93 @sdgshawn @skyduy @wyb @gaodayue @HappenLee @kangkaisen @wuyunfeng @HangyuanLiu @xy720 @liutang123 @caiconghui @liyuance @spaces-X @hffariel @decster @blackfox1983 @Astralidea @morningman @hf200012 @xbyang18 @Youngwb @imay @marising @caoyang10 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization
yangzhg commented on a change in pull request #4212: URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r471992521 ## File path: be/src/olap/cumulative_compaction_policy.h ## @@ -0,0 +1,263 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#ifndef DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H +#define DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H + +#include + +#include "olap/utils.h" +#include "olap/tablet.h" +#include "olap/tablet_meta.h" +#include "olap/rowset/rowset_meta.h" +#include "olap/rowset/rowset.h" + +namespace doris { + +class Tablet; + +/// This CompactionPolicyType enum is used to represent the type of compaction policy. +/// Now it has two values, CUMULATIVE_NUM_BASED_POLICY and CUMULATIVE_SIZE_BASED_POLICY. +/// CUMULATIVE_NUM_BASED_POLICY means current compaction policy implemented by num based policy. +/// CUMULATIVE_SIZE_BASED_POLICY means current comapction policy implemented by size_based policy. +enum CompactionPolicyType { +CUMULATIVE_NUM_BASED_POLICY = 0, +CUMULATIVE_SIZE_BASED_POLICY = 1, +}; + +const static std::string CUMULATIVE_NUM_BASED_POLICY_TYPE = "NUM_BASED"; +const static std::string CUMULATIVE_SIZE_BASED_POLICY_TYPE = "SIZE_BASED"; Review comment: same problem as above This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization
yangzhg commented on a change in pull request #4212: URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r471993152 ## File path: be/src/olap/cumulative_compaction_policy.h ## @@ -0,0 +1,263 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#ifndef DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H +#define DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H + +#include + +#include "olap/utils.h" +#include "olap/tablet.h" +#include "olap/tablet_meta.h" +#include "olap/rowset/rowset_meta.h" +#include "olap/rowset/rowset.h" + +namespace doris { + +class Tablet; + +/// This CompactionPolicyType enum is used to represent the type of compaction policy. +/// Now it has two values, CUMULATIVE_NUM_BASED_POLICY and CUMULATIVE_SIZE_BASED_POLICY. +/// CUMULATIVE_NUM_BASED_POLICY means current compaction policy implemented by num based policy. +/// CUMULATIVE_SIZE_BASED_POLICY means current comapction policy implemented by size_based policy. +enum CompactionPolicyType { +CUMULATIVE_NUM_BASED_POLICY = 0, +CUMULATIVE_SIZE_BASED_POLICY = 1, +}; + +const static std::string CUMULATIVE_NUM_BASED_POLICY_TYPE = "NUM_BASED"; +const static std::string CUMULATIVE_SIZE_BASED_POLICY_TYPE = "SIZE_BASED"; +/// This class CumulativeCompactionPolicy is the base class of cumulative compaction policy. Review comment: why use `///` ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] xy720 opened a new pull request #4383: [SparkLoad]Use the yarn command to get status and kill the application
xy720 opened a new pull request #4383: URL: https://github.com/apache/incubator-doris/pull/4383 ## Proposed changes #4346 #4203 This CL uses the yarn command below to kill or get the status of an application running on YARN. ``` yarn --config confdir application <-kill | -status> ``` To do: 1. Make the yarn command executable in spark load. 2. Write the spark resource into config files and update it before running the command. 3. Parse the result of executing the command line. ## Types of changes What types of changes does your code introduce to Doris? _Put an `x` in the boxes that apply_ - [x] Bugfix (non-breaking change which fixes an issue) - [x] New feature (non-breaking change which adds functionality) ## Checklist _Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code._ - [x] I have create an issue on (Fix #ISSUE), and have described the bug/feature there in detail - [] Compiling and unit tests pass locally with my changes - [] I have added tests that prove my fix is effective or that my feature works - [] If this change need a document change, I have updated the document - [] Any dependent changes have been merged ## Further comments If this is a relatively large or complex change, kick off the discussion at d...@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...
[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization
yangzhg commented on a change in pull request #4212: URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r471994484 ## File path: be/src/olap/cumulative_compaction_policy.h ## @@ -0,0 +1,263 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#ifndef DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H +#define DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H + +#include + +#include "olap/utils.h" +#include "olap/tablet.h" +#include "olap/tablet_meta.h" +#include "olap/rowset/rowset_meta.h" +#include "olap/rowset/rowset.h" + +namespace doris { + +class Tablet; + +/// This CompactionPolicyType enum is used to represent the type of compaction policy. +/// Now it has two values, CUMULATIVE_NUM_BASED_POLICY and CUMULATIVE_SIZE_BASED_POLICY. +/// CUMULATIVE_NUM_BASED_POLICY means current compaction policy implemented by num based policy. +/// CUMULATIVE_SIZE_BASED_POLICY means current comapction policy implemented by size_based policy. +enum CompactionPolicyType { +CUMULATIVE_NUM_BASED_POLICY = 0, +CUMULATIVE_SIZE_BASED_POLICY = 1, +}; + +const static std::string CUMULATIVE_NUM_BASED_POLICY_TYPE = "NUM_BASED"; +const static std::string CUMULATIVE_SIZE_BASED_POLICY_TYPE = "SIZE_BASED"; +/// This class CumulativeCompactionPolicy is the base class of cumulative compaction policy. +/// It defines the policy to do cumulative compaction. It has different derived classes, which implements +/// concrete cumulative compaction algorithm. The policy is configured by conf::cumulative_compaction_policy. +/// The policy functions is the main steps to do cumulative compaction. For example, how to pick candicate +/// rowsets from tablet using current policy, how to calculate the cumulative point and how to calculate +/// the tablet cumulative compcation score and so on. +class CumulativeCompactionPolicy { + +public: +/// Constructor function of CumulativeCompactionPolicy, +/// it needs tablet pointer to access tablet method. +/// param tablet, the shared pointer of tablet +CumulativeCompactionPolicy(std::shared_ptr tablet) : _tablet(tablet){} + +/// Destructor function of CumulativeCompactionPolicy. +virtual ~CumulativeCompactionPolicy() {} + +/// Calculate the cumulative compaction score of the tablet. This function uses rowsets meta and current +/// cumulative point to calculative the score of tablet. The score depends on the concrete algorithm of policy. +/// In general, the score represents the segments nums to do cumulative compaction in total rowsets. The more +/// score tablet gets, the earlier it can do cumulative compaction. +/// param all_rowsets, all rowsets in tablet. +/// param current_cumulative_point, current cumulative point value. +/// return score, the result score after calculate. 
+virtual void calc_cumulative_compaction_score( +const std::vector& all_rowsets, int64_t current_cumulative_point, +uint32_t* score) = 0; + +/// This function implements the policy which represents how to pick the candicate rowsets for compaction. +/// This base class gives a unified implementation. Its derived classes also can override this function each other. +/// param skip_window_sec, it means skipping the rowsets which use create time plus skip_window_sec is greater than now. +/// param rs_version_map, mapping from version to rowset +/// param cumulative_point, current cumulative point of tablet +/// return candidate_rowsets, the container of candidate rowsets +virtual void pick_candicate_rowsets( +int64_t skip_window_sec, +const std::unordered_map& rs_version_map, +int64_t cumulative_point, std::vector* candidate_rowsets); + +/// Pick input rowsets from candidate rowsets for compaction. This function is pure virtual function. +/// Its implemention depands on concrete compaction policy. +/// param candidate_rowsets, the candidate_rowsets vector container to pick input rowsets +/// return input_rowsets, the vector container as return +/// return last_delete_version, if has delete rowset, record the delete version from input_rowsets
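As a plain-C++ sketch of the skip-window rule described in that header comment ("skipping the rowsets which use create time plus skip_window_sec is greater than now"), the candidate-picking step could look like the following. The struct and field names are stand-ins for illustration, not the real doris::Rowset API, and the cumulative-point filter is simplified.

```cpp
#include <cstdint>
#include <ctime>
#include <memory>
#include <vector>

struct Rowset {                 // minimal stand-in for doris::Rowset
    int64_t creation_time;      // unix timestamp when the rowset was created
    int64_t start_version;
};
using RowsetSharedPtr = std::shared_ptr<Rowset>;

// Keep rowsets at or above the cumulative point, but skip rowsets that are
// still inside the "skip window" (created less than skip_window_sec ago).
std::vector<RowsetSharedPtr> pick_candidate_rowsets(
        const std::vector<RowsetSharedPtr>& all_rowsets,
        int64_t cumulative_point, int64_t skip_window_sec) {
    std::vector<RowsetSharedPtr> candidates;
    const int64_t now = std::time(nullptr);
    for (const auto& rs : all_rowsets) {
        if (rs->start_version < cumulative_point) continue;           // below cumulative point
        if (rs->creation_time + skip_window_sec > now) continue;      // too fresh, skip
        candidates.push_back(rs);
    }
    return candidates;
}
```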
[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization
yangzhg commented on a change in pull request #4212: URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r471996935 ## File path: be/src/olap/olap_server.cpp ## @@ -277,6 +281,27 @@ void* StorageEngine::_disk_stat_monitor_thread_callback(void* arg) { return nullptr; } +void StorageEngine::_check_cumulative_compaction_config() { + +std::string cumulative_compaction_type = config::cumulative_compaction_policy; +boost::to_upper(cumulative_compaction_type); + +// if size_based policy is used, check size_based policy configs +if (cumulative_compaction_type == CUMULATIVE_SIZE_BASED_POLICY_TYPE) { +int64_t size_based_promotion_size = +config::cumulative_compaction_size_based_promotion_size_mbytes; +int64_t size_based_promotion_min_size = + config::cumulative_compaction_size_based_promotion_min_size_mbytes; +int64_t size_based_compaction_lower_bound_size = + config::cumulative_compaction_size_based_compaction_lower_bound_size_mbytes; + +// check size_based_promotion_size must be greater than size_based_promotion_min_size +CHECK(size_based_promotion_size >= size_based_promotion_min_size); Review comment: It is better to set it to the min size instead of using a CHECK here
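A minimal sketch of the alternative the reviewer suggests: clamp the value to the configured minimum and warn, rather than aborting the process with CHECK. The helper name is hypothetical, and the BE would log through glog's LOG(WARNING) rather than std::cerr; std::cerr is used here only to keep the example self-contained.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical sketch: instead of CHECK-failing at startup, clamp the promotion
// size to the configured minimum and emit a warning.
int64_t sanitize_promotion_size(int64_t promotion_size_mb, int64_t promotion_min_size_mb) {
    if (promotion_size_mb < promotion_min_size_mb) {
        std::cerr << "WARNING: cumulative_compaction_size_based_promotion_size_mbytes ("
                  << promotion_size_mb << ") is smaller than the configured min size ("
                  << promotion_min_size_mb << "); clamping to the min size" << std::endl;
        return promotion_min_size_mb;
    }
    return promotion_size_mb;
}
```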
[incubator-doris] branch master updated (8a3eaee -> 38a2a7a)
This is an automated email from the ASF dual-hosted git repository. morningman pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/incubator-doris.git. from 8a3eaee Update support batch delete storage design document (#4234) add 38a2a7a [Bug] Fix bug that modification of global variable can not be persisted. (#4324) No new revisions were added by this update. Summary of changes: .../java/org/apache/doris/analysis/SetVar.java | 23 ++- .../java/org/apache/doris/catalog/Catalog.java | 5 + .../org/apache/doris/common/FeMetaVersion.java | 4 +- .../org/apache/doris/common/util/TimeUtils.java| 6 +- .../org/apache/doris/journal/JournalEntity.java| 6 + .../java/org/apache/doris/persist/EditLog.java | 9 ++ .../apache/doris/persist/GlobalVarPersistInfo.java | 141 +++ .../org/apache/doris/persist/OperationType.java| 4 +- .../java/org/apache/doris/qe/GlobalVariable.java | 17 +++ .../main/java/org/apache/doris/qe/VariableMgr.java | 154 + ...InfoTest.java => GlobalVarPersistInfoTest.java} | 35 +++-- .../java/org/apache/doris/qe/VariableMgrTest.java | 66 + 12 files changed, 363 insertions(+), 107 deletions(-) create mode 100644 fe/fe-core/src/main/java/org/apache/doris/persist/GlobalVarPersistInfo.java copy fe/fe-core/src/test/java/org/apache/doris/persist/{AlterViewInfoTest.java => GlobalVarPersistInfoTest.java} (62%) - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] morningman closed issue #4323: [Bug] Modification of global variables is not correctly persisted.
morningman closed issue #4323: URL: https://github.com/apache/incubator-doris/issues/4323
[GitHub] [incubator-doris] morningman merged pull request #4324: [Bug] Fix bug that modification of global variable can not be persisted.
morningman merged pull request #4324: URL: https://github.com/apache/incubator-doris/pull/4324
[GitHub] [incubator-doris] morningman closed issue #4344: [Bug]BE crash when doing LOADING phase of spark load
morningman closed issue #4344: URL: https://github.com/apache/incubator-doris/issues/4344
[GitHub] [incubator-doris] morningman merged pull request #4345: [Bug][MemTracker] Cleanup the mem tracker's constructor to avoid wrong usage
morningman merged pull request #4345: URL: https://github.com/apache/incubator-doris/pull/4345
[incubator-doris] branch master updated: [Bug][MemTracker] Cleanup the mem tracker's constructor to avoid wrong usage (#4345)
This is an automated email from the ASF dual-hosted git repository. morningman pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-doris.git The following commit(s) were added to refs/heads/master by this push: new e251080 [Bug][MemTracker] Cleanup the mem tracker's constructor to avoid wrong usage (#4345) e251080 is described below commit e25108097d349be877789ad82cf2568da37a9007 Author: Mingyu Chen AuthorDate: Tue Aug 18 16:54:55 2020 +0800 [Bug][MemTracker] Cleanup the mem tracker's constructor to avoid wrong usage (#4345) After PR: #4135, If a mem tracker has parent, it should be created by 'CreateTracker'. So I removed other unused constructors. And also fix the bug described in #4344 --- be/src/exec/parquet_scanner.cpp | 1 - be/src/exec/parquet_scanner.h | 1 - be/src/olap/push_handler.cpp| 2 +- be/src/olap/push_handler.h | 2 +- be/src/runtime/mem_tracker.cpp | 58 ++--- be/src/runtime/mem_tracker.h| 42 ++--- 6 files changed, 40 insertions(+), 66 deletions(-) diff --git a/be/src/exec/parquet_scanner.cpp b/be/src/exec/parquet_scanner.cpp index d2e69e9..2db36f3 100644 --- a/be/src/exec/parquet_scanner.cpp +++ b/be/src/exec/parquet_scanner.cpp @@ -18,7 +18,6 @@ #include "exec/parquet_scanner.h" #include "runtime/descriptors.h" #include "runtime/exec_env.h" -#include "runtime/mem_tracker.h" #include "runtime/raw_value.h" #include "runtime/stream_load/load_stream_mgr.h" #include "runtime/stream_load/stream_load_pipe.h" diff --git a/be/src/exec/parquet_scanner.h b/be/src/exec/parquet_scanner.h index a052e65..09d92ff 100644 --- a/be/src/exec/parquet_scanner.h +++ b/be/src/exec/parquet_scanner.h @@ -42,7 +42,6 @@ class ExprContext; class TupleDescriptor; class TupleRow; class RowDescriptor; -class MemTracker; class RuntimeProfile; class StreamLoadPipe; diff --git a/be/src/olap/push_handler.cpp b/be/src/olap/push_handler.cpp index fa5c6bd..a5e9b1c 100644 --- a/be/src/olap/push_handler.cpp +++ b/be/src/olap/push_handler.cpp @@ -946,7 +946,7 @@ OLAPStatus PushBrokerReader::init(const Schema* schema, } _runtime_profile = _runtime_state->runtime_profile(); _runtime_profile->set_name("PushBrokerReader"); -_mem_tracker.reset(new MemTracker(_runtime_profile, -1, _runtime_profile->name(), _runtime_state->instance_mem_tracker())); +_mem_tracker = MemTracker::CreateTracker(-1, "PushBrokerReader", _runtime_state->instance_mem_tracker()); _mem_pool.reset(new MemPool(_mem_tracker.get())); _counter.reset(new ScannerCounter()); diff --git a/be/src/olap/push_handler.h b/be/src/olap/push_handler.h index 181905d..3a3a319 100644 --- a/be/src/olap/push_handler.h +++ b/be/src/olap/push_handler.h @@ -248,7 +248,7 @@ private: const Schema* _schema; std::unique_ptr _runtime_state; RuntimeProfile* _runtime_profile; -std::unique_ptr _mem_tracker; +std::shared_ptr _mem_tracker; std::unique_ptr _mem_pool; std::unique_ptr _counter; std::unique_ptr _scanner; diff --git a/be/src/runtime/mem_tracker.cpp b/be/src/runtime/mem_tracker.cpp index 5e3c90b..f52befd 100644 --- a/be/src/runtime/mem_tracker.cpp +++ b/be/src/runtime/mem_tracker.cpp @@ -70,7 +70,7 @@ static std::shared_ptr root_tracker; static GoogleOnceType root_tracker_once = GOOGLE_ONCE_INIT; void MemTracker::CreateRootTracker() { - root_tracker.reset(new MemTracker(-1, "root", std::shared_ptr())); + root_tracker.reset(new MemTracker(-1, "root")); root_tracker->Init(); } @@ -85,7 +85,7 @@ std::shared_ptr MemTracker::CreateTracker( } else { real_parent = GetRootTracker(); } - shared_ptr tracker(new MemTracker(byte_limit, label, 
real_parent, log_usage_if_zero)); + shared_ptr tracker(new MemTracker(nullptr, byte_limit, label, real_parent, log_usage_if_zero)); real_parent->AddChildTracker(tracker); tracker->Init(); @@ -102,56 +102,36 @@ std::shared_ptr MemTracker::CreateTracker( } else { real_parent = GetRootTracker(); } - shared_ptr tracker(new MemTracker(profile, byte_limit, label, real_parent)); + shared_ptr tracker(new MemTracker(profile, byte_limit, label, real_parent, true)); real_parent->AddChildTracker(tracker); tracker->Init(); return tracker; } +MemTracker::MemTracker(int64_t byte_limit, const std::string& label) : +MemTracker(nullptr, byte_limit, label, std::shared_ptr(), true) { +} + MemTracker::MemTracker( +RuntimeProfile* profile, int64_t byte_limit, const string& label, const std::shared_ptr& parent, bool log_usage_if_zero) : limit_(byte_limit), soft_limit_(CalcSoftLimit(byte_limit)), label_(label), parent_(parent), - consumption_(std::make_shared(TUnit::BYTES)), consumption_metric_(nullptr), log_usage_if_zero_(log_usage_if_zero), num_gcs_metric_(nullptr), bytes_freed_by_last_gc_metric_(nullptr),
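The idiom this commit enforces, per its message, is that any tracker with a parent is obtained through MemTracker::CreateTracker and held as a shared_ptr, as the push_handler.cpp hunk above shows. Below is a toy model (not the real doris::MemTracker) of why routing construction through a factory avoids wrong usage: the parent/child registration happens inside the factory, so a caller can no longer forget it.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal sketch of the "create via factory" pattern; real MemTracker differs.
class MemTracker {
public:
    static std::shared_ptr<MemTracker> CreateTracker(
            int64_t byte_limit, const std::string& label,
            const std::shared_ptr<MemTracker>& parent) {
        // The constructor is private, so every parented tracker goes through here.
        std::shared_ptr<MemTracker> tracker(new MemTracker(byte_limit, label, parent));
        if (parent != nullptr) {
            parent->_children.push_back(tracker);  // registration happens exactly once
        }
        return tracker;
    }

private:
    MemTracker(int64_t byte_limit, std::string label, std::shared_ptr<MemTracker> parent)
            : _limit(byte_limit), _label(std::move(label)), _parent(std::move(parent)) {}

    int64_t _limit;
    std::string _label;
    std::shared_ptr<MemTracker> _parent;
    std::vector<std::shared_ptr<MemTracker>> _children;
};
```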
[GitHub] [incubator-doris] morningman merged pull request #4327: [Metrics] Support tablet level metrics
morningman merged pull request #4327: URL: https://github.com/apache/incubator-doris/pull/4327
[incubator-doris] branch master updated (e251080 -> 56260a6)
This is an automated email from the ASF dual-hosted git repository. morningman pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/incubator-doris.git. from e251080 [Bug][MemTracker] Cleanup the mem tracker's constructor to avoid wrong usage (#4345) add 56260a6 [Metrics] Support tablet level metrics (#4327) No new revisions were added by this update. Summary of changes: be/src/exec/olap_scanner.cpp | 3 +++ be/src/http/action/metrics_action.cpp | 5 +++-- be/src/http/action/stream_load.cpp | 2 +- be/src/olap/base_tablet.cpp| 16 +++- be/src/olap/base_tablet.h | 7 +++ be/src/olap/data_dir.cpp | 2 +- be/src/olap/delta_writer.cpp | 6 +- be/src/olap/memtable_flush_executor.h | 2 +- be/src/olap/tablet.cpp | 6 ++ be/src/olap/tablet.h | 4 be/src/util/metrics.cpp| 14 ++ be/src/util/metrics.h | 18 +- be/test/util/new_metrics_test.cpp | 4 ++-- .../operation/monitor-metrics/be-metrics.md| 9 +++-- .../operation/monitor-metrics/be-metrics.md| 9 +++-- 15 files changed, 85 insertions(+), 22 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] morningman opened a new issue #4384: [Bug][SparkLoad] Spark load will create rowset with incorrect rowset type
morningman opened a new issue #4384: URL: https://github.com/apache/incubator-doris/issues/4384 **Describe the bug** 1. Create a table with segment v2 format. 2. Load data with spark load. 3. The rowset with version [2-2] has storage format SegmentV1, while V2 is expected. **Why** The push handler only sets the rowset writer's rowset type from the BE config `default_rowset_type`, without checking the `prefer_rowset_type` of the tablet.
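A sketch of the kind of fix the issue implies: consult the tablet's preferred rowset type instead of relying only on the BE-wide default. Identifier names such as BETA_ROWSET and the exact enum values follow common Doris naming but are assumptions for illustration, not the merged patch.

```cpp
// Assumed names: BETA_ROWSET corresponds to SegmentV2, ALPHA_ROWSET to SegmentV1.
enum RowsetTypePB { ALPHA_ROWSET = 0, BETA_ROWSET = 1 };

struct TabletMeta {
    RowsetTypePB prefer_rowset_type;
};

// BE-wide default that the push handler used unconditionally before the fix.
static RowsetTypePB default_rowset_type = ALPHA_ROWSET;

// Choose the rowset type for the push handler's rowset writer: honour the
// tablet's preference when it asks for SegmentV2, otherwise fall back to the
// global default.
RowsetTypePB choose_rowset_type(const TabletMeta& tablet_meta) {
    if (tablet_meta.prefer_rowset_type == BETA_ROWSET) {
        return BETA_ROWSET;
    }
    return default_rowset_type;
}
```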
[GitHub] [incubator-doris] morningman opened a new issue #4385: [Bug] tablet type in tablet meta is wrong
morningman opened a new issue #4385: URL: https://github.com/apache/incubator-doris/issues/4385 **Describe the bug** The `tablet_type` in tablet meta should be `TABLET_TYPE_DISK`. But it is set to `TABLET_TYPE_MEMORY`
[GitHub] [incubator-doris] morningman opened a new issue #4386: [SegmentV2] Change the default storage format to SegmentV2
morningman opened a new issue #4386: URL: https://github.com/apache/incubator-doris/issues/4386 **Is your feature request related to a problem? Please describe.** Since Segment V2 has been released for a long time, we should make it the default storage format for newly created tables.
[GitHub] [incubator-doris] morningman opened a new pull request #4387: [SegmentV2] Change the default storage format to SegmentV2
morningman opened a new pull request #4387: URL: https://github.com/apache/incubator-doris/pull/4387 ## Proposed changes Since Segment V2 has been released for a long time, we should make it the default storage format for newly created tables. This CL mainly changes: 1. For all newly created tables, the default storage format is Segment V2. 2. For all existing tablets, the storage format remains unchanged. 3. Fix bugs described in Fix #4384 and Fix #4385 ## Types of changes - [x] Bugfix (non-breaking change which fixes an issue) - [x] Breaking change (fix or feature that would cause existing functionality to not work as expected) - [x] Documentation Update (if none of the other choices apply) ## Checklist - [x] I have create an issue on (Fix #4386), and have described the bug/feature there in detail - [x] Compiling and unit tests pass locally with my changes - [x] I have added tests that prove my fix is effective or that my feature works - [x] If this change need a document change, I have updated the document - [x] Any dependent changes have been merged ## Further comments We should provide a friendlier way to check the conversion progress of Segment V2
[GitHub] [incubator-doris] HangyuanLiu opened a new pull request #4388: Add OLAP_ERR_DATE_QUALITY_ERR error status to display schema change failure
HangyuanLiu opened a new pull request #4388: URL: https://github.com/apache/incubator-doris/pull/4388 ## Proposed changes In the process of historical data transformation of materialized views, it may occur that the transformation fails due to data quality. Add an error status code :` OLAP_ERR_DATE_QUALITY_ERR ` to determine if a data problem is causing the failure ## Types of changes What types of changes does your code introduce to Doris? _Put an `x` in the boxes that apply_ - [] Bugfix (non-breaking change which fixes an issue) - [] New feature (non-breaking change which adds functionality) - [] Breaking change (fix or feature that would cause existing functionality to not work as expected) - [] Documentation Update (if none of the other choices apply) - [] Code refactor (Modify the code structure, format the code, etc...) ## Checklist _Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code._ - [] I have create an issue on (Fix #ISSUE), and have described the bug/feature there in detail - [] Compiling and unit tests pass locally with my changes - [] I have added tests that prove my fix is effective or that my feature works - [] If this change need a document change, I have updated the document - [] Any dependent changes have been merged ## Further comments If this is a relatively large or complex change, kick off the discussion at d...@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] imay commented on pull request #4366: Optimise coding bit operation in BE
imay commented on pull request #4366: URL: https://github.com/apache/incubator-doris/pull/4366#issuecomment-675370532 > > Not intended to interfere, just curious about how many improvements can be achieved from this PR, are there any benchmarks? > > On my development machine, when encode_varint64 executes 1 billion times, the `v | B` version uses 95095ms on average over 5 runs. > The `(v & (B - 1))` version uses 96103ms on average over 5 runs. It can improve about 0.5% ~ 1%. encode_varint64 is used with high frequency in many cases like bitmap_value, page_pointer encoding and so on. It seems too slow to execute 5 billion encode operations in about 95s
[GitHub] [incubator-doris] HappenLee opened a new pull request #4389: [ODBC SCAN NODE] 2/4 Add Thrift Interface and Meta of ODBC_Scan_Node
HappenLee opened a new pull request #4389: URL: https://github.com/apache/incubator-doris/pull/4389 issue:#4376 ## Proposed changes Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request. If it fixes a bug or resolves a feature request, be sure to link to that issue. ## Types of changes What types of changes does your code introduce to Doris? _Put an `x` in the boxes that apply_ - [] Bugfix (non-breaking change which fixes an issue) - [x] New feature (non-breaking change which adds functionality) - [] Breaking change (fix or feature that would cause existing functionality to not work as expected) - [] Documentation Update (if none of the other choices apply) - [] Code refactor (Modify the code structure, format the code, etc...) ## Checklist _Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code._ - [x] I have create an issue on (Fix #ISSUE), and have described the bug/feature there in detail - [x] Compiling and unit tests pass locally with my changes - [x] I have added tests that prove my fix is effective or that my feature works - [x] If this change need a document change, I have updated the document - [x] Any dependent changes have been merged This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] EmmyMiao87 commented on pull request #4253: Support more materialized view syntax
EmmyMiao87 commented on pull request #4253: URL: https://github.com/apache/incubator-doris/pull/4253#issuecomment-675379372 `For compatibility with the old rollup logic, the syntax "DROP MATERIALIZED VIEW [ IF EXISTS ] [db_name].< mv_name > FROM [db].[table]" has been added` Both FROM and ON should be supported.
[GitHub] [incubator-doris] EmmyMiao87 commented on pull request #4253: Support more materialized view syntax
EmmyMiao87 commented on pull request #4253: URL: https://github.com/apache/incubator-doris/pull/4253#issuecomment-675380536 The syntax of `SHOW`, `ALTER`, `DROP` should be consistent
[GitHub] [incubator-doris] shuaijinchao opened a new issue #4390: [Document] increase mailing list subscription method.
shuaijinchao opened a new issue #4390: URL: https://github.com/apache/incubator-doris/issues/4390 `Email` is an important communication method in the `Apache` project. I think the way to subscribe to the developer mailing list should be put in the README so that more people can see and participate in the project discussion.
[GitHub] [incubator-doris] shuaijinchao opened a new pull request #4391: doc: increase mailing list subscription method.
shuaijinchao opened a new pull request #4391: URL: https://github.com/apache/incubator-doris/pull/4391 ## Proposed changes FIX #4390 ## Types of changes - [x] Documentation Update (increase mailing list subscription method) ## Checklist - [x] I have create an issue on #4390 , and have described the bug/feature there in detail - [x] If this change need a document change, I have updated the document.
[GitHub] [incubator-doris] morningman commented on a change in pull request #4378: FIX: fix dynamic partition replicationNum error
morningman commented on a change in pull request #4378: URL: https://github.com/apache/incubator-doris/pull/4378#discussion_r472084694 ## File path: fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java ## @@ -3973,43 +3974,55 @@ public static void getDdlStmt(Table table, List createTableStmt, List bfColumnNames = olapTable.getCopiedBfColumns(); if (bfColumnNames != null) { - sb.append(",\n\"").append(PropertyAnalyzer.PROPERTIES_BF_COLUMNS).append("\" = \""); -sb.append(Joiner.on(", ").join(olapTable.getCopiedBfColumns())).append("\""); +appendProperties(sb, PropertyAnalyzer.PROPERTIES_BF_COLUMNS, Joiner.on(", ").join(olapTable.getCopiedBfColumns())); } if (separatePartition) { // version info - sb.append(",\n\"").append(PropertyAnalyzer.PROPERTIES_VERSION_INFO).append("\" = \""); -Partition partition = null; +Partition partition; if (olapTable.getPartitionInfo().getType() == PartitionType.UNPARTITIONED) { partition = olapTable.getPartition(olapTable.getName()); } else { Preconditions.checkState(partitionId.size() == 1); partition = olapTable.getPartition(partitionId.get(0)); } -sb.append(Joiner.on(",").join(partition.getVisibleVersion(), partition.getVisibleVersionHash())) -.append("\""); +appendProperties(sb, PropertyAnalyzer.PROPERTIES_VERSION_INFO, Joiner.on(",").join(partition.getVisibleVersion(), partition.getVisibleVersionHash())); } // colocateTable String colocateTable = olapTable.getColocateGroup(); if (colocateTable != null) { - sb.append(",\n\"").append(PropertyAnalyzer.PROPERTIES_COLOCATE_WITH).append("\" = \""); -sb.append(colocateTable).append("\""); +appendProperties(sb, PropertyAnalyzer.PROPERTIES_COLOCATE_WITH, colocateTable); } // dynamic partition if (olapTable.dynamicPartitionExists()) { - sb.append(olapTable.getTableProperty().getDynamicPartitionProperty().toString()); +DynamicPartitionProperty dynamicPartitionProperty = olapTable.getTableProperty().getDynamicPartitionProperty(); +appendProperties(sb, DynamicPartitionProperty.ENABLE, dynamicPartitionProperty.getEnable()); Review comment: This is not a good implments. Could you move these `appendProperties` into a method of `DynamicPartitionProperty`. So that if we add more properties in future, we only need to modify one place. You can pass the table's default replication number to that method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] ZhangYu0123 commented on pull request #4366: Optimise coding bit operation in BE
ZhangYu0123 commented on pull request #4366: URL: https://github.com/apache/incubator-doris/pull/4366#issuecomment-675405126 > > > Not intended to interfere, just curious about how many improvements can be achieved from this PR, are there any benchmarks? > > > > > > On my development machine, when encode_varint64 executes 1 billion times, the `v | B` version uses 95095ms on average over 5 runs. > > The `(v & (B - 1))` version uses 96103ms on average over 5 runs. It can improve about 0.5% ~ 1%. encode_varint64 is used with high frequency in many cases like bitmap_value, page_pointer encoding and so on. > > It seems too slow to execute 5 billion encode operations in about 95s 5 billion costs 95s * 5. Compression is time consuming. encode_varint64 is mainly used to compress small integers into a variable-length encoding instead of the fixed-width int64_t type. It is a trade-off between time and space.
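For context, here is a generic varint64 encoder showing the two byte-marking forms being benchmarked, assuming B is the continuation-bit constant 128. This is a sketch, not the exact Doris encode_varint64. Once the result is truncated to a uint8_t, `v | B` and `(v & (B - 1)) | B` write the same byte, so only the instruction mix (and hence the ~1% timing difference) changes.

```cpp
#include <cstdint>
#include <cstdio>

static const uint32_t B = 128;  // continuation-bit marker

// Encode v as a little-endian base-128 varint; returns one past the last byte written.
uint8_t* encode_varint64(uint8_t* dst, uint64_t v) {
    while (v >= B) {
        *dst++ = static_cast<uint8_t>(v | B);                 // variant discussed in the PR
        // *dst++ = static_cast<uint8_t>((v & (B - 1)) | B);  // equivalent masked form
        v >>= 7;
    }
    *dst++ = static_cast<uint8_t>(v);
    return dst;
}

int main() {
    uint8_t buf[10];
    uint8_t* end = encode_varint64(buf, 300);  // 300 encodes as 0xAC 0x02
    std::printf("%zu bytes\n", static_cast<size_t>(end - buf));
    return 0;
}
```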
[GitHub] [incubator-doris] stalary commented on a change in pull request #4378: FIX: fix dynamic partition replicationNum error
stalary commented on a change in pull request #4378: URL: https://github.com/apache/incubator-doris/pull/4378#discussion_r472088500 ## File path: fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java ## @@ -3973,43 +3974,55 @@ public static void getDdlStmt(Table table, List createTableStmt, List bfColumnNames = olapTable.getCopiedBfColumns(); if (bfColumnNames != null) { - sb.append(",\n\"").append(PropertyAnalyzer.PROPERTIES_BF_COLUMNS).append("\" = \""); -sb.append(Joiner.on(", ").join(olapTable.getCopiedBfColumns())).append("\""); +appendProperties(sb, PropertyAnalyzer.PROPERTIES_BF_COLUMNS, Joiner.on(", ").join(olapTable.getCopiedBfColumns())); } if (separatePartition) { // version info - sb.append(",\n\"").append(PropertyAnalyzer.PROPERTIES_VERSION_INFO).append("\" = \""); -Partition partition = null; +Partition partition; if (olapTable.getPartitionInfo().getType() == PartitionType.UNPARTITIONED) { partition = olapTable.getPartition(olapTable.getName()); } else { Preconditions.checkState(partitionId.size() == 1); partition = olapTable.getPartition(partitionId.get(0)); } -sb.append(Joiner.on(",").join(partition.getVisibleVersion(), partition.getVisibleVersionHash())) -.append("\""); +appendProperties(sb, PropertyAnalyzer.PROPERTIES_VERSION_INFO, Joiner.on(",").join(partition.getVisibleVersion(), partition.getVisibleVersionHash())); } // colocateTable String colocateTable = olapTable.getColocateGroup(); if (colocateTable != null) { - sb.append(",\n\"").append(PropertyAnalyzer.PROPERTIES_COLOCATE_WITH).append("\" = \""); -sb.append(colocateTable).append("\""); +appendProperties(sb, PropertyAnalyzer.PROPERTIES_COLOCATE_WITH, colocateTable); } // dynamic partition if (olapTable.dynamicPartitionExists()) { - sb.append(olapTable.getTableProperty().getDynamicPartitionProperty().toString()); +DynamicPartitionProperty dynamicPartitionProperty = olapTable.getTableProperty().getDynamicPartitionProperty(); +appendProperties(sb, DynamicPartitionProperty.ENABLE, dynamicPartitionProperty.getEnable()); Review comment: okay,I will modify it later This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] morningman commented on a change in pull request #4383: [SparkLoad]Use the yarn command to get status and kill the application
morningman commented on a change in pull request #4383: URL: https://github.com/apache/incubator-doris/pull/4383#discussion_r472090745 ## File path: fe/fe-core/src/main/java/org/apache/doris/load/loadv2/YarnApplicationReport.java ## @@ -0,0 +1,121 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +package org.apache.doris.load.loadv2; + +import org.apache.doris.common.LoadException; +import com.google.common.base.Preconditions; +import com.google.common.base.Splitter; +import com.google.common.collect.Maps; + +import org.apache.hadoop.yarn.api.records.ApplicationReport; +import org.apache.hadoop.yarn.api.records.FinalApplicationStatus; +import org.apache.hadoop.yarn.api.records.YarnApplicationState; +import org.apache.hadoop.yarn.api.records.impl.pb.ApplicationReportPBImpl; +import org.apache.hadoop.yarn.util.ConverterUtils; + +import java.text.NumberFormat; +import java.text.ParseException; +import java.util.List; +import java.util.Map; + +/** + * Covert output string of command `yarn application -status` to application report. 
+ * Input sample: + * --- + * Application Report : + * Application-Id : application_1573630236805_6763648 + * Application-Name : doris_label_test + * Application-Type : SPARK-2.4.1 + * User : test + * Queue : test-queue + * Start-Time : 1597654469958 + * Finish-Time : 1597654801939 + * Progress : 100% + * State : FINISHED + * Final-State : SUCCEEDED + * Tracking-URL : 127.0.0.1:8004/history/application_1573630236805_6763648/1 + * RPC Port : 40236 + * AM Host : host-name + * -- + * + * Output: + * ApplicationReport + */ +public class YarnApplicationReport { +private static final String APPLICATION_ID = "Application-Id"; +private static final String APPLICATION_TYPE = "Application-Type"; +private static final String APPLICATION_NAME = "Application-Name"; +private static final String USER = "User"; +private static final String QUEUE = "Queue"; +private static final String START_TIME = "Start-Time"; +private static final String FINISH_TIME = "Finish-Time"; +private static final String PROGRESS = "Progress"; +private static final String STATE = "State"; +private static final String FINAL_STATE = "Final-State"; +private static final String TRACKING_URL = "Tracking-URL"; +private static final String RPC_PORT = "RPC Port"; +private static final String AM_HOST = "AM Host"; +private static final String DIAGNOSTICS = "Diagnostics"; + +private ApplicationReport report; + +public YarnApplicationReport(String output) throws LoadException { +this.report = new ApplicationReportPBImpl(); +parseFromOutput(output); +} + +public ApplicationReport getReport() { +return report; +} + +private void parseFromOutput(String output) throws LoadException { +Map reportMap = Maps.newHashMap(); +List lines = Splitter.onPattern("\n").trimResults().splitToList(output); +// Application-Id : application_1573630236805_6763648 ==> (Application-Id, application_1573630236805_6763648) +for (String line : lines) { +List entry = Splitter.onPattern(":").limit(2).trimResults().splitToList(line); +Preconditions.checkState(entry.size() <= 2); Review comment: Preconditions.checkState(entry.size() <= 2, line); This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] yangzhg commented on a change in pull request #4212: Compaction rules optimization
yangzhg commented on a change in pull request #4212: URL: https://github.com/apache/incubator-doris/pull/4212#discussion_r472006650 ## File path: be/src/olap/cumulative_compaction_policy.h ## @@ -0,0 +1,263 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#ifndef DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H +#define DORIS_BE_SRC_OLAP_CUMULATIVE_COMPACTION_POLICY_H + +#include + +#include "olap/utils.h" +#include "olap/tablet.h" +#include "olap/tablet_meta.h" +#include "olap/rowset/rowset_meta.h" +#include "olap/rowset/rowset.h" + +namespace doris { + +class Tablet; + +/// This CompactionPolicyType enum is used to represent the type of compaction policy. +/// Now it has two values, CUMULATIVE_NUM_BASED_POLICY and CUMULATIVE_SIZE_BASED_POLICY. +/// CUMULATIVE_NUM_BASED_POLICY means current compaction policy implemented by num based policy. +/// CUMULATIVE_SIZE_BASED_POLICY means current comapction policy implemented by size_based policy. +enum CompactionPolicyType { +CUMULATIVE_NUM_BASED_POLICY = 0, +CUMULATIVE_SIZE_BASED_POLICY = 1, +}; + +const static std::string CUMULATIVE_NUM_BASED_POLICY_TYPE = "NUM_BASED"; +const static std::string CUMULATIVE_SIZE_BASED_POLICY_TYPE = "SIZE_BASED"; +/// This class CumulativeCompactionPolicy is the base class of cumulative compaction policy. +/// It defines the policy to do cumulative compaction. It has different derived classes, which implements +/// concrete cumulative compaction algorithm. The policy is configured by conf::cumulative_compaction_policy. +/// The policy functions is the main steps to do cumulative compaction. For example, how to pick candicate +/// rowsets from tablet using current policy, how to calculate the cumulative point and how to calculate +/// the tablet cumulative compcation score and so on. +class CumulativeCompactionPolicy { + +public: +/// Constructor function of CumulativeCompactionPolicy, +/// it needs tablet pointer to access tablet method. +/// param tablet, the shared pointer of tablet +CumulativeCompactionPolicy(std::shared_ptr tablet) : _tablet(tablet){} + +/// Destructor function of CumulativeCompactionPolicy. +virtual ~CumulativeCompactionPolicy() {} + +/// Calculate the cumulative compaction score of the tablet. This function uses rowsets meta and current +/// cumulative point to calculative the score of tablet. The score depends on the concrete algorithm of policy. +/// In general, the score represents the segments nums to do cumulative compaction in total rowsets. The more +/// score tablet gets, the earlier it can do cumulative compaction. +/// param all_rowsets, all rowsets in tablet. +/// param current_cumulative_point, current cumulative point value. +/// return score, the result score after calculate. 
+virtual void calc_cumulative_compaction_score( +const std::vector& all_rowsets, int64_t current_cumulative_point, +uint32_t* score) = 0; + +/// This function implements the policy which represents how to pick the candicate rowsets for compaction. +/// This base class gives a unified implementation. Its derived classes also can override this function each other. +/// param skip_window_sec, it means skipping the rowsets which use create time plus skip_window_sec is greater than now. +/// param rs_version_map, mapping from version to rowset +/// param cumulative_point, current cumulative point of tablet +/// return candidate_rowsets, the container of candidate rowsets +virtual void pick_candicate_rowsets( +int64_t skip_window_sec, +const std::unordered_map& rs_version_map, +int64_t cumulative_point, std::vector* candidate_rowsets); + +/// Pick input rowsets from candidate rowsets for compaction. This function is pure virtual function. +/// Its implemention depands on concrete compaction policy. +/// param candidate_rowsets, the candidate_rowsets vector container to pick input rowsets +/// return input_rowsets, the vector container as return +/// return last_delete_version, if has delete rowset, record the delete version from input_rowsets
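To make the "cumulative compaction score" described in the comments above concrete, here is a minimal illustrative sketch under that stated interpretation (the score sums the segment counts of rowsets at or above the cumulative point); the actual policy classes added in this PR are considerably more involved, and the struct below is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified rowset metadata used only for this sketch.
struct RowsetMetaLite {
    int64_t start_version;   // first version covered by the rowset
    uint32_t num_segments;   // number of segments in the rowset
};

// Illustrative num_based-style score: the more segments sit at or above the
// cumulative point, the sooner the tablet should do cumulative compaction.
uint32_t calc_cumulative_compaction_score(const std::vector<RowsetMetaLite>& all_rowsets,
                                          int64_t current_cumulative_point) {
    uint32_t score = 0;
    for (const RowsetMetaLite& rs : all_rowsets) {
        if (rs.start_version >= current_cumulative_point) {
            score += rs.num_segments;
        }
    }
    return score;
}
```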
[GitHub] [incubator-doris] EmmyMiao87 commented on a change in pull request #4253: Support more materialized view syntax
EmmyMiao87 commented on a change in pull request #4253: URL: https://github.com/apache/incubator-doris/pull/4253#discussion_r472123863 ## File path: fe/fe-core/src/main/java/org/apache/doris/analysis/DropMaterializedViewStmt.java ## @@ -38,38 +43,91 @@ */ public class DropMaterializedViewStmt extends DdlStmt { -private String mvName; -private TableName tableName; private boolean ifExists; +private final TableName dbMvName; +private final TableName dbTblName; -public DropMaterializedViewStmt(boolean ifExists, String mvName, TableName tableName) { -this.mvName = mvName; -this.tableName = tableName; +public DropMaterializedViewStmt(boolean ifExists, TableName dbMvName, TableName dbTblName) { this.ifExists = ifExists; +this.dbMvName = dbMvName; +this.dbTblName = dbTblName; +} + +public boolean isSetIfExists() { +return ifExists; } public String getMvName() { -return mvName; +return dbMvName.getTbl(); } -public TableName getTableName() { -return tableName; +public String getTblName() { +if (dbTblName != null) { +return dbTblName.getTbl(); +} else { +return null; +} } -public boolean isIfExists() { -return ifExists; +public String getDbName() { +if (dbTblName != null) { +return dbTblName.getDb(); +} else { +return dbMvName.getDb(); +} } @Override public void analyze(Analyzer analyzer) throws UserException { -if (Strings.isNullOrEmpty(mvName)) { -throw new AnalysisException("The materialized name could not be empty or null."); +if (dbTblName != null && !Strings.isNullOrEmpty(dbMvName.getDb())) { +throw new AnalysisException("Syntax drop materialized view [mv-name] from db.name mush specify database name explicitly in `from`"); Review comment: If the dbTableName is different from dbMvName.getDb(), the Doris will thrown Exception. ## File path: fe/fe-core/src/main/java/org/apache/doris/analysis/DropMaterializedViewStmt.java ## @@ -38,38 +43,91 @@ */ public class DropMaterializedViewStmt extends DdlStmt { -private String mvName; -private TableName tableName; private boolean ifExists; +private final TableName dbMvName; +private final TableName dbTblName; -public DropMaterializedViewStmt(boolean ifExists, String mvName, TableName tableName) { -this.mvName = mvName; -this.tableName = tableName; +public DropMaterializedViewStmt(boolean ifExists, TableName dbMvName, TableName dbTblName) { this.ifExists = ifExists; +this.dbMvName = dbMvName; +this.dbTblName = dbTblName; +} + +public boolean isSetIfExists() { +return ifExists; } public String getMvName() { -return mvName; +return dbMvName.getTbl(); } -public TableName getTableName() { -return tableName; +public String getTblName() { +if (dbTblName != null) { +return dbTblName.getTbl(); +} else { +return null; +} } -public boolean isIfExists() { -return ifExists; +public String getDbName() { +if (dbTblName != null) { +return dbTblName.getDb(); +} else { +return dbMvName.getDb(); +} } @Override public void analyze(Analyzer analyzer) throws UserException { -if (Strings.isNullOrEmpty(mvName)) { -throw new AnalysisException("The materialized name could not be empty or null."); +if (dbTblName != null && !Strings.isNullOrEmpty(dbMvName.getDb())) { +throw new AnalysisException("Syntax drop materialized view [mv-name] from db.name mush specify database name explicitly in `from`"); +} +if (dbTblName != null) { +if (!Strings.isNullOrEmpty(dbMvName.getDb())) { +throw new AnalysisException("If the database appears after the from statement, " + Review comment: What's the difference between here and above? This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] morningman opened a new pull request #4392: [Bug] Remove RECOVER_TABLET worker pool to make ASAN compile happy
morningman opened a new pull request #4392: URL: https://github.com/apache/incubator-doris/pull/4392 In PR #4255, I forgot to remove some code. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] marising commented on a change in pull request #4330: [Feature][Cache] Sql cache and partition cache #2581
marising commented on a change in pull request #4330: URL: https://github.com/apache/incubator-doris/pull/4330#discussion_r472246168 ## File path: fe/fe-core/src/main/java/org/apache/doris/qe/cache/PartitionCache.java ## @@ -0,0 +1,215 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +package org.apache.doris.qe.cache; + +import com.google.common.collect.Lists; +import org.apache.doris.analysis.CompoundPredicate; +import org.apache.doris.analysis.Expr; +import org.apache.doris.analysis.InlineViewRef; +import org.apache.doris.analysis.QueryStmt; +import org.apache.doris.analysis.SelectStmt; +import org.apache.doris.analysis.TableRef; +import org.apache.doris.catalog.Column; +import org.apache.doris.catalog.OlapTable; +import org.apache.doris.catalog.RangePartitionInfo; +import org.apache.doris.common.Status; +import org.apache.doris.common.util.DebugUtil; +import org.apache.doris.metric.MetricRepo; +import org.apache.doris.qe.RowBatch; +import org.apache.doris.thrift.TUniqueId; +import org.apache.logging.log4j.LogManager; +import org.apache.logging.log4j.Logger; + +import java.util.List; + +public class PartitionCache extends Cache { +private static final Logger LOG = LogManager.getLogger(PartitionCache.class); +private SelectStmt nokeyStmt; Review comment: After rewriting, there is no partition key select statement This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] marising commented on a change in pull request #4330: [Feature][Cache] Sql cache and partition cache #2581
marising commented on a change in pull request #4330: URL: https://github.com/apache/incubator-doris/pull/4330#discussion_r472259742 ## File path: fe/fe-core/src/main/java/org/apache/doris/qe/cache/CacheAnalyzer.java ## @@ -0,0 +1,450 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +package org.apache.doris.qe.cache; + +import org.apache.doris.analysis.AggregateInfo; +import org.apache.doris.analysis.BinaryPredicate; +import org.apache.doris.analysis.CastExpr; +import org.apache.doris.analysis.CompoundPredicate; +import org.apache.doris.analysis.Expr; +import org.apache.doris.analysis.InlineViewRef; +import org.apache.doris.analysis.QueryStmt; +import org.apache.doris.analysis.SelectStmt; +import org.apache.doris.analysis.SlotRef; +import org.apache.doris.analysis.StatementBase; +import org.apache.doris.analysis.TableRef; +import org.apache.doris.catalog.OlapTable; +import org.apache.doris.catalog.RangePartitionInfo; +import org.apache.doris.catalog.PartitionType; +import org.apache.doris.catalog.Partition; +import org.apache.doris.catalog.Column; +import org.apache.doris.common.util.DebugUtil; +import org.apache.doris.metric.MetricRepo; +import org.apache.doris.planner.OlapScanNode; +import org.apache.doris.planner.Planner; +import org.apache.doris.planner.ScanNode; +import org.apache.doris.qe.ConnectContext; +import org.apache.doris.qe.RowBatch; +import org.apache.doris.common.Config; +import org.apache.doris.common.Status; + +import com.google.common.collect.Lists; +import org.apache.doris.thrift.TUniqueId; +import org.apache.logging.log4j.LogManager; +import org.apache.logging.log4j.Logger; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; + +/** + * Analyze which caching mode a SQL is suitable for + * 1. T + 1 update is suitable for SQL mode + * 2. Partition by date, update the data of the day in near real time, which is suitable for Partition mode + */ +public class CacheAnalyzer { +private static final Logger LOG = LogManager.getLogger(CacheAnalyzer.class); + +/** + * NoNeed : disable config or variable, not query, not scan table etc. 
+ */ +public enum CacheMode { +NoNeed, +None, +TTL, +Sql, +Partition +} + +private ConnectContext context; +private boolean enableSqlCache = false; +private boolean enablePartitionCache = false; +private TUniqueId queryId; +private CacheMode cacheMode; +private CacheTable latestTable; +private StatementBase parsedStmt; +private SelectStmt selectStmt; +private List scanNodes; +private OlapTable olapTable; +private RangePartitionInfo partitionInfo; +private Column partColumn; +private CompoundPredicate partitionPredicate; +private Cache cache; + +public Cache getCache() { +return cache; +} + +public CacheAnalyzer(ConnectContext context, StatementBase parsedStmt, Planner planner) { +this.context = context; +this.queryId = context.queryId(); +this.parsedStmt = parsedStmt; +scanNodes = planner.getScanNodes(); +latestTable = new CacheTable(); +checkCacheConfig(); +} + +//for unit test +public CacheAnalyzer(ConnectContext context, StatementBase parsedStmt, List scanNodes) { +this.context = context; +this.parsedStmt = parsedStmt; +this.scanNodes = scanNodes; +checkCacheConfig(); +} + +private void checkCacheConfig() { +if (Config.cache_enable_sql_mode) { +if (context.getSessionVariable().isEnableSqlCache()) { Review comment: I understand that getsessionvariable() can obtain session variables and global variables. Session variables have higher priority than global variables. I don't know if I understand correctly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org ---
[GitHub] [incubator-doris] marising commented on a change in pull request #4330: [Feature][Cache] Sql cache and partition cache #2581
marising commented on a change in pull request #4330: URL: https://github.com/apache/incubator-doris/pull/4330#discussion_r472266869 ## File path: fe/fe-core/src/main/java/org/apache/doris/qe/StmtExecutor.java ## @@ -575,6 +583,78 @@ private void handleSetStmt() { context.getState().setOk(); } +private void sendChannel(MysqlChannel channel, List cacheValues, boolean hitAll) Review comment: This means whether the query partitions are all hit,so isHitAll is better? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] marising commented on a change in pull request #4330: [Feature][Cache] Sql cache and partition cache #2581
marising commented on a change in pull request #4330: URL: https://github.com/apache/incubator-doris/pull/4330#discussion_r472277065 ## File path: fe/fe-core/src/main/java/org/apache/doris/qe/cache/PartitionRange.java ## @@ -0,0 +1,596 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +package org.apache.doris.qe.cache; + +import org.apache.doris.analysis.CompoundPredicate; +import org.apache.doris.analysis.BinaryPredicate; +import org.apache.doris.analysis.DateLiteral; +import org.apache.doris.analysis.InPredicate; +import org.apache.doris.analysis.PartitionValue; +import org.apache.doris.analysis.Expr; +import org.apache.doris.analysis.LiteralExpr; +import org.apache.doris.analysis.IntLiteral; +import org.apache.doris.catalog.OlapTable; +import org.apache.doris.catalog.PrimitiveType; +import org.apache.doris.catalog.RangePartitionInfo; +import org.apache.doris.catalog.Column; +import org.apache.doris.catalog.Partition; +import org.apache.doris.catalog.PartitionKey; +import org.apache.doris.catalog.Type; +import org.apache.doris.common.Config; +import org.apache.doris.planner.PartitionColumnFilter; + +import org.apache.doris.common.AnalysisException; + +import com.google.common.collect.Lists; +import com.google.common.collect.Range; + +import org.apache.logging.log4j.LogManager; +import org.apache.logging.log4j.Logger; + +import java.text.SimpleDateFormat; +import java.util.Date; +import java.util.List; +import java.util.Map; + +/** + * Convert the range of the partition to the list + * all partition by day/week/month split to day list + */ +public class PartitionRange { +private static final Logger LOG = LogManager.getLogger(PartitionRange.class); + +public class PartitionSingle { +private Partition partition; +private PartitionKey partitionKey; +private long partitionId; +private PartitionKeyType cacheKey; +private boolean fromCache; +private boolean tooNew; + +public Partition getPartition() { +return partition; +} + +public void setPartition(Partition partition) { +this.partition = partition; +} + +public PartitionKey getPartitionKey() { +return partitionKey; +} + +public void setPartitionKey(PartitionKey key) { +this.partitionKey = key; +} + +public long getPartitionId() { +return partitionId; +} + +public void setPartitionId(long partitionId) { +this.partitionId = partitionId; +} + +public PartitionKeyType getCacheKey() { +return cacheKey; +} + +public void setCacheKey(PartitionKeyType cacheKey) { +this.cacheKey.clone(cacheKey); +} + +public boolean isFromCache() { +return fromCache; +} + +public void setFromCache(boolean fromCache) { +this.fromCache = fromCache; +} + +public boolean isTooNew() { +return tooNew; +} + +public void setTooNew(boolean tooNew) { +this.tooNew = tooNew; +} + +public PartitionSingle() { +this.partitionId = 
0; +this.cacheKey = new PartitionKeyType(); +this.fromCache = false; +this.tooNew = false; +} + +public void Debug() { +if (partition != null) { +LOG.info("partition id {}, cacheKey {}, version {}, time {}, fromCache {}, tooNew {} ", +partitionId, cacheKey.realValue(), +partition.getVisibleVersion(), partition.getVisibleVersionTime(), +fromCache, tooNew); +} else { +LOG.info("partition id {}, cacheKey {}, fromCache {}, tooNew {} ", partitionId, +cacheKey.realValue(), fromCache, tooNew); +} +} +} + +public enum KeyType { +DEFAULT, +LONG, +DATE, +DATETIME, +TIME +} + +public static class PartitionKeyType { +private SimpleDateFormat df8 = new SimpleDateFormat("MMdd"); +private SimpleDateFormat
[GitHub] [incubator-doris] morningman merged pull request #4212: Compaction rules optimization
morningman merged pull request #4212: URL: https://github.com/apache/incubator-doris/pull/4212 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[incubator-doris] branch master updated: [Compaction]Compaction rules optimization (#4212)
This is an automated email from the ASF dual-hosted git repository. morningman pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-doris.git The following commit(s) were added to refs/heads/master by this push: new dc3ed1c [Compaction]Compaction rules optimization (#4212) dc3ed1c is described below commit dc3ed1c525e08f9bd8acfb01b3507c6c7d230164 Author: ZhangYu0123 <67053339+zhangyu0...@users.noreply.github.com> AuthorDate: Wed Aug 19 09:34:13 2020 +0800 [Compaction]Compaction rules optimization (#4212) Compaction rules optimization, the detail problem description and design to see #4164. This pr commits 2 functions: (1) add the cumulative policy configable, and implement original policy. (2) implement universal policy, the optimization version in #4164. --- be/src/common/config.h | 21 + be/src/olap/CMakeLists.txt |1 + be/src/olap/cumulative_compaction.cpp | 63 +- be/src/olap/cumulative_compaction.h|4 +- be/src/olap/cumulative_compaction_policy.cpp | 468 + be/src/olap/cumulative_compaction_policy.h | 263 + be/src/olap/olap_server.cpp| 34 +- be/src/olap/rowset/rowset_meta.h |4 + be/src/olap/storage_engine.h |2 + be/src/olap/tablet.cpp | 116 +-- be/src/olap/tablet.h | 23 +- be/src/olap/version_graph.cpp |3 +- be/test/olap/cumulative_compaction_policy_test.cpp | 1022 docs/en/administrator-guide/config/be_config.md| 42 +- docs/zh-CN/administrator-guide/config/be_config.md | 40 + 15 files changed, 1976 insertions(+), 130 deletions(-) diff --git a/be/src/common/config.h b/be/src/common/config.h index 145d9b3..08151ad 100644 --- a/be/src/common/config.h +++ b/be/src/common/config.h @@ -268,6 +268,27 @@ namespace config { CONF_mInt64(base_compaction_interval_seconds_since_last_operation, "86400"); CONF_mInt32(base_compaction_write_mbytes_per_sec, "5"); +// config the cumulative compaction policy +// Valid configs: num_base, size_based +// num_based policy, the original version of cumulative compaction, cumulative version compaction once. +// size_based policy, a optimization version of cumulative compaction, targeting the use cases requiring +// lower write amplification, trading off read amplification and space amplification. +CONF_String(cumulative_compaction_policy, "num_based"); + +// In size_based policy, output rowset of cumulative compaction total disk size exceed this config size, +// this rowset will be given to base compaction, unit is m byte. +CONF_mInt64(cumulative_size_based_promotion_size_mbytes, "1024"); +// In size_based policy, output rowset of cumulative compaction total disk size exceed this config ratio of +// base rowset's total disk size, this rowset will be given to base compaction. The value must be between +// 0 and 1. +CONF_mDouble(cumulative_size_based_promotion_ratio, "0.05"); +// In size_based policy, the smallest size of rowset promotion. When the rowset is less than this config, this +// rowset will be not given to base compaction. The unit is m byte. +CONF_mInt64(cumulative_size_based_promotion_min_size_mbytes, "64"); +// The lower bound size to do cumulative compaction. When total disk size of candidate rowsets is less than +// this size, size_based policy also does cumulative compaction. The unit is m byte. 
+CONF_mInt64(cumulative_size_based_compaction_lower_size_mbytes, "64"); + // cumulative compaction policy: max delta file's size unit:B CONF_mInt32(cumulative_compaction_check_interval_seconds, "10"); CONF_mInt64(min_cumulative_compaction_num_singleton_deltas, "5"); diff --git a/be/src/olap/CMakeLists.txt b/be/src/olap/CMakeLists.txt index 884c045..13c11a0 100644 --- a/be/src/olap/CMakeLists.txt +++ b/be/src/olap/CMakeLists.txt @@ -37,6 +37,7 @@ add_library(Olap STATIC comparison_predicate.cpp compress.cpp cumulative_compaction.cpp +cumulative_compaction_policy.cpp delete_handler.cpp delta_writer.cpp file_helper.cpp diff --git a/be/src/olap/cumulative_compaction.cpp b/be/src/olap/cumulative_compaction.cpp index a5f1358..c6bf9f8 100755 --- a/be/src/olap/cumulative_compaction.cpp +++ b/be/src/olap/cumulative_compaction.cpp @@ -27,7 +27,7 @@ CumulativeCompaction::CumulativeCompaction(TabletSharedPtr tablet, const std::st : Compaction(tablet, label, parent_tracker), _cumulative_rowset_size_threshold(config::cumulative_compaction_budgeted_bytes) {} -CumulativeCompaction::~CumulativeCompaction() { } +CumulativeCompaction::~CumulativeCompaction() {} OLAPStatus CumulativeCompaction::compact() { if (!_table
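For orientation, a hypothetical be.conf excerpt that switches a BE to the new size_based policy might look like the following; the values shown are simply the defaults quoted in the diff above (the default policy itself is num_based), not settings from any real deployment:

```
cumulative_compaction_policy = size_based
cumulative_size_based_promotion_size_mbytes = 1024
cumulative_size_based_promotion_ratio = 0.05
cumulative_size_based_promotion_min_size_mbytes = 64
cumulative_size_based_compaction_lower_size_mbytes = 64
```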
[GitHub] [incubator-doris] HangyuanLiu commented on a change in pull request #4253: Support more materialized view syntax
HangyuanLiu commented on a change in pull request #4253: URL: https://github.com/apache/incubator-doris/pull/4253#discussion_r472590847 ## File path: fe/fe-core/src/main/java/org/apache/doris/analysis/DropMaterializedViewStmt.java ## @@ -38,38 +43,91 @@ */ public class DropMaterializedViewStmt extends DdlStmt { -private String mvName; -private TableName tableName; private boolean ifExists; +private final TableName dbMvName; +private final TableName dbTblName; -public DropMaterializedViewStmt(boolean ifExists, String mvName, TableName tableName) { -this.mvName = mvName; -this.tableName = tableName; +public DropMaterializedViewStmt(boolean ifExists, TableName dbMvName, TableName dbTblName) { this.ifExists = ifExists; +this.dbMvName = dbMvName; +this.dbTblName = dbTblName; +} + +public boolean isSetIfExists() { +return ifExists; } public String getMvName() { -return mvName; +return dbMvName.getTbl(); } -public TableName getTableName() { -return tableName; +public String getTblName() { +if (dbTblName != null) { +return dbTblName.getTbl(); +} else { +return null; +} } -public boolean isIfExists() { -return ifExists; +public String getDbName() { +if (dbTblName != null) { +return dbTblName.getDb(); +} else { +return dbMvName.getDb(); +} } @Override public void analyze(Analyzer analyzer) throws UserException { -if (Strings.isNullOrEmpty(mvName)) { -throw new AnalysisException("The materialized name could not be empty or null."); +if (dbTblName != null && !Strings.isNullOrEmpty(dbMvName.getDb())) { +throw new AnalysisException("Syntax drop materialized view [mv-name] from db.name mush specify database name explicitly in `from`"); Review comment: If db set in from , it mush not be set in view name This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[incubator-doris] branch master updated: Support udaf_orthogonal_bitmap (#4198)
This is an automated email from the ASF dual-hosted git repository. lingmiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-doris.git The following commit(s) were added to refs/heads/master by this push: new f924282 Support udaf_orthogonal_bitmap (#4198) f924282 is described below commit f92428248f91a191294bc2a8a2deb1c209250acc Author: zhbinbin <16679214+zhbin...@users.noreply.github.com> AuthorDate: Wed Aug 19 10:29:13 2020 +0800 Support udaf_orthogonal_bitmap (#4198) The original Doris bitmap aggregation function has poor performance on the intersection and union set of bitmap cardinality of more than one billion. There are two reasons for this. The first is that when the bitmap cardinality is large, if the data size exceeds 1g, the network / disk IO time consumption will increase; The second point is that all the sink data of the back-end be instance are transferred to the top node for intersection and union calculation, which leads to the pressu [...] My solution is to create a fixed schema table based on the Doris fragmentation rule, and hash fragment the ID range based on the bitmap, that is, cut the ID range vertically to form a small cube. Such bitmap blocks will become smaller and evenly distributed on all back-end be instances. Based on the schema table, some new high-performance udaf aggregation functions are developed. All Scan nodes participate in intersection and union calculation, and top nodes only summarize The design goal is that the base number of bitmap is more than 10 billion, and the response time of cross union set calculation of 100 dimensional granularity is within 5 s. There are three udaf functions in this commit: orthogonal_bitmap_intersect_count, orthogonal_bitmap_union_count, orthogonal_bitmap_intersect. --- contrib/udf/CMakeLists.txt |1 + .../udf/src/udaf_orthogonal_bitmap/CMakeLists.txt | 92 ++ .../udf/src/udaf_orthogonal_bitmap/bitmap_value.h | 1326 .../orthogonal_bitmap_function.cpp | 492 .../orthogonal_bitmap_function.h | 62 + .../udf/src/udaf_orthogonal_bitmap/string_value.h | 175 +++ docs/.vuepress/sidebar/en.js |4 +- docs/.vuepress/sidebar/zh-CN.js|4 +- .../udf/contrib/udaf-orthogonal-bitmap-manual.md | 249 .../udf/contrib/udaf-orthogonal-bitmap-manual.md | 238 10 files changed, 2641 insertions(+), 2 deletions(-) diff --git a/contrib/udf/CMakeLists.txt b/contrib/udf/CMakeLists.txt index e0feef1..8554516 100644 --- a/contrib/udf/CMakeLists.txt +++ b/contrib/udf/CMakeLists.txt @@ -72,5 +72,6 @@ set_target_properties(udf PROPERTIES IMPORTED_LOCATION $ENV{DORIS_HOME}/output/u # Add the subdirector of new UDF in here add_subdirectory(${SRC_DIR}/udf_samples) +add_subdirectory(${SRC_DIR}/udaf_orthogonal_bitmap) install(DIRECTORY DESTINATION ${OUTPUT_DIR}) diff --git a/contrib/udf/src/udaf_orthogonal_bitmap/CMakeLists.txt b/contrib/udf/src/udaf_orthogonal_bitmap/CMakeLists.txt new file mode 100644 index 000..5741509 --- /dev/null +++ b/contrib/udf/src/udaf_orthogonal_bitmap/CMakeLists.txt @@ -0,0 +1,92 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# where to put generated libraries +set(LIBRARY_OUTPUT_PATH "${BUILD_DIR}/src/udaf_orthogonal_bitmap") + +# where to put generated binaries +set(EXECUTABLE_OUTPUT_PATH "${BUILD_DIR}/src/udaf_orthogonal_bitmap") + + +# set CMAKE_BUILD_TARGET_ARCH +# use `lscpu | grep 'Architecture' | awk '{print $2}'` only support system which language is en_US.UTF-8 +execute_process(COMMAND bash "-c" "uname -m" +OUTPUT_VARIABLE +CMAKE_BUILD_TARGET_ARCH +OUTPUT_STRIP_TRAILING_WHITESPACE) +message(STATUS "Build target arch is ${CMAKE_BUILD_TARGET_ARCH}") + +# Set dirs +set(SRC_DIR "$ENV{DORIS_HOME}/be/src/") +set(THIRDPARTY_DIR "$ENV{DORIS_THIRDPARTY}/installed/") + +# Set include dirs +include_directories(./) +include_directories(${THIRDPARTY_DIR}/include/) + +# message +message(STATUS "base dir is ${B
[GitHub] [incubator-doris] EmmyMiao87 merged pull request #4198: Add bitmap longitudinal cutting udaf
EmmyMiao87 merged pull request #4198: URL: https://github.com/apache/incubator-doris/pull/4198 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] marising commented on issue #4370: Release Nodes 0.13.0
marising commented on issue #4370: URL: https://github.com/apache/incubator-doris/issues/4370#issuecomment-675815496 Please merge the feature: [Feature][Cache] Doris caches query results based on partition #2581 LiHaibo 2020-8-19 At 2020-08-17 19:52:47, "EmmyMiao87" wrote: Credits @ZhangYu0123 @wfjcmcb @Fullstop000 @sduzh @stalary @worker24h @chaoyli @vagetablechicken @jmk1011 @funyeah @wutiangan @gengjun-git @xinghuayu007 @EmmyMiao87 @songenjie @acelyc111 @yangzhg @Seaven @hexian55 @ChenXiaoFei @WingsGo @kangpinghuang @wangbo @weizuo93 @sdgshawn @skyduy @wyb @gaodayue @HappenLee @kangkaisen @wuyunfeng @HangyuanLiu @xy720 @liutang123 @caiconghui @liyuance @spaces-X @hffariel @decster @blackfox1983 @Astralidea @morningman @hf200012 @xbyang18 @Youngwb @imay @marising — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] marising commented on a change in pull request #4330: [Feature][Cache] Sql cache and partition cache #2581
marising commented on a change in pull request #4330: URL: https://github.com/apache/incubator-doris/pull/4330#discussion_r472609245 ## File path: fe/fe-core/src/main/java/org/apache/doris/qe/StmtExecutor.java ## @@ -575,6 +583,78 @@ private void handleSetStmt() { context.getState().setOk(); } +private void sendChannel(MysqlChannel channel, List cacheValues, boolean hitAll) +throws Exception { +RowBatch batch = null; +for (CacheBeProxy.CacheValue value : cacheValues) { +batch = value.getRowBatch(); +for (ByteBuffer row : batch.getBatch().getRows()) { +channel.sendOnePacket(row); +} +context.updateReturnRows(batch.getBatch().getRows().size()); +} +if (hitAll) { +if (batch != null) { +statisticsForAuditLog = batch.getQueryStatistics(); +} +context.getState().setEof(); +return; +} +} + +private boolean handleCacheStmt(CacheAnalyzer cacheAnalyzer,MysqlChannel channel) throws Exception { +RowBatch batch = null; +CacheBeProxy.FetchCacheResult cacheResult = cacheAnalyzer.getCacheData(); +CacheMode mode = cacheAnalyzer.getCacheMode(); +if (cacheResult != null) { +isCached = true; +if (cacheAnalyzer.getHitRange() == Cache.HitRange.Full) { +sendChannel(channel, cacheResult.getValueList(), true); +return true; +} +//rewrite sql +if (mode == CacheMode.Partition) { +if (cacheAnalyzer.getHitRange() == Cache.HitRange.Left) { +sendChannel(channel, cacheResult.getValueList(), false); +} +SelectStmt newSelectStmt = cacheAnalyzer.getRewriteStmt(); +newSelectStmt.reset(); +analyzer = new Analyzer(context.getCatalog(), context); +newSelectStmt.analyze(analyzer); +planner = new Planner(); +planner.plan(newSelectStmt, analyzer, context.getSessionVariable().toThrift()); +} +} + +coord = new Coordinator(context, analyzer, planner); +QeProcessorImpl.INSTANCE.registerQuery(context.queryId(), +new QeProcessorImpl.QueryInfo(context, originStmt.originStmt, coord)); +coord.exec(); + +while (true) { +batch = coord.getNext(); +if (batch.getBatch() != null) { +cacheAnalyzer.copyRowBatch(batch); +for (ByteBuffer row : batch.getBatch().getRows()) { +channel.sendOnePacket(row); +} +context.updateReturnRows(batch.getBatch().getRows().size()); +} +if (batch.isEos()) { +break; +} +} + +if (cacheResult != null && cacheAnalyzer.getHitRange() == Cache.HitRange.Right) { +sendChannel(channel, cacheResult.getValueList(), false); +} + +cacheAnalyzer.updateCache(); Review comment: The updateCache method determines whether the background Cache needs to be updated ``` public void updateCache() { if (cacheMode == CacheMode.None || cacheMode == CacheMode.NoNeed) { return; } cache.updateCache(); } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] marising commented on a change in pull request #4330: [Feature][Cache] Sql cache and partition cache #2581
marising commented on a change in pull request #4330: URL: https://github.com/apache/incubator-doris/pull/4330#discussion_r472611199 ## File path: fe/fe-core/src/main/java/org/apache/doris/qe/cache/PartitionRange.java ## @@ -0,0 +1,596 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +package org.apache.doris.qe.cache; + +import org.apache.doris.analysis.CompoundPredicate; +import org.apache.doris.analysis.BinaryPredicate; +import org.apache.doris.analysis.DateLiteral; +import org.apache.doris.analysis.InPredicate; +import org.apache.doris.analysis.PartitionValue; +import org.apache.doris.analysis.Expr; +import org.apache.doris.analysis.LiteralExpr; +import org.apache.doris.analysis.IntLiteral; +import org.apache.doris.catalog.OlapTable; +import org.apache.doris.catalog.PrimitiveType; +import org.apache.doris.catalog.RangePartitionInfo; +import org.apache.doris.catalog.Column; +import org.apache.doris.catalog.Partition; +import org.apache.doris.catalog.PartitionKey; +import org.apache.doris.catalog.Type; +import org.apache.doris.common.Config; +import org.apache.doris.planner.PartitionColumnFilter; + +import org.apache.doris.common.AnalysisException; + +import com.google.common.collect.Lists; +import com.google.common.collect.Range; + +import org.apache.logging.log4j.LogManager; +import org.apache.logging.log4j.Logger; + +import java.text.SimpleDateFormat; +import java.util.Date; +import java.util.List; +import java.util.Map; + +/** + * Convert the range of the partition to the list + * all partition by day/week/month split to day list + */ +public class PartitionRange { +private static final Logger LOG = LogManager.getLogger(PartitionRange.class); + +public class PartitionSingle { +private Partition partition; +private PartitionKey partitionKey; +private long partitionId; +private PartitionKeyType cacheKey; +private boolean fromCache; +private boolean tooNew; + +public Partition getPartition() { +return partition; +} + +public void setPartition(Partition partition) { +this.partition = partition; +} + +public PartitionKey getPartitionKey() { +return partitionKey; +} + +public void setPartitionKey(PartitionKey key) { +this.partitionKey = key; +} + +public long getPartitionId() { +return partitionId; +} + +public void setPartitionId(long partitionId) { +this.partitionId = partitionId; +} + +public PartitionKeyType getCacheKey() { +return cacheKey; +} + +public void setCacheKey(PartitionKeyType cacheKey) { +this.cacheKey.clone(cacheKey); +} + +public boolean isFromCache() { +return fromCache; +} + +public void setFromCache(boolean fromCache) { +this.fromCache = fromCache; +} + +public boolean isTooNew() { +return tooNew; +} + +public void setTooNew(boolean tooNew) { +this.tooNew = tooNew; +} + +public PartitionSingle() { +this.partitionId = 
0; +this.cacheKey = new PartitionKeyType(); +this.fromCache = false; +this.tooNew = false; +} + +public void Debug() { +if (partition != null) { +LOG.info("partition id {}, cacheKey {}, version {}, time {}, fromCache {}, tooNew {} ", +partitionId, cacheKey.realValue(), +partition.getVisibleVersion(), partition.getVisibleVersionTime(), +fromCache, tooNew); +} else { +LOG.info("partition id {}, cacheKey {}, fromCache {}, tooNew {} ", partitionId, +cacheKey.realValue(), fromCache, tooNew); +} +} +} + +public enum KeyType { +DEFAULT, +LONG, +DATE, +DATETIME, +TIME +} + +public static class PartitionKeyType { +private SimpleDateFormat df8 = new SimpleDateFormat("MMdd"); +private SimpleDateFormat
[GitHub] [incubator-doris] stalary closed pull request #4378: FIX: fix dynamic partition replicationNum error
stalary closed pull request #4378: URL: https://github.com/apache/incubator-doris/pull/4378 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] stalary commented on pull request #4393: FIX: fix dynamic partition replicationNum error
stalary commented on pull request #4393: URL: https://github.com/apache/incubator-doris/pull/4393#issuecomment-675820778 The previous branch did not work properly, so I recreated the PR. @morningman This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] stalary opened a new pull request #4393: FIX: fix dynamic partition replicationNum error
stalary opened a new pull request #4393: URL: https://github.com/apache/incubator-doris/pull/4393 ## Proposed changes The default of dynamic_partition.replication_num is replication_num, but show create table shows -1. ## Types of changes What types of changes does your code introduce to Doris? _Put an `x` in the boxes that apply_ - [x] Bugfix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] Documentation Update (if none of the other choices apply) - [ ] Code refactor (Modify the code structure, format the code, etc...) ## Further comments Replace DynamicPartitionProperty toString with getProperties. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] EmmyMiao87 commented on a change in pull request #4375: Fix errors when alter materialized view which based on dup table
EmmyMiao87 commented on a change in pull request #4375: URL: https://github.com/apache/incubator-doris/pull/4375#discussion_r472626637 ## File path: fe/fe-core/src/main/java/org/apache/doris/alter/SchemaChangeHandler.java ## @@ -556,6 +556,10 @@ private void addColumnInternal(OlapTable olapTable, Column newColumn, ColumnPosi throw new DdlException("Can not assign aggregation method on column in Duplicate data model table: " + newColName); } if (!newColumn.isKey()) { Review comment: I didn't understand what you mean... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] HappenLee opened a new issue #4394: [Proposal] Support Bucket Shuffle Join for Doris
HappenLee opened a new issue #4394: URL: https://github.com/apache/incubator-doris/issues/4394 ## Motivation At present, Doris supports 3 types of join: **shuffle join**, **broadcast join**, **colocate join**. Except for colocate join, the other joins lead to a lot of network consumption. For example, for a SQL query A join B, the network cost is: * **broadcast join**: if table A is divided into three parts, the network cost is ```3B``` * **shuffle join**: the network cost is ```A + B```. This network consumption not only leads to slow queries, but also leads to extra memory consumption during the join. Each Doris table has distribution info; if the join expr hits the distribution info, we should use the distribution info to reduce the network consumption. ## What is bucket shuffle join Just like Hive's bucket map join, the picture shows how it works. If there is a SQL query A join B, and the join expr hits the distribution info of A, bucket shuffle join only needs to distribute table B, sending the data to the proper part of table A. So the network cost is always ```B```. Compared with the original joins, bucket shuffle join obviously leads to less network cost: B < min(3B, A + B) ### It can bring us the following benefits: 1. First, Bucket Shuffle Join reduces the network cost and leads to better performance for some joins, especially when buckets are pruned. 2. It does not strongly rely on the colocate mechanism, so it is transparent to users. There is no mandatory requirement on data distribution, which will not lead to data skew. 3. It can provide more query optimization space for join reorder. ## POC of Bucket Shuffle Join I have implemented a simple Bucket Shuffle Join in Doris and tested its performance. We chose TPC-DS query 57. The query has 6 join operations, and 4 of them can hit Bucket Shuffle Join. | | Origin Doris | Bucket shuffle join | | :---: | :---: | :---: | | Time Cost | 27.7s | 16.4s | It seems to work as well as we expected. I'll do more experiments to verify its performance in the future. ## Implementation 1. First, we should add a partition type in the thrift types. 2. The FE should be able to plan and detect queries that can use bucket shuffle join, and send the data distribution info to the BE. 3. The BE uses the proper hash function to send the proper data to the proper BE instance (a minimal sketch of this routing step follows below). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
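A minimal sketch of the BE-side routing step described in point 3 above, assuming a hypothetical bucket_of helper and a bucket-to-instance mapping produced by the FE plan; it is illustrative only and does not use the actual Doris distribution hash or plan interfaces.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Illustrative only: route rows of the "right" table B to the BE instance that
// owns the matching bucket of table A, so that only B crosses the network.
struct Row {
    std::string join_key;  // value of the join / distribution column
};

// Hypothetical helper: must apply the same bucketing rule as table A's
// distribution (the real Doris hash function differs from std::hash).
uint32_t bucket_of(const Row& row, uint32_t bucket_num) {
    return static_cast<uint32_t>(std::hash<std::string>{}(row.join_key) % bucket_num);
}

// bucket_to_instance comes from the FE plan: which BE instance hosts each
// bucket of table A. Rows of B are grouped per destination instance.
void shuffle_table_b(const std::vector<Row>& b_rows,
                     const std::vector<int>& bucket_to_instance,
                     std::vector<std::vector<Row>>* rows_per_instance) {
    const uint32_t bucket_num = static_cast<uint32_t>(bucket_to_instance.size());
    for (const Row& row : b_rows) {
        int instance = bucket_to_instance[bucket_of(row, bucket_num)];
        (*rows_per_instance)[instance].push_back(row);
    }
}
```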