[incubator-doris] 04/20: fix some fe ut failed (#8547)
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit 1caf8ad3f41d6f1fb3ed15311a9ec38ace6b5333
Author: Zhengguo Yang
AuthorDate: Mon Mar 21 10:36:06 2022 +0800

    fix some fe ut failed (#8547)
---
 .../java/org/apache/doris/external/elasticsearch/EsShardPartitions.java | 2 +-
 .../src/test/java/org/apache/doris/catalog/TempPartitionTest.java | 2 +-
 .../src/test/java/org/apache/doris/catalog/TruncateTableTest.java | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fe/fe-core/src/main/java/org/apache/doris/external/elasticsearch/EsShardPartitions.java b/fe/fe-core/src/main/java/org/apache/doris/external/elasticsearch/EsShardPartitions.java
index 710e6832..967cf18 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/external/elasticsearch/EsShardPartitions.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/external/elasticsearch/EsShardPartitions.java
@@ -77,7 +77,7 @@ public class EsShardPartitions {
 singleShardRouting.add(
 EsShardRouting.newSearchShard(
 (String) indexShard.get("index"),
-(Integer) indexShard.get("shard"),
+((Long) indexShard.get("shard")).intValue(),
 (Boolean) indexShard.get("primary"),
 (String) indexShard.get("node"),
 (JSONObject) jsonObject.get("nodes")));
diff --git a/fe/fe-core/src/test/java/org/apache/doris/catalog/TempPartitionTest.java b/fe/fe-core/src/test/java/org/apache/doris/catalog/TempPartitionTest.java
index b587730..ffd3528 100644
--- a/fe/fe-core/src/test/java/org/apache/doris/catalog/TempPartitionTest.java
+++ b/fe/fe-core/src/test/java/org/apache/doris/catalog/TempPartitionTest.java
@@ -116,7 +116,7 @@ public class TempPartitionTest {
 private List> checkTablet(String tbl, String partitions, boolean isTemp, int expected)
 throws Exception {
-String showStr = "show tablet from " + tbl + (isTemp ? " temporary" : "") + " partition (" + partitions + ");";
+String showStr = "show tablets from " + tbl + (isTemp ? " temporary" : "") + " partition (" + partitions + ");";
 ShowTabletStmt showStmt = (ShowTabletStmt) UtFrameUtils.parseAndAnalyzeStmt(showStr, ctx);
 ShowExecutor executor = new ShowExecutor(ctx, (ShowStmt) showStmt);
 ShowResultSet showResultSet = executor.execute();
diff --git a/fe/fe-core/src/test/java/org/apache/doris/catalog/TruncateTableTest.java b/fe/fe-core/src/test/java/org/apache/doris/catalog/TruncateTableTest.java
index 15a8f82..db8951d 100644
--- a/fe/fe-core/src/test/java/org/apache/doris/catalog/TruncateTableTest.java
+++ b/fe/fe-core/src/test/java/org/apache/doris/catalog/TruncateTableTest.java
@@ -125,7 +125,7 @@ public class TruncateTableTest {
 }
 private List> checkShowTabletResultNum(String tbl, String partition, int expected) throws Exception {
-String showStr = "show tablet from " + tbl + " partition(" + partition + ")";
+String showStr = "show tablets from " + tbl + " partition(" + partition + ")";
 ShowTabletStmt showStmt = (ShowTabletStmt) UtFrameUtils.parseAndAnalyzeStmt(showStr, connectContext);
 ShowExecutor executor = new ShowExecutor(connectContext, (ShowStmt) showStmt);
 ShowResultSet showResultSet = executor.execute();

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org
[incubator-doris] branch dev-1.0.1 updated (dcf66f2 -> e660dd3)
morningman pushed a change to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git.

    from dcf66f2 [fix][vectorized] Fix bug of left semi/anti with other join conjunct (#8596)
     new 2ee49c6 [Enhancement](load) speed up stream load for duplicate table, use template for faster get_type_info. (#8500)
     new d6be70c [fix] Fix coredump of stddev function (#8543)
     new 2e1190b [improvement] Improve sig handler (#8545)
     new 1caf8ad fix some fe ut failed (#8547)
     new 11374d6 [api-change] add soft limit of String type length (#8567)
     new 9861c3c [fix][vectorized] fix core on get_predicate_column_ptr && fix double copy on _read_columns_by_rowids (#8581)
     new a068447 [fix](load) fix another bug that BE may crash when calling `mark_as_failed` (#8607)
     new 84331c8 [doc] fix typo for session (#8610)
     new 07cc837 [doc] fix help module failed (#8617)
     new ca1974f [fix](load) Fix null column bug in load's mapping column setting (#8625)
     new d1cf997 [fix](vectorization) Vectorization decimal arithmetic inconsistent (#8626)
     new 2e46b37 ow num is more accurate than column num in data_types (#8628)
     new 035006f [chore] optimize aws thirdparty package download. (#8637)
     new d532aad [fix](vec) fix coredump for aggregate function when delete large_data, due to alloc-dealloc-mismatch (#8641)
     new 6e6ba61 [chore] add -rtlib=compiler-rt for UBSAN under clang (#8647)
     new 2e1e2b3 [fix](mini-load) Remove mini load in LOADING and PENDING state (#8649)
     new c492f41 [chore] Optimize build_lz4 in build-thirdparty.sh (#8653)
     new 0d9d786 [doc] update doc of vec-execution-engine (#8655)
     new 535e574 [Refactor] Remove ununsed file (#8657)
     new e660dd3 [fix] fix core dump when avg on not null decimal in empty table (#8681)

The 20 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
 .licenserc.yaml | 1 +
 .rat-excludes | 1 +
 LICENSE.txt | 33 ++
 be/CMakeLists.txt | 39 +-
 be/src/common/config.h | 7 +
 be/src/common/daemon.cpp | 2 +-
 be/src/common/logconfig.cpp | 6 +-
 be/src/common/signal_handler.h | 446 +
 be/src/exec/pl_task_root.cpp | 140 ---
 be/src/exec/pl_task_root.h | 47 ---
 be/src/exec/tablet_sink.cpp | 48 ++-
 be/src/olap/aggregate_func.h | 3 +-
 be/src/olap/bloom_filter_predicate.h | 2 +-
 be/src/olap/delete_handler.cpp | 4 +-
 be/src/olap/memtable.cpp | 5 +-
 be/src/olap/olap_common.h | 9 +-
 be/src/olap/olap_define.h | 3 -
 be/src/olap/row_block2.cpp | 8 +-
 be/src/olap/rowset/segment_v2/column_reader.cpp | 2 +-
 be/src/olap/rowset/segment_v2/segment_iterator.cpp | 83 ++--
 be/src/olap/rowset/segment_v2/segment_iterator.h | 29 +-
 be/src/olap/types.cpp | 25 +-
 be/src/olap/types.h | 33 +-
 be/src/olap/wrapper_field.cpp | 5 +-
 be/src/runtime/primitive_type.h | 22 +
 be/src/runtime/types.h | 3 +-
 be/src/service/doris_main.cpp | 2 +
 be/src/util/logging.h | 2 +-
 .../aggregate_functions/aggregate_function_avg.h | 6 +-
 .../aggregate_function_min_max.h | 8 +-
 .../aggregate_function_stddev.cpp | 7 +-
 .../aggregate_function_stddev.h | 34 +-
 be/src/vec/columns/column_nullable.h | 15 +-
 be/src/vec/columns/column_string.h | 22 +-
 be/src/vec/columns/predicate_column.h | 6 +-
 be/src/vec/core/block.cpp | 24 ++
 be/src/vec/core/block.h | 2 +
 be/src/vec/core/types.h | 24 +-
 be/src/vec/data_types/data_type_bitmap.cpp | 30 +-
 be/src/vec/data_types/data_type_decimal.cpp | 24 +-
 be/src/vec/data_types/data_type_hll.cpp | 30 +-
 be/src/vec/data_types/data_type_nullable.cpp | 14 +-
 be/src/vec/data_types/data_type_number_base.cpp | 22 +-
 be/src/vec/data_types/data_type_string.cpp | 14 +-
 be/src/vec/functions/divide.cpp | 8 +
 be/src/vec/functions/function_binary_arithmetic.h
[incubator-doris] 01/20: [Enhancement](load) speed up stream load for duplicate table, use template for faster get_type_info. (#8500)
morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit 2ee49c693f19524bbf44351cada5cdcaaae33004
Author: zbtzbtzbt <35688959+zbtzbt...@users.noreply.github.com>
AuthorDate: Fri Mar 25 15:18:43 2022 +0800

    [Enhancement](load) speed up stream load for duplicate table, use template for faster get_type_info. (#8500)
---
 be/src/olap/aggregate_func.h | 3 ++-
 be/src/olap/memtable.cpp | 5 ++---
 be/src/olap/olap_common.h | 9
 be/src/olap/rowset/segment_v2/column_reader.cpp | 2 +-
 be/src/olap/types.cpp | 25 +++--
 be/src/olap/types.h | 29 ++---
 6 files changed, 38 insertions(+), 35 deletions(-)

diff --git a/be/src/olap/aggregate_func.h b/be/src/olap/aggregate_func.h
index 5173efd..9b2ba52 100644
--- a/be/src/olap/aggregate_func.h
+++ b/be/src/olap/aggregate_func.h
@@ -102,7 +102,8 @@ struct BaseAggregateFuncs {
 auto _type_info = get_collection_type_info(sub_type);
 _type_info->deep_copy(dst->mutable_cell_ptr(), src, mem_pool);
 } else {
-auto _type_info = get_type_info(field_type);
+// get type at compile time for performance
+auto _type_info = get_scalar_type_info();
 _type_info->deep_copy(dst->mutable_cell_ptr(), src, mem_pool);
 }
 }
diff --git a/be/src/olap/memtable.cpp b/be/src/olap/memtable.cpp
index 835842a..460638d 100644
--- a/be/src/olap/memtable.cpp
+++ b/be/src/olap/memtable.cpp
@@ -109,9 +109,8 @@ void MemTable::_tuple_to_row(const Tuple* tuple, ContiguousRow* row, MemPool* me
 const SlotDescriptor* slot = (*_slot_descs)[i];
 bool is_null = tuple->is_null(slot->null_indicator_offset());
-const void* value = tuple->get_slot(slot->tuple_offset());
-_schema->column(i)->consume(&cell, (const char*)value, is_null, mem_pool,
-&_agg_buffer_pool);
+const auto* value = (const char*)tuple->get_slot(slot->tuple_offset());
+_schema->column(i)->consume(&cell, value, is_null, mem_pool, &_agg_buffer_pool);
 }
 }
diff --git a/be/src/olap/olap_common.h b/be/src/olap/olap_common.h
index 470efee..6fa64e6 100644
--- a/be/src/olap/olap_common.h
+++ b/be/src/olap/olap_common.h
@@ -55,10 +55,10 @@ struct DataDirInfo {
 FilePathDesc path_desc;
 size_t path_hash = 0;
 int64_t disk_capacity = 1; // actual disk capacity
-int64_t available = 0; // 可用空间,单位字节
+int64_t available = 0; // available space, in bytes unit
 int64_t data_used_capacity = 0;
-bool is_used = false; // 是否可用标识
-TStorageMedium::type storage_medium = TStorageMedium::HDD; // 存储介质类型:SSD|HDD
+bool is_used = false; // whether available mark
+TStorageMedium::type storage_medium = TStorageMedium::HDD; // Storage medium type: SSD|HDD
 };
 // Sort DataDirInfo by available space.
@@ -114,8 +114,7 @@ enum DelCondSatisfied {
 DEL_NOT_SATISFIED = 1, //not satisfy delete condition
 DEL_PARTIAL_SATISFIED = 2, //partially satisfy delete condition
 };
-
-// 定义Field支持的所有数据类型
+// Define all data types supported by Field.
 enum FieldType {
 OLAP_FIELD_TYPE_TINYINT = 1, // MYSQL_TYPE_TINY
 OLAP_FIELD_TYPE_UNSIGNED_TINYINT = 2,
diff --git a/be/src/olap/rowset/segment_v2/column_reader.cpp b/be/src/olap/rowset/segment_v2/column_reader.cpp
index 20d2918..74ee07e 100644
--- a/be/src/olap/rowset/segment_v2/column_reader.cpp
+++ b/be/src/olap/rowset/segment_v2/column_reader.cpp
@@ -388,7 +388,7 @@ Status ArrayFileColumnIterator::init(const ColumnIteratorOptions& opts) {
 if (_array_reader->is_nullable()) {
 RETURN_IF_ERROR(_null_iterator->init(opts));
 }
-auto offset_type_info = get_scalar_type_info(FieldType::OLAP_FIELD_TYPE_UNSIGNED_INT);
+auto offset_type_info = get_scalar_type_info(OLAP_FIELD_TYPE_UNSIGNED_INT);
 RETURN_IF_ERROR(
 ColumnVectorBatch::create(1024, false, offset_type_info, nullptr, &_length_batch));
 return Status::OK();
diff --git a/be/src/olap/types.cpp b/be/src/olap/types.cpp
index 920c938..62ebc08 100644
--- a/be/src/olap/types.cpp
+++ b/be/src/olap/types.cpp
@@ -22,24 +22,6 @@ namespace doris {
 void (*FieldTypeTraits::set_to_max)(void*) = nullptr;
-template
-ScalarTypeInfo::ScalarTypeInfo(TypeTraitsClass t)
-: _equal(TypeTraitsClass::equal),
- _cmp(TypeTraitsClass::cmp),
- _shallow_copy(TypeTraitsClass::shallow_copy),
- _deep_copy(TypeTraitsClass::deep_copy),
- _copy_object(TypeTraitsClass::copy_object),
- _direct_copy(TypeTraitsClass::direct_copy),
- _direct_copy_may_cut(TypeTraitsClass::direct_copy_may_cut),
-
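Editor's illustration of the idea named in the commit title (resolving type info through a template parameter instead of a runtime lookup). This is a minimal standalone sketch, not the Doris code: `FieldType`, `TypeInfo`, `FieldTraits`, `get_type_info_static`, and `get_type_info_runtime` are simplified, hypothetical stand-ins chosen for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-ins for illustration only; not the Doris types.
enum FieldType { FIELD_TYPE_INT = 1, FIELD_TYPE_BIGINT = 2 };

struct TypeInfo {
    FieldType type;
    std::size_t size;
};

// Maps each enum value to its C++ storage type at compile time.
template <FieldType T>
struct FieldTraits;
template <>
struct FieldTraits<FIELD_TYPE_INT> {
    using CppType = std::int32_t;
};
template <>
struct FieldTraits<FIELD_TYPE_BIGINT> {
    using CppType = std::int64_t;
};

// Compile-time lookup: the field type is a template argument, so the call
// site needs no branch and the compiler can inline the result.
template <FieldType T>
const TypeInfo* get_type_info_static() {
    static const TypeInfo info{T, sizeof(typename FieldTraits<T>::CppType)};
    return &info;
}

// Runtime lookup: dispatches on a value that is only known while running.
inline const TypeInfo* get_type_info_runtime(FieldType type) {
    switch (type) {
    case FIELD_TYPE_INT:
        return get_type_info_static<FIELD_TYPE_INT>();
    case FIELD_TYPE_BIGINT:
        return get_type_info_static<FIELD_TYPE_BIGINT>();
    }
    return nullptr;
}
```

In a hot loop where the field type is already a template parameter of the enclosing code, the static form avoids repeating the switch for every cell; the runtime form remains useful where the type comes from a schema read at run time.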
[incubator-doris] 02/20: [fix] Fix coredump of stddev function (#8543)
morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit d6be70c13043e42e8f3581cfbe5291b504afc376
Author: HappenLee
AuthorDate: Thu Mar 24 11:39:29 2022 +0800

    [fix] Fix coredump of stddev function (#8543)

    This is only a temporary fix; its performance is not ideal.
    Eventually, we need to refactor the `stddev` functions and remove the `insert_to_null_default()` interface.
---
 .../aggregate_function_stddev.cpp | 7 +++--
 .../aggregate_function_stddev.h | 32 --
 .../apache/doris/catalog/AggregateFunction.java | 6 +++-
 .../java/org/apache/doris/catalog/FunctionSet.java | 1 +
 4 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/be/src/vec/aggregate_functions/aggregate_function_stddev.cpp b/be/src/vec/aggregate_functions/aggregate_function_stddev.cpp
index f1794d6..2b06423 100644
--- a/be/src/vec/aggregate_functions/aggregate_function_stddev.cpp
+++ b/be/src/vec/aggregate_functions/aggregate_function_stddev.cpp
@@ -90,11 +90,14 @@ AggregateFunctionPtr create_aggregate_function_stddev_pop(const std::string& nam
 void register_aggregate_function_stddev_variance(AggregateFunctionSimpleFactory& factory) {
 factory.register_function("variance_samp", create_aggregate_function_variance_samp);
-factory.register_function("variance", create_aggregate_function_variance_pop);
+factory.register_function("variance_samp", create_aggregate_function_variance_samp, true);
+factory.register_function("stddev_samp", create_aggregate_function_stddev_samp);
+factory.register_function("stddev_samp", create_aggregate_function_stddev_samp, true);
 factory.register_alias("variance_samp", "var_samp");
+
+factory.register_function("variance", create_aggregate_function_variance_pop);
 factory.register_alias("variance", "var_pop");
 factory.register_alias("variance", "variance_pop");
-factory.register_function("stddev_samp", create_aggregate_function_stddev_samp);
 factory.register_function("stddev", create_aggregate_function_stddev_pop);
 factory.register_alias("stddev", "stddev_pop");
 }
diff --git a/be/src/vec/aggregate_functions/aggregate_function_stddev.h b/be/src/vec/aggregate_functions/aggregate_function_stddev.h
index 82e8718..50c4064 100644
--- a/be/src/vec/aggregate_functions/aggregate_function_stddev.h
+++ b/be/src/vec/aggregate_functions/aggregate_function_stddev.h
@@ -69,7 +69,7 @@ struct BaseData {
 }
 static const DataTypePtr get_return_type() {
-return make_nullable(std::make_shared>());
+return std::make_shared>();
 }
 void merge(const BaseData& rhs) {
@@ -83,7 +83,7 @@ struct BaseData {
 count = sum_count;
 }
-void add(const IColumn** columns, size_t row_num) {
+virtual void add(const IColumn** columns, size_t row_num) {
 const auto& sources = static_cast&>(*columns[0]);
 double source_data = sources.get_data()[row_num];
@@ -145,7 +145,7 @@ struct BaseDatadecimal {
 }
 static const DataTypePtr get_return_type() {
-return make_nullable(std::make_shared>(27, 9));
+return std::make_shared>(27, 9);
 }
 void merge(const BaseDatadecimal& rhs) {
@@ -164,7 +164,7 @@ struct BaseDatadecimal {
 count += rhs.count;
 }
-void add(const IColumn** columns, size_t row_num) {
+virtual void add(const IColumn** columns, size_t row_num) {
 DecimalV2Value source_data = DecimalV2Value();
 const auto& sources = static_cast&>(*columns[0]);
 source_data = (DecimalV2Value)sources.get_data()[row_num];
@@ -191,14 +191,12 @@ struct PopData : Data {
 using ColVecResult = std::conditional_t, ColumnDecimal, ColumnVector>;
 void insert_result_into(IColumn& to) const {
-ColumnNullable& nullable_column = assert_cast(to);
-auto& col = static_cast(nullable_column.get_nested_column());
+auto& col = assert_cast(to);
 if constexpr (IsDecimalNumber) {
 col.get_data().push_back(this->get_pop_result().value());
 } else {
 col.get_data().push_back(this->get_pop_result());
 }
-nullable_column.get_null_map_data().push_back(0);
 }
 };
@@ -220,6 +218,24 @@ struct SampData : Data {
 nullable_column.get_null_map_data().push_back(0);
 }
 }
+
+static const DataTypePtr get_return_type() {
+return make_nullable(Data::get_return_type());
+}
+
+void add(const IColumn** columns, size_t row_num) override {
+if (columns[0]->is_nullable()) {
+const auto& nullable_column = assert_cast(*columns[0]);
+if (!nullable_column.is_null_at(row_num)) {
+const IColumn* new_columns[1];
+new_columns[0]
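Editor's illustration of the shape of this fix: the sample variant skips NULL input rows and has a nullable result, while the population variant returns a plain value. The sketch below is standalone and hedged. The field names `mean`, `m2`, and `count` follow the diff, but the update arithmetic (Welford's online algorithm), the "fewer than two values yields NULL" rule, and the helper `accumulate` are assumptions made for the example, not taken from the Doris source.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical accumulator; Welford's update is assumed here for illustration.
struct VarianceState {
    double mean = 0.0;
    double m2 = 0.0;
    std::size_t count = 0;

    // Folds one non-NULL value into the running mean and sum of squared deviations.
    void add(double value) {
        ++count;
        const double delta = value - mean;
        mean += delta / static_cast<double>(count);
        m2 += delta * (value - mean);
    }

    // Population variance: always a plain (non-nullable) number in this sketch.
    double variance_pop() const {
        return count == 0 ? 0.0 : m2 / static_cast<double>(count);
    }

    // Sample variance: returns false (meaning NULL) when fewer than two values were seen.
    bool variance_samp(double* out) const {
        if (count < 2) {
            return false;
        }
        *out = m2 / static_cast<double>(count - 1);
        return true;
    }
};

// Feeds a nullable column (values plus null map) into the state, skipping NULL rows.
inline VarianceState accumulate(const std::vector<double>& values,
                                const std::vector<bool>& null_map) {
    VarianceState state;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (!null_map[i]) {
            state.add(values[i]);
        }
    }
    return state;
}
```

The values-plus-null-map representation loosely mirrors how a nullable column wraps a nested column, which is why the NULL check happens before the nested value is read.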
[incubator-doris] 03/20: [improvement] Improve sig handler (#8545)
morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit 2e1190b42ed7ce56cfc78017efe2db0e6887863b
Author: yiguolei
AuthorDate: Tue Mar 22 10:40:31 2022 +0800

    [improvement] Improve sig handler (#8545)

    * Refactor glog's default signal handler

    Co-authored-by: Zhengguo Yang <780531...@qq.com>
---
 .licenserc.yaml | 1 +
 .rat-excludes | 1 +
 LICENSE.txt | 33 +++
 be/CMakeLists.txt | 4 +
 be/src/common/daemon.cpp | 2 +-
 be/src/common/logconfig.cpp | 6 +-
 be/src/common/signal_handler.h | 446 +
 be/src/service/doris_main.cpp | 2 +
 be/src/util/logging.h | 2 +-
 build.sh | 4 +-
 thirdparty/CHANGELOG.md | 3 +
 thirdparty/build-thirdparty.sh | 15 ++
 thirdparty/vars.sh | 9 +-
 13 files changed, 519 insertions(+), 9 deletions(-)

diff --git a/.licenserc.yaml b/.licenserc.yaml
index f5bc904..40c83fd 100644
--- a/.licenserc.yaml
+++ b/.licenserc.yaml
@@ -34,6 +34,7 @@ header:
 - 'tsan_suppressions'
 - 'docs/.markdownlintignore'
 - 'fe/fe-core/src/test/resources/data/net_snmp_normal'
+- 'be/src/common/signal_handler.h'
 - 'be/src/olap/lru_cache.cpp'
 - 'be/src/olap/lru_cache.h'
 - 'be/src/olap/skiplist.h'
diff --git a/.rat-excludes b/.rat-excludes
index 34ae84a..34893e3 100644
--- a/.rat-excludes
+++ b/.rat-excludes
@@ -26,6 +26,7 @@ jmockit/*
 status.*
 env*
 lru*
+signal_handler.h
 skiplist.h
 string_search.hpp
 coding.*
diff --git a/LICENSE.txt b/LICENSE.txt
index 81bc3fa..775386b 100644
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -576,3 +576,36 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
+
+
+
+be/src/common/signal_handler.h:
+
+Copyright (c) 2008, Google Inc.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above
+copyright notice, this list of conditions and the following disclaimer
+in the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Google Inc. nor the names of its
+contributors may be used to endorse or promote products derived from
+this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
\ No newline at end of file
diff --git a/be/CMakeLists.txt b/be/CMakeLists.txt
index 7c83019..cd55f8f 100644
--- a/be/CMakeLists.txt
+++ b/be/CMakeLists.txt
@@ -143,6 +143,9 @@ set_target_properties(gflags PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/
 add_library(glog STATIC IMPORTED)
 set_target_properties(glog PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libglog.a)
+add_library(backtrace STATIC IMPORTED)
+set_target_properties(backtrace PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libbacktrace.a)
+
 add_library(re2 STATIC IMPORTED)
 set_target_properties(re2 PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libre2.a)
@@ -529,6 +532,7 @@ endif()
 # When adding new dependencies, If you don’t know if it can run on all platforms,
 # add it here first.
 set(COMMON_THIRDPARTY
+backtrace
 rocksdb
 cyrus-sasl
 libs2
diff --git a/be/src/common/daemon.cpp b/be/src/common/daemon.cpp
index 044feda..07b5439 100644
--- a/be/src/common/daemon.cpp
+++ b/be/src/common/daemon.cpp
@@ -235,7 +235,7 @@ void Daemon::init(int argc, char** argv, const std::vector& paths) {
 // google::SetVersionString(get_build_version(false));
 // google::ParseCommandLineFlags(&argc, &argv, true);
 google::ParseCommandLineFlags(&argc, &argv, true);
-ini
[incubator-doris] 12/20: ow num is more accurate than column num in data_types (#8628)
morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit 2e46b373c203c945a4e4f27f412e7fa6dcc652ce
Author: dataroaring <98214048+dataroar...@users.noreply.github.com>
AuthorDate: Fri Mar 25 14:38:27 2022 +0800

    ow num is more accurate than column num in data_types (#8628)
---
 .../aggregate_function_stddev.h | 2 ++
 be/src/vec/data_types/data_type_bitmap.cpp | 30 +++---
 be/src/vec/data_types/data_type_decimal.cpp | 24 -
 be/src/vec/data_types/data_type_hll.cpp | 30 +++---
 be/src/vec/data_types/data_type_nullable.cpp | 14 +-
 be/src/vec/data_types/data_type_number_base.cpp | 22
 be/src/vec/data_types/data_type_string.cpp | 14 +-
 7 files changed, 69 insertions(+), 67 deletions(-)

diff --git a/be/src/vec/aggregate_functions/aggregate_function_stddev.h b/be/src/vec/aggregate_functions/aggregate_function_stddev.h
index 50c4064..83c4041 100644
--- a/be/src/vec/aggregate_functions/aggregate_function_stddev.h
+++ b/be/src/vec/aggregate_functions/aggregate_function_stddev.h
@@ -28,6 +28,7 @@ namespace doris::vectorized {
 template
 struct BaseData {
 BaseData() : mean(0.0), m2(0.0), count(0) {}
+virtual ~BaseData() {}
 void write(BufferWritable& buf) const {
 write_binary(mean, buf);
@@ -102,6 +103,7 @@ struct BaseData {
 template
 struct BaseDatadecimal {
 BaseDatadecimal() : mean(0), m2(0), count(0) {}
+virtual ~BaseDatadecimal() {}
 void write(BufferWritable& buf) const {
 write_binary(mean, buf);
diff --git a/be/src/vec/data_types/data_type_bitmap.cpp b/be/src/vec/data_types/data_type_bitmap.cpp
index 88a0837..97f34ac 100644
--- a/be/src/vec/data_types/data_type_bitmap.cpp
+++ b/be/src/vec/data_types/data_type_bitmap.cpp
@@ -24,7 +24,7 @@ namespace doris::vectorized {
 // binary: |
-// : column num | bitmap1 size | bitmap2 size | ...
+// : row num | bitmap1 size | bitmap2 size | ...
 // : bitmap1 | bitmap2 | ...
 int64_t DataTypeBitMap::get_uncompressed_serialized_bytes(const IColumn& column) const {
 auto ptr = column.convert_to_full_column_if_const();
@@ -44,19 +44,19 @@ char* DataTypeBitMap::serialize(const IColumn& column, char* buf) const {
 auto ptr = column.convert_to_full_column_if_const();
 auto& data_column = assert_cast(*ptr);
-// serialize the bitmap size array, column num saves at index 0
-const auto column_num = column.size();
-size_t bitmap_size_array[column_num + 1];
-bitmap_size_array[0] = column_num;
-for (size_t i = 0; i < column.size(); ++i) {
+// serialize the bitmap size array, row num saves at index 0
+const auto row_num = column.size();
+size_t bitmap_size_array[row_num + 1];
+bitmap_size_array[0] = row_num;
+for (size_t i = 0; i < row_num; ++i) {
 auto& bitmap = const_cast(data_column.get_element(i));
 bitmap_size_array[i + 1] = bitmap.getSizeInBytes();
 }
-auto allocate_len_size = sizeof(size_t) * (column_num + 1);
+auto allocate_len_size = sizeof(size_t) * (row_num + 1);
 memcpy(buf, bitmap_size_array, allocate_len_size);
 buf += allocate_len_size;
 // serialize each bitmap
-for (size_t i = 0; i < column_num; ++i) {
+for (size_t i = 0; i < row_num; ++i) {
 auto& bitmap = const_cast(data_column.get_element(i));
 bitmap.write(buf);
 buf += bitmap_size_array[i + 1];
@@ -70,18 +70,18 @@ const char* DataTypeBitMap::deserialize(const char* buf, IColumn* column) const
 auto& data = data_column.get_data();
 // deserialize the bitmap size array
-size_t column_num = *reinterpret_cast(buf);
+size_t row_num = *reinterpret_cast(buf);
 buf += sizeof(size_t);
-size_t bitmap_size_array[column_num];
-memcpy(bitmap_size_array, buf, sizeof(size_t) * column_num);
-buf += sizeof(size_t) * column_num;
+size_t bitmap_size_array[row_num];
+memcpy(bitmap_size_array, buf, sizeof(size_t) * row_num);
+buf += sizeof(size_t) * row_num;
 // deserialize each bitmap
-data.resize(column_num);
-for (int i = 0; i < column_num ; ++i) {
+data.resize(row_num);
+for (int i = 0; i < row_num ; ++i) {
 data[i].deserialize(buf);
 buf += bitmap_size_array[i];
 }
-
+
 return buf;
 }
diff --git a/be/src/vec/data_types/data_type_decimal.cpp b/be/src/vec/data_types/data_type_decimal.cpp
index dee7955..8cd97a3 100644
--- a/be/src/vec/data_types/data_type_decimal.cpp
+++ b/be/src/vec/data_types/data_type_decimal.cpp
@@ -62,7 +62,7 @@ void DataTypeDecimal::to_string(const IColumn& column, size_t row_num,
 ostr.write(str.data(), str.size());
 }
-// binary: column_num | value1 | value2 | ...
+// binary: row_num | value1 | value2 | ...
 template
 int64_t DataTypeDecimal::
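Editor's illustration of the buffer layout described by the comments in this diff (`row num | size 1 | size 2 | ... | payload 1 | payload 2 | ...`). This is a standalone sketch under stated assumptions: `std::string` stands in for a serialized bitmap, and `serialize_rows` / `deserialize_rows` are hypothetical helper names, not Doris functions.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Writes: row_num | size_1 ... size_n | payload_1 ... payload_n
inline std::vector<char> serialize_rows(const std::vector<std::string>& rows) {
    const std::size_t row_num = rows.size();
    // Header slot 0 holds the row count; slots 1..n hold each payload size.
    std::vector<std::size_t> header(row_num + 1);
    header[0] = row_num;
    std::size_t payload_bytes = 0;
    for (std::size_t i = 0; i < row_num; ++i) {
        header[i + 1] = rows[i].size();
        payload_bytes += rows[i].size();
    }
    const std::size_t header_bytes = sizeof(std::size_t) * header.size();
    std::vector<char> buf(header_bytes + payload_bytes);
    std::memcpy(buf.data(), header.data(), header_bytes);
    char* out = buf.data() + header_bytes;
    for (std::size_t i = 0; i < row_num; ++i) {
        std::memcpy(out, rows[i].data(), rows[i].size());
        out += rows[i].size();
    }
    return buf;
}

// Reads the layout back: row count first, then the size array, then each payload.
inline std::vector<std::string> deserialize_rows(const std::vector<char>& buf) {
    std::size_t row_num = 0;
    std::memcpy(&row_num, buf.data(), sizeof(std::size_t));
    const char* in = buf.data() + sizeof(std::size_t);
    std::vector<std::size_t> sizes(row_num);
    if (row_num > 0) {
        std::memcpy(sizes.data(), in, sizeof(std::size_t) * row_num);
    }
    in += sizeof(std::size_t) * row_num;
    std::vector<std::string> rows(row_num);
    for (std::size_t i = 0; i < row_num; ++i) {
        rows[i].assign(in, sizes[i]);
        in += sizes[i];
    }
    return rows;
}
```

The leading count is a number of rows in one column, which is the naming point the commit makes; the sketch uses `std::vector` rather than the variable-length stack arrays seen in the diff so that it stays standard C++.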
[incubator-doris] 11/20: [fix](vectorization) Vectorization decimal arithmetic inconsistent (#8626)
morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit d1cf9978702ab60871293bce923a4a7dbe6d9fb9
Author: wangbo
AuthorDate: Mon Mar 28 10:12:39 2022 +0800

    [fix](vectorization) Vectorization decimal arithmetic inconsistent (#8626)
---
 be/src/vec/functions/divide.cpp | 8
 be/src/vec/functions/function_binary_arithmetic.h | 6 --
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/be/src/vec/functions/divide.cpp b/be/src/vec/functions/divide.cpp
index f08f947..71de120 100644
--- a/be/src/vec/functions/divide.cpp
+++ b/be/src/vec/functions/divide.cpp
@@ -24,11 +24,19 @@ namespace doris::vectorized {
+static const DecimalV2Value one(1, 0);
+
 template
 struct DivideFloatingImpl {
 using ResultType = typename NumberTraits::ResultOfFloatingPointDivision::Type;
 static const constexpr bool allow_decimal = true;
+template
+static inline DecimalV2Value apply(DecimalV2Value a, DecimalV2Value b, NullMap& null_map, size_t index) {
+null_map[index] = b.is_zero();
+return a / (b.is_zero() ? one : b);
+}
+
 template
 static inline Result apply(A a, B b, NullMap& null_map, size_t index) {
 null_map[index] = b == 0;
diff --git a/be/src/vec/functions/function_binary_arithmetic.h b/be/src/vec/functions/function_binary_arithmetic.h
index da1cabb..dd36158 100644
--- a/be/src/vec/functions/function_binary_arithmetic.h
+++ b/be/src/vec/functions/function_binary_arithmetic.h
@@ -205,12 +205,6 @@ struct DecimalBinaryOperation {
 ResultType scale_a [[maybe_unused]], ResultType scale_b [[maybe_unused]],
 NullMap& null_map) {
 size_t size = a.size();
-if constexpr (is_division && IsDecimalNumber) {
-for (size_t i = 0; i < size; ++i) {
-c[i] = apply_scaled_div(a[i], b[i], scale_a, null_map, i);
-}
-return;
-}
 /// default: use it if no return before
 for (size_t i = 0; i < size; ++i) {
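Editor's illustration of the pattern visible in the added `apply` overload: a zero divisor marks the row as NULL in the null map, and the division is carried out against 1 instead so that it cannot fault. The sketch is standalone and simplified: `std::int64_t` stands in for `DecimalV2Value`, `NullMap` is a plain vector here, and `safe_divide` / `divide_columns` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in: one byte per row, 1 means the result row is NULL.
using NullMap = std::vector<std::uint8_t>;

// Divides a by b; a zero divisor sets the NULL flag and divides by 1 instead.
inline std::int64_t safe_divide(std::int64_t a, std::int64_t b, NullMap& null_map,
                                std::size_t index) {
    const bool divisor_is_zero = (b == 0);
    null_map[index] = divisor_is_zero ? 1 : 0;
    return a / (divisor_is_zero ? 1 : b);
}

// Applies safe_divide row by row over two equally sized columns.
inline std::vector<std::int64_t> divide_columns(const std::vector<std::int64_t>& a,
                                                const std::vector<std::int64_t>& b,
                                                NullMap& null_map) {
    std::vector<std::int64_t> result(a.size());
    null_map.assign(a.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        result[i] = safe_divide(a[i], b[i], null_map, i);
    }
    return result;
}
```

The value stored for a NULL row is a placeholder; readers of the result are expected to consult the null map first.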
[incubator-doris] 06/20: [fix][vectorized] fix core on get_predicate_column_ptr && fix double copy on _read_columns_by_rowids (#8581)
morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit 9861c3c028165fe913bf1f804e8a538b6ffa3800
Author: Pxl <952130...@qq.com>
AuthorDate: Thu Mar 24 09:12:42 2022 +0800

    [fix][vectorized] fix core on get_predicate_column_ptr && fix double copy on _read_columns_by_rowids (#8581)
---
 be/src/olap/bloom_filter_predicate.h | 2 +-
 be/src/olap/rowset/segment_v2/segment_iterator.cpp | 83 +++---
 be/src/olap/rowset/segment_v2/segment_iterator.h | 29 ++--
 be/src/runtime/primitive_type.h | 22 ++
 be/src/vec/columns/column_nullable.h | 15 ++--
 be/src/vec/columns/column_string.h | 22 --
 be/src/vec/columns/predicate_column.h | 6 +-
 be/src/vec/core/block.cpp | 24 +++
 be/src/vec/core/block.h | 2 +
 be/src/vec/core/types.h | 24 ++-
 10 files changed, 166 insertions(+), 63 deletions(-)

diff --git a/be/src/olap/bloom_filter_predicate.h b/be/src/olap/bloom_filter_predicate.h
index c86e991..3b49cb0 100644
--- a/be/src/olap/bloom_filter_predicate.h
+++ b/be/src/olap/bloom_filter_predicate.h
@@ -111,7 +111,7 @@ template
 void BloomFilterColumnPredicate::evaluate(vectorized::IColumn& column, uint16_t* sel,
 uint16_t* size) const {
 uint16_t new_size = 0;
-using T = typename PrimitiveTypeTraits::CppType;
+using T = typename PredicatePrimitiveTypeTraits::PredicateFieldType;
 if (column.is_nullable()) {
 auto* nullable_col = vectorized::check_and_get_column(column);
diff --git a/be/src/olap/rowset/segment_v2/segment_iterator.cpp b/be/src/olap/rowset/segment_v2/segment_iterator.cpp
index 1ea0193..da1c219 100644
--- a/be/src/olap/rowset/segment_v2/segment_iterator.cpp
+++ b/be/src/olap/rowset/segment_v2/segment_iterator.cpp
@@ -25,6 +25,7 @@
 #include "olap/column_predicate.h"
 #include "olap/fs/fs_util.h"
 #include "olap/in_list_predicate.h"
+#include "olap/olap_common.h"
 #include "olap/row.h"
 #include "olap/row_block2.h"
 #include "olap/row_cursor.h"
@@ -614,9 +615,9 @@ void SegmentIterator::_vec_init_lazy_materialization() {
 _is_pred_column[cid] = true;
 pred_column_ids.insert(cid);
-if (type == OLAP_FIELD_TYPE_VARCHAR || type == OLAP_FIELD_TYPE_CHAR
-|| type == OLAP_FIELD_TYPE_STRING || predicate->is_in_predicate()
-|| predicate->is_bloom_filter_predicate()) {
+if (type == OLAP_FIELD_TYPE_VARCHAR || type == OLAP_FIELD_TYPE_CHAR ||
+type == OLAP_FIELD_TYPE_STRING || predicate->is_in_predicate() ||
+predicate->is_bloom_filter_predicate()) {
 short_cir_pred_col_id_set.insert(cid);
 _short_cir_eval_predicate.push_back(predicate);
 _is_all_column_basic_type = false;
@@ -640,7 +641,7 @@ void SegmentIterator::_vec_init_lazy_materialization() {
 _is_pred_column[cid] = true;
 }
 }
-
+
 if (_schema.column_ids().size() > pred_column_ids.size()) {
 for (auto cid : _schema.column_ids()) {
 if (!_is_pred_column[cid]) {
@@ -716,6 +717,8 @@ Status SegmentIterator::_read_columns(const std::vector& column_ids,
 void SegmentIterator::_init_current_block(
 vectorized::Block* block, std::vector& current_columns) {
+_char_type_idx.clear();
+
 bool is_block_mem_reuse = block->mem_reuse();
 if (is_block_mem_reuse) {
 block->clear_column_data(_schema.num_column_ids());
@@ -736,10 +739,15 @@ void SegmentIterator::_init_current_block(
 for (size_t i = 0; i < _schema.num_column_ids(); i++) {
 auto cid = _schema.column_id(i);
+auto column_desc = _schema.column(cid);
+
+if (column_desc->type() == OLAP_FIELD_TYPE_CHAR) {
+_char_type_idx.emplace_back(i);
+}
+
 if (_is_pred_column[cid]) { //todo(wb) maybe we can relase it after output block
 current_columns[cid]->clear();
 } else { // non-predicate column
-auto column_desc = _schema.column(cid);
 if (is_block_mem_reuse) {
 current_columns[cid] = std::move(*block->get_by_position(i).column).mutate();
 } else {
@@ -768,19 +776,6 @@ void SegmentIterator::_output_non_pred_columns(vectorized::Block* block, bool is
 }
 }
-Status SegmentIterator::_output_column_by_sel_idx(vectorized::Block* block,
- const std::vector& columnIds,
- uint16_t* sel_rowid_idx, uint16_t select_size,
- bool is_block_mem_reuse) {
-for (auto cid : columnIds) {
-
[incubator-doris] 05/20: [api-change] add soft limit of String type length (#8567)
This is an automated email from the ASF dual-hosted git repository. morningman pushed a commit to branch dev-1.0.1 in repository https://gitbox.apache.org/repos/asf/incubator-doris.git commit 11374d67b2195d24778657b2b29cbbe1b2b2c395 Author: Zhengguo Yang AuthorDate: Fri Mar 25 09:28:41 2022 +0800 [api-change] add soft limit of String type length (#8567) 1. add a config string_type_soft_limit to soft limit max length of string type 2. disable using String type in Key column, partition column and distribution column 3. remove String type alias BLOB for futrue use --- be/src/common/config.h | 7 be/src/exec/tablet_sink.cpp| 43 ++ be/src/olap/delete_handler.cpp | 4 +- be/src/olap/olap_define.h | 3 -- be/src/olap/row_block2.cpp | 8 ++-- be/src/olap/types.h| 4 +- be/src/olap/wrapper_field.cpp | 5 ++- be/src/runtime/types.h | 3 +- be/src/vec/sink/vtablet_sink.cpp | 29 +++ docs/en/administrator-guide/config/be_config.md| 5 +++ .../sql-statements/Data Types/STRING.md| 2 +- docs/zh-CN/administrator-guide/config/be_config.md | 5 +++ .../sql-statements/Data Types/STRING.md| 2 +- fe/fe-core/src/main/cup/sql_parser.cup | 2 - .../java/org/apache/doris/analysis/ColumnDef.java | 5 ++- .../org/apache/doris/analysis/CreateTableStmt.java | 12 -- .../apache/doris/analysis/DistributionDesc.java| 2 +- .../doris/analysis/HashDistributionDesc.java | 18 - .../org/apache/doris/analysis/PartitionDesc.java | 5 +++ .../doris/analysis/RandomDistributionDesc.java | 2 +- .../java/org/apache/doris/catalog/ScalarType.java | 1 - .../java/org/apache/doris/planner/PlannerTest.java | 15 22 files changed, 123 insertions(+), 59 deletions(-) diff --git a/be/src/common/config.h b/be/src/common/config.h index 5cb5a17..9c255c2 100644 --- a/be/src/common/config.h +++ b/be/src/common/config.h @@ -704,6 +704,13 @@ CONF_String(function_service_protocol, "h2:grpc"); // use which load balancer to select server to connect CONF_String(rpc_load_balancer, "rr"); +// a soft limit of string type length, the hard limit is 
2GB - 4, but if too long will cause very low performance, +// so we set a soft limit, default is 1MB +CONF_mInt32(string_type_length_soft_limit_bytes, "1048576"); + +CONF_Validator(string_type_length_soft_limit_bytes, + [](const int config) -> bool { return config > 0 && config <= 2147483643; }); + } // namespace config } // namespace doris diff --git a/be/src/exec/tablet_sink.cpp b/be/src/exec/tablet_sink.cpp index 0a56369..2dcd9d4 100644 --- a/be/src/exec/tablet_sink.cpp +++ b/be/src/exec/tablet_sink.cpp @@ -191,7 +191,8 @@ Status NodeChannel::open_wait() { return; } // If rpc failed, mark all tablets on this node channel as failed -_index_channel->mark_as_failed(this->node_id(), this->host(), _add_batch_closure->cntl.ErrorText(), -1); +_index_channel->mark_as_failed(this->node_id(), this->host(), + _add_batch_closure->cntl.ErrorText(), -1); Status st = _index_channel->check_intolerable_failure(); if (!st.ok()) { _cancel_with_msg(fmt::format("{}, err: {}", channel_info(), st.get_error_msg())); @@ -214,7 +215,8 @@ Status NodeChannel::open_wait() { if (status.ok()) { // if has error tablet, handle them first for (auto& error : result.tablet_errors()) { -_index_channel->mark_as_failed(this->node_id(), this->host(), error.msg(), error.tablet_id()); +_index_channel->mark_as_failed(this->node_id(), this->host(), error.msg(), + error.tablet_id()); } Status st = _index_channel->check_intolerable_failure(); @@ -387,7 +389,7 @@ Status NodeChannel::close_wait(RuntimeState* state) { while (!_add_batches_finished && !_cancelled) { SleepFor(MonoDelta::FromMilliseconds(1)); } -_close_time_ms = UnixMillis() - _close_time_ms; +_close_time_ms = UnixMillis() - _close_time_ms; if (_add_batches_finished) { { @@ -676,7 +678,8 @@ OlapTableSink::~OlapTableSink() { // OlapTableSink::_mem_tracker and its parents. // But their destructions are after OlapTableSink's. 
for (auto index_channel : _channels) { -index_channel->for_each_node_channel([](const std::shared_ptr& ch) { ch->clear_all_batches(); }); +index_channel->for_each_node_channel( +[](const std::shared_ptr& ch) { ch->clear_all_batches(); }); } } @@ -838,11 +841,13 @@ Status OlapTableSink::open(RuntimeState* state) { RETURN_IF
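The `CONF_Validator` lambda in the `config.h` hunk above accepts a value only if it is positive and at most 2147483643 (INT32_MAX minus 4). A minimal standalone sketch of that check follows. The constant names and the `fits_soft_limit` helper are hypothetical and do not come from the Doris source; only the default of 1048576 bytes and the bounds are taken from the diff.

```cpp
#include <cassert>
#include <cstdint>

// Default from the diff: 1MB soft limit.
constexpr std::int32_t kSoftLimitDefaultBytes = 1048576;
// Upper bound used by the validator lambda (INT32_MAX - 4).
constexpr std::int32_t kSoftLimitUpperBound = 2147483643;

// Mirrors the predicate passed to CONF_Validator in the diff.
constexpr bool is_valid_soft_limit(std::int32_t value) {
    return value > 0 && value <= kSoftLimitUpperBound;
}

// Hypothetical helper (not in the diff): does a string of `len` bytes
// fit under a given, valid soft limit?
constexpr bool fits_soft_limit(std::uint64_t len, std::int32_t soft_limit) {
    return is_valid_soft_limit(soft_limit) &&
           len <= static_cast<std::uint64_t>(soft_limit);
}
```

Because the config is declared with `CONF_mInt32`, the predicate takes a 32-bit value; an out-of-range setting such as 0 or INT32_MAX is rejected by the check.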
[incubator-doris] 07/20: [fix](load) fix another bug that BE may crash when calling `mark_as_failed` (#8607)
This is an automated email from the ASF dual-hosted git repository. morningman pushed a commit to branch dev-1.0.1 in repository https://gitbox.apache.org/repos/asf/incubator-doris.git commit a06844719fc4de71eba35d52e6e786e5c5308902 Author: Mingyu Chen AuthorDate: Thu Mar 24 09:13:54 2022 +0800 [fix](load) fix another bug that BE may crash when calling `mark_as_failed` (#8607) Same as #8501 --- be/src/exec/tablet_sink.cpp | 5 + .../src/main/java/org/apache/doris/catalog/Catalog.java | 12 ++-- .../src/main/java/org/apache/doris/common/ErrorCode.java | 6 +++--- .../org/apache/doris/external/iceberg/IcebergCatalogMgr.java | 8 .../external/iceberg/IcebergTableCreationRecordMgr.java | 3 +-- 5 files changed, 19 insertions(+), 15 deletions(-) diff --git a/be/src/exec/tablet_sink.cpp b/be/src/exec/tablet_sink.cpp index 2dcd9d4..080db73 100644 --- a/be/src/exec/tablet_sink.cpp +++ b/be/src/exec/tablet_sink.cpp @@ -417,6 +417,11 @@ Status NodeChannel::close_wait(RuntimeState* state) { } void NodeChannel::cancel(const std::string& cancel_msg) { +// set _is_closed to true finally +Defer set_closed {[&]() { +std::lock_guard l(_closed_lock); +_is_closed = true; +}}; // we don't need to wait last rpc finished, cause closure's release/reset will join. // But do we need brpc::StartCancel(call_id)? 
_cancel_with_msg(cancel_msg); diff --git a/fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java b/fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java index 2273d89..57a0666 100755 --- a/fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java +++ b/fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java @@ -3926,7 +3926,7 @@ public class Catalog { Pair result = db.createTableWithLock(olapTable, false, stmt.isSetIfNotExists()); if (!result.first) { - ErrorReport.reportDdlException(ErrorCode.ERR_CANT_CREATE_TABLE, tableName, "table already exists"); + ErrorReport.reportDdlException(ErrorCode.ERR_TABLE_EXISTS_ERROR, tableName); } if (result.second) { @@ -3975,7 +3975,7 @@ public class Catalog { MysqlTable mysqlTable = new MysqlTable(tableId, tableName, columns, stmt.getProperties()); mysqlTable.setComment(stmt.getComment()); if (!db.createTableWithLock(mysqlTable, false, stmt.isSetIfNotExists()).first) { -ErrorReport.reportDdlException(ErrorCode.ERR_CANT_CREATE_TABLE, tableName, "table already exist"); +ErrorReport.reportDdlException(ErrorCode.ERR_TABLE_EXISTS_ERROR, tableName); } LOG.info("successfully create table[{}-{}]", tableName, tableId); return; @@ -3989,7 +3989,7 @@ public class Catalog { OdbcTable odbcTable = new OdbcTable(tableId, tableName, columns, stmt.getProperties()); odbcTable.setComment(stmt.getComment()); if (!db.createTableWithLock(odbcTable, false, stmt.isSetIfNotExists()).first) { -ErrorReport.reportDdlException(ErrorCode.ERR_CANT_CREATE_TABLE, tableName, "table already exist"); +ErrorReport.reportDdlException(ErrorCode.ERR_TABLE_EXISTS_ERROR, tableName); } LOG.info("successfully create table[{}-{}]", tableName, tableId); return; @@ -4020,7 +4020,7 @@ public class Catalog { esTable.setComment(stmt.getComment()); if (!db.createTableWithLock(esTable, false, stmt.isSetIfNotExists()).first) { -ErrorReport.reportDdlException(ErrorCode.ERR_CANT_CREATE_TABLE, tableName, "table already exist"); 
+ErrorReport.reportDdlException(ErrorCode.ERR_TABLE_EXISTS_ERROR, tableName); } LOG.info("successfully create table{} with id {}", tableName, tableId); return esTable; @@ -4037,7 +4037,7 @@ public class Catalog { brokerTable.setBrokerProperties(stmt.getExtProperties()); if (!db.createTableWithLock(brokerTable, false, stmt.isSetIfNotExists()).first) { -ErrorReport.reportDdlException(ErrorCode.ERR_CANT_CREATE_TABLE, tableName, "table already exist"); +ErrorReport.reportDdlException(ErrorCode.ERR_TABLE_EXISTS_ERROR, tableName); } LOG.info("successfully create table[{}-{}]", tableName, tableId); @@ -4058,7 +4058,7 @@ public class Catalog { } // check hive table if exists in doris database if (!db.createTableWithLock(hiveTable, false, stmt.isSetIfNotExists()).first) { -ErrorReport.reportDdlException(ErrorCode.ERR_CANT_CREATE_TABLE, tableName, "table already exist"); +ErrorReport.reportDdlException(ErrorCode.ERR_TABLE_EXISTS_ERROR, tableName); } LOG.info("successfully create table[{}-{}]", tableName, tableId); } diff --git a/fe/fe-core/src/main/java/org/apache/doris/common/ErrorCode.java b/fe/f
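The `tablet_sink.cpp` hunk of this commit uses a `Defer` object so that `_is_closed` is set under `_closed_lock` when `NodeChannel::cancel` returns, whatever path it takes. The sketch below illustrates the same scope-guard idea in isolation. `ScopeGuard` and `Channel` are hypothetical stand-ins; the actual definition of Doris's `Defer` is not shown in the diff and may differ.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical minimal scope guard: runs the stored callable on destruction.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> fn) : fn_(std::move(fn)) {}
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
    ~ScopeGuard() {
        if (fn_) fn_();
    }

private:
    std::function<void()> fn_;
};

// Hypothetical channel that mimics the shape of NodeChannel::cancel.
class Channel {
public:
    void cancel(const std::string& msg) {
        // Runs when cancel() exits, so the closed flag is always set last.
        ScopeGuard set_closed{[this]() {
            std::lock_guard<std::mutex> l(closed_lock_);
            is_closed_ = true;
        }};
        cancel_msg_ = msg;  // stands in for _cancel_with_msg(cancel_msg)
    }

    bool is_closed() {
        std::lock_guard<std::mutex> l(closed_lock_);
        return is_closed_;
    }

    const std::string& cancel_msg() const { return cancel_msg_; }

private:
    std::mutex closed_lock_;
    bool is_closed_ = false;
    std::string cancel_msg_;
};
```

The guard is declared first in the function so that its destructor runs after the rest of the body, which matches the "set _is_closed to true finally" comment in the diff.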
[incubator-doris] 13/20: [chore] optimize aws thirdparty package download. (#8637)
This is an automated email from the ASF dual-hosted git repository. morningman pushed a commit to branch dev-1.0.1 in repository https://gitbox.apache.org/repos/asf/incubator-doris.git commit 035006fb6cded49014be8b566b3e4998078b176d Author: Zhengguo Yang AuthorDate: Mon Mar 28 09:35:51 2022 +0800 [chore] optimize aws thirdparty package download. (#8637) --- be/CMakeLists.txt | 33 +++-- fe/pom.xml| 5 - thirdparty/download-thirdparty.sh | 9 +++-- 3 files changed, 18 insertions(+), 29 deletions(-) diff --git a/be/CMakeLists.txt b/be/CMakeLists.txt index cd55f8f..691ca26 100644 --- a/be/CMakeLists.txt +++ b/be/CMakeLists.txt @@ -93,26 +93,9 @@ if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU") endif() endif() -set(PIC_LIB_PATH "${THIRDPARTY_DIR}") -if(PIC_LIB_PATH) -message(STATUS "defined PIC_LIB_PATH") -set(CMAKE_SKIP_RPATH TRUE) -set(Boost_USE_STATIC_LIBS ON) -set(Boost_USE_STATIC_RUNTIME ON) -set(LIBBZ2 ${PIC_LIB_PATH}/lib/libbz2.a) -set(LIBZ ${PIC_LIB_PATH}/lib/libz.a) -set(LIBEVENT ${PIC_LIB_PATH}/lib/libevent.a) -set(LIBEVENT_PTHREADS ${PIC_LIB_PATH}/lib/libevent_pthreads.a) -else() -message(STATUS "undefined PIC_LIB_PATH") -set(Boost_USE_STATIC_LIBS ON) -set(Boost_USE_STATIC_RUNTIME ON) -set(LIBBZ2 -lbz2) -set(LIBZ -lz) -set(LIBEVENT event) -set(LIBEVENT_PTHREADS libevent_pthreads) -endif() - +set(CMAKE_SKIP_RPATH TRUE) +set(Boost_USE_STATIC_LIBS ON) +set(Boost_USE_STATIC_RUNTIME ON) # Compile generated source if necessary message(STATUS "build gensrc if necessary") @@ -206,6 +189,12 @@ set_target_properties(libevent PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/li add_library(libevent_pthreads STATIC IMPORTED) set_target_properties(libevent_pthreads PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libevent_pthreads.a) +add_library(libbz2 STATIC IMPORTED) +set_target_properties(libbz2 PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libbz2.a) + +add_library(libz STATIC IMPORTED) +set_target_properties(libz PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libz.a) 
+ add_library(crypto STATIC IMPORTED) set_target_properties(crypto PROPERTIES IMPORTED_LOCATION ${THIRDPARTY_DIR}/lib/libcrypto.a) @@ -549,8 +538,8 @@ set(COMMON_THIRDPARTY idn gsasl curl -${LIBZ} -${LIBBZ2} +libz +libbz2 gflags brpc protobuf diff --git a/fe/pom.xml b/fe/pom.xml index 47423f7..2ac18dd 100644 --- a/fe/pom.xml +++ b/fe/pom.xml @@ -239,11 +239,6 @@ under the License. cloudera-public https://repository.cloudera.com/artifactory/public/ - - -oracleReleases -https://download.oracle.com/maven - diff --git a/thirdparty/download-thirdparty.sh b/thirdparty/download-thirdparty.sh index 74919f3..b17c6be 100755 --- a/thirdparty/download-thirdparty.sh +++ b/thirdparty/download-thirdparty.sh @@ -305,8 +305,13 @@ echo "Finished patching $LIBRDKAFKA_SOURCE" cd $TP_SOURCE_DIR/$AWS_SDK_SOURCE if [ ! -f $PATCHED_MARK ]; then if [ $AWS_SDK_SOURCE == "aws-sdk-cpp-1.9.211" ]; then -wget --no-check-certificate -q https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/aws-crt-cpp-1.9.211.tar.gz -tar xzf aws-crt-cpp-1.9.211.tar.gz +wget --no-check-certificate -q https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/aws-crt-cpp-1.9.211.tar.gz -O aws-crt-cpp-1.9.211.tar.gz +ret="$?" +if [ $ret -eq 0 ] ; then +tar xzf aws-crt-cpp-1.9.211.tar.gz +else +bash ./prefetch_crt_dependency.sh +fi else bash ./prefetch_crt_dependency.sh fi - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[incubator-doris] 16/20: [fix](mini-load) Remove mini load in LOADING and PENDING state (#8649)
This is an automated email from the ASF dual-hosted git repository. morningman pushed a commit to branch dev-1.0.1 in repository https://gitbox.apache.org/repos/asf/incubator-doris.git commit 2e1e2b308fd27c809708de269a28495546bc9049 Author: Mingyu Chen AuthorDate: Mon Mar 28 10:22:17 2022 +0800 [fix](mini-load) Remove mini load in LOADING and PENDING state (#8649) 1. Remove some unused code. 2. handle mini load with wrong state 1. For some historical reasons, some mini load jobs in LOADING state have not been cleared. As a result, new load jobs cannot be committed. 2. If a mini load job is created right before FE restart, the mini load job will be in PENDING state forever. But it should be removed finally. --- .../org/apache/doris/load/loadv2/LoadManager.java | 146 +++-- 1 file changed, 21 insertions(+), 125 deletions(-) diff --git a/fe/fe-core/src/main/java/org/apache/doris/load/loadv2/LoadManager.java b/fe/fe-core/src/main/java/org/apache/doris/load/loadv2/LoadManager.java index ae0f97c..64e0973 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/load/loadv2/LoadManager.java +++ b/fe/fe-core/src/main/java/org/apache/doris/load/loadv2/LoadManager.java @@ -36,7 +36,6 @@ import org.apache.doris.common.UserException; import org.apache.doris.common.io.Writable; import org.apache.doris.common.util.LogBuilder; import org.apache.doris.common.util.LogKey; -import org.apache.doris.common.util.TimeUtils; import org.apache.doris.load.EtlJobType; import org.apache.doris.load.FailMsg; import org.apache.doris.load.FailMsg.CancelType; @@ -45,9 +44,7 @@ import org.apache.doris.system.SystemInfoService; import org.apache.doris.thrift.TMiniLoadBeginRequest; import org.apache.doris.thrift.TMiniLoadRequest; import org.apache.doris.thrift.TUniqueId; -import org.apache.doris.transaction.GlobalTransactionMgr; import org.apache.doris.transaction.TransactionState; -import org.apache.doris.transaction.TransactionStatus; import com.google.common.base.Strings; import 
com.google.common.collect.Lists; @@ -359,32 +356,6 @@ public class LoadManager implements Writable { } } -public void cancelLoadJob(CancelLoadStmt stmt) throws DdlException { -Database db = Catalog.getCurrentCatalog().getDbOrDdlException(stmt.getDbName()); - -LoadJob loadJob = null; -readLock(); -try { -Map> labelToLoadJobs = dbIdToLabelToLoadJobs.get(db.getId()); -if (labelToLoadJobs == null) { -throw new DdlException("Load job does not exist"); -} -List loadJobList = labelToLoadJobs.get(stmt.getLabel()); -if (loadJobList == null) { -throw new DdlException("Load job does not exist"); -} -Optional loadJobOptional = loadJobList.stream().filter(entity -> !entity.isTxnDone()).findFirst(); -if (!loadJobOptional.isPresent()) { -throw new DdlException("There is no uncompleted job which label is " + stmt.getLabel()); -} -loadJob = loadJobOptional.get(); -} finally { -readUnlock(); -} - -loadJob.cancelJob(new FailMsg(FailMsg.CancelType.USER_CANCEL, "user cancel")); -} - public void replayEndLoadJob(LoadJobFinalOperation operation) { LoadJob job = idToLoadJob.get(operation.getId()); if (job == null) { @@ -683,102 +654,6 @@ public class LoadManager implements Writable { } } -@Deprecated -// Deprecated in version 0.12 -// This method is only for bug fix. And should be call after image and edit log are replayed. -public void fixLoadJobMetaBugs(GlobalTransactionMgr txnMgr) { -for (LoadJob job : idToLoadJob.values()) { -/* - * Bug 1: - * in previous implementation, there is a bug that when the job's corresponding transaction is - * COMMITTED but not VISIBLE, the load job's state is LOADING, so that the job may be CANCELLED - * by timeout checker, which is not right. - * So here we will check each LOADING load jobs' txn status, if it is COMMITTED, change load job's - * state to COMMITTED. - * this method should be removed at next upgrading. - * only mini load job will be in LOADING state when persist, because mini load job is executed before writing - * edit log. 
- */ -if (job.getState() == JobState.LOADING) { -// unfortunately, transaction id in load job is also not persisted, so we have to traverse -// all transactions to find it. -TransactionState txn = txnMgr.getTransactionStateByCallbackIdAndStatus(job.getDbId(), job.getCallbackId(), -Sets.newHashSet(TransactionStatus.COMMITTED)); -if (txn != null) { -j
[incubator-doris] 10/20: [fix](load) Fix null column bug in load's mapping column setting (#8625)
morningman pushed a commit to branch dev-1.0.1 in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit ca1974f9bd4da10986480321ae22310b05a62915
Author: HB <137497...@qq.com>
AuthorDate: Mon Mar 28 10:08:00 2022 +0800

    [fix](load) Fix null column bug in load's mapping column setting (#8625)
---
 fe/fe-core/src/main/java/org/apache/doris/load/Load.java | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fe/fe-core/src/main/java/org/apache/doris/load/Load.java b/fe/fe-core/src/main/java/org/apache/doris/load/Load.java
index 76bff95..905f950 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/load/Load.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/load/Load.java
@@ -1028,6 +1028,9 @@ public class Load {
 for (Entry>> entry : columnToHadoopFunction.entrySet()) {
 String mappingColumnName = entry.getKey();
 Column mappingColumn = tbl.getColumn(mappingColumnName);
+if (mappingColumn == null) {
+throw new DdlException("Mapping column is not in table. column: " + mappingColumnName);
+}
 Pair> function = entry.getValue();
 try {
 DataDescription.validateMappingFunction(function.first, function.second, columnNameMap,
[incubator-doris] 15/20: [chore] add -rtlib=compiler-rt for UBSAN under clang (#8647)
morningman pushed a commit to branch dev-1.0.1 in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit 6e6ba618b1d11c595808cf1ad1412aaf894137be
Author: dataroaring <98214048+dataroar...@users.noreply.github.com>
AuthorDate: Mon Mar 28 10:21:55 2022 +0800

    [chore] add -rtlib=compiler-rt for UBSAN under clang (#8647)
---
 be/CMakeLists.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/be/CMakeLists.txt b/be/CMakeLists.txt
index 691ca26..ebd9542 100644
--- a/be/CMakeLists.txt
+++ b/be/CMakeLists.txt
@@ -628,7 +628,7 @@ if ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU")
 set(UBSAN_LIBS -static-libubsan tcmalloc)
 set(TSAN_LIBS -static-libtsan)
 else ()
-set(UBSAN_LIBS tcmalloc)
+set(UBSAN_LIBS -rtlib=compiler-rt tcmalloc)
 endif ()

 # Add sanitize static link flags or tcmalloc
[incubator-doris] 08/20: [doc] fix typo for session (#8610)
This is an automated email from the ASF dual-hosted git repository. morningman pushed a commit to branch dev-1.0.1 in repository https://gitbox.apache.org/repos/asf/incubator-doris.git commit 84331c8f472ffba897f5670308a4c0a83145f703 Author: GoGoWen <82132356+gogo...@users.noreply.github.com> AuthorDate: Thu Mar 24 09:14:44 2022 +0800 [doc] fix typo for session (#8610) --- docs/en/administrator-guide/bucket-shuffle-join.md| 2 +- docs/en/administrator-guide/vectorized-execution-engine.md| 2 +- docs/zh-CN/administrator-guide/bucket-shuffle-join.md | 2 +- docs/zh-CN/administrator-guide/vectorized-execution-engine.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/en/administrator-guide/bucket-shuffle-join.md b/docs/en/administrator-guide/bucket-shuffle-join.md index 0e67268..2ac58a2 100644 --- a/docs/en/administrator-guide/bucket-shuffle-join.md +++ b/docs/en/administrator-guide/bucket-shuffle-join.md @@ -95,7 +95,7 @@ The join type indicates that the join method to be used is:`BUCKET_SHUFFLE`。 ## Planning rules of Bucket Shuffle Join -In most scenarios, users only need to turn on the seesion variable by default to transparently use the performance improvement brought by this join method. However, if we understand the planning rules of Bucket Shuffle Join, we can use it to write more efficient SQL. +In most scenarios, users only need to turn on the session variable by default to transparently use the performance improvement brought by this join method. However, if we understand the planning rules of Bucket Shuffle Join, we can use it to write more efficient SQL. * Bucket Shuffle Join only works when the join condition is equivalent. The reason is similar to Colorate Join. They all rely on hash to calculate the determined data distribution. * The bucket column of two tables is included in the equivalent join condition. 
When the bucket column of the left table is an equivalent join condition, it has a high probability of being planned as a Bucket Shuffle Join. diff --git a/docs/en/administrator-guide/vectorized-execution-engine.md b/docs/en/administrator-guide/vectorized-execution-engine.md index 425508c..c91d5cc 100644 --- a/docs/en/administrator-guide/vectorized-execution-engine.md +++ b/docs/en/administrator-guide/vectorized-execution-engine.md @@ -101,7 +101,7 @@ After the vectorized execution engine is enabled, `V` mark will be added before ## Some differences from the row-store execution engine -In most scenarios, users only need to turn on the seesion variable by default to transparently enable the vectorized execution engine and improve the performance of SQL execution. However, **the current vectorized execution engine is different from the original row-stored execution engine in the following minor details, which requires users to know**. This part of the difference is divided into two categories +In most scenarios, users only need to turn on the session variable by default to transparently enable the vectorized execution engine and improve the performance of SQL execution. However, **the current vectorized execution engine is different from the original row-stored execution engine in the following minor details, which requires users to know**. This part of the difference is divided into two categories * **Type A** : functions that need to be deprecated and deprecated or depended on by the inline execution engine. * **Type B**: Not supported on the vectorized execution engine in the short term, but will be supported by development in the future. 
diff --git a/docs/zh-CN/administrator-guide/bucket-shuffle-join.md b/docs/zh-CN/administrator-guide/bucket-shuffle-join.md index 6f629dd..67ac4a2 100644 --- a/docs/zh-CN/administrator-guide/bucket-shuffle-join.md +++ b/docs/zh-CN/administrator-guide/bucket-shuffle-join.md @@ -96,7 +96,7 @@ select * from test join [shuffle] baseall on test.k1 = baseall.k1; ## Bucket Shuffle Join的规划规则 -在绝大多数场景之中,用户只需要默认打开seesion变量的开关就可以透明的使用这种Join方式带来的性能提升,但是如果了解Bucket Shuffle Join的规划规则,可以帮助我们利用它写出更加高效的SQL。 +在绝大多数场景之中,用户只需要默认打开session变量的开关就可以透明的使用这种Join方式带来的性能提升,但是如果了解Bucket Shuffle Join的规划规则,可以帮助我们利用它写出更加高效的SQL。 * Bucket Shuffle Join只生效于Join条件为等值的场景,原因与Colocate Join类似,它们都依赖hash来计算确定的数据分布。 * 在等值Join条件之中包含两张表的分桶列,当左表的分桶列为等值的Join条件时,它有很大概率会被规划为Bucket Shuffle Join。 diff --git a/docs/zh-CN/administrator-guide/vectorized-execution-engine.md b/docs/zh-CN/administrator-guide/vectorized-execution-engine.md index b16a12b..acfa4a3 100644 --- a/docs/zh-CN/administrator-guide/vectorized-execution-engine.md +++ b/docs/zh-CN/administrator-guide/vectorized-execution-engine.md @@ -100,7 +100,7 @@ set batch_size = 4096; ## 与行存执行引擎的部分差异 -在绝大多数场景之中,用户只需要默认打开seesion变量的开关就可以透明的使向量化执行引擎并且得到SQL执行的性能提升。但是,**目前的向量化执行引擎在下面一些微小的细节上与原先的行存执行引擎存在不同,需要使用者知晓**。这部分区别分为两类 +在绝大多数场景之中,用户只需要默认打开session变量的开关就可以透明的使向量化执行引擎并且得到SQL执行的性能提升。但是,**目前的向量化执行引擎在下面一些微小的细节上与原先的行存执行引擎存在不同,需要使用者知晓**。这部分区别分
[incubator-doris] 17/20: [chore] Optimize build_lz4 in build-thirdparty.sh (#8653)
morningman pushed a commit to branch dev-1.0.1 in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit c492f418a5e82734383feeb425d7d029a660e0e3
Author: Adonis Ling
AuthorDate: Mon Mar 28 10:24:32 2022 +0800

    [chore] Optimize build_lz4 in build-thirdparty.sh (#8653)
---
 thirdparty/build-thirdparty.sh | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/thirdparty/build-thirdparty.sh b/thirdparty/build-thirdparty.sh
index 4f175e9..3b7a108 100755
--- a/thirdparty/build-thirdparty.sh
+++ b/thirdparty/build-thirdparty.sh
@@ -427,6 +427,12 @@ build_lz4() {
 check_if_source_exist $LZ4_SOURCE
 cd $TP_SOURCE_DIR/$LZ4_SOURCE

+# clean old symbolic links
+local old_symbolic_links=('lz4c' 'lz4cat' 'unlz4')
+for link in ${old_symbolic_links[@]}; do
+rm -f "${TP_INSTALL_DIR}/bin/${link}"
+done
+
 make -j $PARALLEL install PREFIX=$TP_INSTALL_DIR BUILD_SHARED=no\
 INCLUDEDIR=$TP_INCLUDE_DIR/lz4/
 }
[incubator-doris] 09/20: [doc] fix help module failed (#8617)
morningman pushed a commit to branch dev-1.0.1 in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit 07cc83736bad3fd75bcb8afdfb5c2e52ecc07be5
Author: qiye
AuthorDate: Thu Mar 24 09:15:06 2022 +0800

    [doc] fix help module failed (#8617)

    Introduced by #8509. Docs title is duplicate.
---
 .../sql-reference/sql-functions/table-functions/explode-numbers.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/zh-CN/sql-reference/sql-functions/table-functions/explode-numbers.md b/docs/zh-CN/sql-reference/sql-functions/table-functions/explode-numbers.md
index d15799d..66e7f54 100644
--- a/docs/zh-CN/sql-reference/sql-functions/table-functions/explode-numbers.md
+++ b/docs/zh-CN/sql-reference/sql-functions/table-functions/explode-numbers.md
@@ -24,7 +24,7 @@ specific language governing permissions and limitations
 under the License.
 -->

-# explode_bitmap
+# explode_numbers

 ## description
[incubator-doris] 14/20: [fix](vec) fix coredump for aggregate function when delete large_data, due to alloc-dealloc-mismatch (#8641)
morningman pushed a commit to branch dev-1.0.1 in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit d532aad9639bf8e022bf918a38b23ba7a18c7604
Author: airborne12
AuthorDate: Mon Mar 28 10:17:13 2022 +0800

    [fix](vec) fix coredump for aggregate function when delete large_data, due to alloc-dealloc-mismatch (#8641)
---
 be/src/vec/aggregate_functions/aggregate_function_min_max.h | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/be/src/vec/aggregate_functions/aggregate_function_min_max.h b/be/src/vec/aggregate_functions/aggregate_function_min_max.h
index 2b769e5..26fd57e 100644
--- a/be/src/vec/aggregate_functions/aggregate_function_min_max.h
+++ b/be/src/vec/aggregate_functions/aggregate_function_min_max.h
@@ -225,7 +225,7 @@ private:
 char small_data[MAX_SMALL_STRING_SIZE]; /// Including the terminating zero.

 public:
-~SingleValueDataString() { delete large_data; }
+~SingleValueDataString() { delete[] large_data; }

 bool has() const { return size >= 0; }

@@ -242,7 +242,7 @@ public:
 if (size != -1) {
 size = -1;
 capacity = 0;
-delete large_data;
+delete[] large_data;
 large_data = nullptr;
 }
 }
@@ -266,7 +266,7 @@ public:
 } else {
 if (capacity < rhs_size) {
 capacity = static_cast(round_up_to_power_of_two_or_zero(rhs_size));
-delete large_data;
+delete[] large_data;
 large_data = new char[capacity];
 }

@@ -296,7 +296,7 @@ public:
 if (capacity < value_size) {
 /// Don't free large_data here.
 capacity = round_up_to_power_of_two_or_zero(value_size);
-delete large_data;
+delete[] large_data;
 large_data = new char[capacity];
 }
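The fix above replaces `delete large_data` with `delete[] large_data` because the buffer is allocated with `new char[capacity]`; releasing an array allocation with the scalar form of `delete` is undefined behavior, which sanitizers report as alloc-dealloc-mismatch. A minimal sketch of the correct pairing follows. `LargeBuffer` is a hypothetical stand-in for `SingleValueDataString`, not the Doris class, and it grows to the exact requested size rather than rounding up to a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical owner of a heap buffer allocated with new char[n].
// Every release of that buffer must therefore use delete[].
class LargeBuffer {
public:
    LargeBuffer() = default;
    LargeBuffer(const LargeBuffer&) = delete;
    LargeBuffer& operator=(const LargeBuffer&) = delete;
    ~LargeBuffer() { delete[] data_; }  // array form matches new char[]

    // Copy `len` bytes in, reallocating only when the buffer is too small.
    void assign(const char* src, std::size_t len) {
        if (capacity_ < len) {
            delete[] data_;  // release the old array before replacing it
            data_ = new char[len];
            capacity_ = len;
        }
        if (len > 0) {
            std::memcpy(data_, src, len);
        }
        size_ = len;
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    const char* data() const { return data_; }

private:
    char* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};
```

In new code a `std::unique_ptr<char[]>` or `std::string` would avoid the manual pairing altogether; the raw pointer is kept here only to mirror the shape of the patched code.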
[incubator-doris] 18/20: [doc] update doc of vec-execution-engine (#8655)
This is an automated email from the ASF dual-hosted git repository. morningman pushed a commit to branch dev-1.0.1 in repository https://gitbox.apache.org/repos/asf/incubator-doris.git commit 0d9d7863e3776ff7ea296f2a861b9c035e417688 Author: zbtzbtzbt <35688959+zbtzbt...@users.noreply.github.com> AuthorDate: Mon Mar 28 10:26:28 2022 +0800 [doc] update doc of vec-execution-engine (#8655) --- docs/en/administrator-guide/vectorized-execution-engine.md| 9 +++-- docs/zh-CN/administrator-guide/vectorized-execution-engine.md | 9 +++-- 2 files changed, 6 insertions(+), 12 deletions(-) diff --git a/docs/en/administrator-guide/vectorized-execution-engine.md b/docs/en/administrator-guide/vectorized-execution-engine.md index c91d5cc..37d9bca 100644 --- a/docs/en/administrator-guide/vectorized-execution-engine.md +++ b/docs/en/administrator-guide/vectorized-execution-engine.md @@ -118,9 +118,6 @@ In most scenarios, users only need to turn on the session variable by default to 1. The `geolocation function` is not supported, including all functions starting with `ST_` in the function. For details, please refer to the section on SQL functions in the official documentation. 2. The `UDF` and `UDAF` of the original row storage execution engine are not supported. -3. It is not supported to rewrite the between statement into a compound judgment statement, which will result in the following error: `BetweenPredicate needs to be rewritten into a CompoundPredicate`. -4. The `TupleIsNull` function is not supported, which may cause partial outer joins and expressions with non-Nullable functions to obtain the required NULL value. -5. The maximum length of `string/text` type is 1MB instead of the default 2GB. That is, when the vectorization engine is turned on, it is impossible to query or import strings larger than 1MB. However, if you turn off the vectorization engine, you can still query and import normally. -6. The export method of `select ... into outfile` is not supported. -7. 
Lateral view is not supported. -8. Extrenal broker appearance is not supported. +3. The maximum length of `string/text` type is 1MB instead of the default 2GB. That is, when the vectorization engine is turned on, it is impossible to query or import strings larger than 1MB. However, if you turn off the vectorization engine, you can still query and import normally. +4. The export method of `select ... into outfile` is not supported. +5. Extrenal broker appearance is not supported. diff --git a/docs/zh-CN/administrator-guide/vectorized-execution-engine.md b/docs/zh-CN/administrator-guide/vectorized-execution-engine.md index acfa4a3..f97be1f 100644 --- a/docs/zh-CN/administrator-guide/vectorized-execution-engine.md +++ b/docs/zh-CN/administrator-guide/vectorized-execution-engine.md @@ -115,9 +115,6 @@ set batch_size = 4096; b类 1. 不支持`地理位置函数` ,包含了函数中所有以`ST_`开头的函数。具体请参考官方文档SQL函数的部分。 2. 不支持原有行存执行引擎的`UDF`与`UDAF`。 -3. 不支持将between语句改写为复合判断语句,会导致以下报错:`BetweenPredicate needs to be rewritten into a CompoundPredicate`。 -4. 不支持`TupleIsNull`函数,可能会导致部分外连接并带有非Nullable函数计算的表达式无法得到所需的NULL值。 -5. `string/text`类型最大长度支持为1MB,而不是默认的2GB。即当开启向量化引擎后,将无法查询或导入大于1MB的字符串。但如果关闭向量化引擎,则依然可以正常查询和导入。 -6. 不支持 `select ... into outfile` 的导出方式。 -7. 不支持lateral view。 -8. 不支持extrenal broker外表。 +3. `string/text`类型最大长度支持为1MB,而不是默认的2GB。即当开启向量化引擎后,将无法查询或导入大于1MB的字符串。但如果关闭向量化引擎,则依然可以正常查询和导入。 +4. 不支持 `select ... into outfile` 的导出方式。 +5. 不支持extrenal broker外表。 - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[incubator-doris] 19/20: [Refactor] Remove ununsed file (#8657)
This is an automated email from the ASF dual-hosted git repository. morningman pushed a commit to branch dev-1.0.1 in repository https://gitbox.apache.org/repos/asf/incubator-doris.git commit 535e5744003ce24232fd4bdc72e41454a8af9095 Author: Pxl <952130...@qq.com> AuthorDate: Sun Mar 27 01:41:06 2022 +0800 [Refactor] Remove ununsed file (#8657) --- be/src/exec/pl_task_root.cpp | 140 --- be/src/exec/pl_task_root.h | 47 --- 2 files changed, 187 deletions(-) diff --git a/be/src/exec/pl_task_root.cpp b/be/src/exec/pl_task_root.cpp deleted file mode 100644 index 4b1b0ae..000 --- a/be/src/exec/pl_task_root.cpp +++ /dev/null @@ -1,140 +0,0 @@ -// Licensed to the Apache Software Foundation (ASF) under one -// or more contributor license agreements. See the NOTICE file -// distributed with this work for additional information -// regarding copyright ownership. The ASF licenses this file -// to you under the Apache License, Version 2.0 (the -// "License"); you may not use this file except in compliance -// with the License. You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, -// software distributed under the License is distributed on an -// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -// KIND, either express or implied. See the License for the -// specific language governing permissions and limitations -// under the License. 
- -#include "exec/pl_task_root.h" - -namespace doris { - -ExchangeNode::ExchangeNode(ObjectPool* pool, const TPlanNode& tnode, const DescriptorTbl& descs) -: ExecNode(pool, tnode, descs), _num_senders(0), _stream_recvr(nullptr), _next_row_idx(0) {} - -ExchangeNode::~ExchangeNode() {} - -Status ExchangeNode::init(const TPlanNode& tnode, RuntimeState* state) { -return ExecNode::init(tnode, state); -} - -Status ExchangeNode::prepare(RuntimeState* state) { -RETURN_IF_ERROR(ExecNode::prepare(state)); - -_convert_row_batch_timer = ADD_TIMER(runtime_profile(), "ConvertRowBatchTime"); - -// TODO: figure out appropriate buffer size -DCHECK_GT(_num_senders, 0); -_stream_recvr = state->create_recvr(_row_descriptor, _id, _num_senders, -config::exchg_node_buffer_size_bytes, runtime_profile()); -return Status::OK(); -} - -Status ExchangeNode::open(RuntimeState* state) { -SCOPED_TIMER(_runtime_profile->total_time_counter()); -RETURN_IF_ERROR(ExecNode::open(state)); -return Status::OK(); -} - -Status ExchangeNode::close(RuntimeState* state) { -if (is_closed()) { -return Status::OK(); -} -return ExecNode::close(state); -} - -Status ExchangeNode::get_next(RuntimeState* state, RowBatch* output_batch, bool* eos) { -RETURN_IF_ERROR(exec_debug_action(TExecNodePhase::GETNEXT)); -SCOPED_TIMER(_runtime_profile->total_time_counter()); - -if (reached_limit()) { -*eos = true; -return Status::OK(); -} - -ExprContext* const* ctxs = &_conjunct_ctxs[0]; -int num_ctxs = _conjunct_ctxs.size(); - -while (true) { -{ -SCOPED_TIMER(_convert_row_batch_timer); - -// copy rows until we hit the limit/capacity or until we exhaust _input_batch -while (!reached_limit() && !output_batch->is_full() && _input_batch.get() != nullptr && - _next_row_idx < _input_batch->capacity()) { -TupleRow* src = _input_batch->get_row(_next_row_idx); - -if (ExecNode::eval_conjuncts(ctxs, num_ctxs, src)) { -int j = output_batch->add_row(); -TupleRow* dest = output_batch->get_row(j); -// if the input row is shorter than the output 
row, make sure not to leave -// uninitialized Tuple* around -output_batch->clear_row(dest); -// this works as expected if rows from input_batch form a prefix of -// rows in output_batch -_input_batch->copy_row(src, dest); -output_batch->commit_last_row(); -++_num_rows_returned; -} - -++_next_row_idx; -} - -COUNTER_SET(_rows_returned_counter, _num_rows_returned); - -if (reached_limit()) { -*eos = true; -return Status::OK(); -} - -if (output_batch->is_full()) { -*eos = false; -return Status::OK(); -} -} - -// we need more rows -if (_input_batch.get() != nullptr) { -_input_batch->transfer_resource_ownership(output_batch); -} - -bool is_cancelled = true; -_input_batch.reset(_stream_recvr->get_batch(&is_cancelled)); -VLOG_FILE << "exch: has batch=" << (_input_batch.get() =
[incubator-doris] 20/20: [fix] fix core dump when avg on not null decimal in empty table (#8681)
This is an automated email from the ASF dual-hosted git repository. morningman pushed a commit to branch dev-1.0.1 in repository https://gitbox.apache.org/repos/asf/incubator-doris.git commit e660dd323c5dfbb1198c429cd368be3b299aa235 Author: dataroaring <98214048+dataroar...@users.noreply.github.com> AuthorDate: Mon Mar 28 12:41:00 2022 +0800 [fix] fix core dump when avg on not null decimal in empty table (#8681) --- .../aggregate_functions/aggregate_function_avg.h | 6 +++-- fe/pom.xml | 2 +- regression-test/suites/empty_table/ddl/empty.sql | 7 + regression-test/suites/empty_table/load.groovy | 31 ++ .../suites/empty_table/sql/avg_decimal.sql | 1 + 5 files changed, 44 insertions(+), 3 deletions(-) diff --git a/be/src/vec/aggregate_functions/aggregate_function_avg.h b/be/src/vec/aggregate_functions/aggregate_function_avg.h index 7b40f95..4aeb7fe 100644 --- a/be/src/vec/aggregate_functions/aggregate_function_avg.h +++ b/be/src/vec/aggregate_functions/aggregate_function_avg.h @@ -40,8 +40,10 @@ struct AggregateFunctionAvgData { if constexpr (std::numeric_limits::is_iec559) return static_cast(sum) / count; /// allow division by zero -if (!count) -throw Exception("AggregateFunctionAvg with zero values", TStatusCode::VEC_LOGIC_ERROR); +if (!count) { +// null is handled in AggregationNode::_get_without_key_result +return static_cast(sum); +} return static_cast(sum) / count; } diff --git a/fe/pom.xml b/fe/pom.xml index 2ac18dd..61aafd7 100644 --- a/fe/pom.xml +++ b/fe/pom.xml @@ -139,7 +139,7 @@ under the License. 
0.11-a-czt02-cdh 3.18.2-GA 3.0.1 -7.3.7 +18.3.12 6.1.14 1.4.3 1.49 diff --git a/regression-test/suites/empty_table/ddl/empty.sql b/regression-test/suites/empty_table/ddl/empty.sql new file mode 100644 index 000..c3d423d --- /dev/null +++ b/regression-test/suites/empty_table/ddl/empty.sql @@ -0,0 +1,7 @@ +CREATE TABLE `empty` ( + `c1` INT, + `c2` String, + `c3` Decimal(15, 2) NOT NULL +) +DISTRIBUTED BY HASH(`c1`) BUCKETS 1 +PROPERTIES("replication_num" = "1"); diff --git a/regression-test/suites/empty_table/load.groovy b/regression-test/suites/empty_table/load.groovy new file mode 100644 index 000..a8a554e --- /dev/null +++ b/regression-test/suites/empty_table/load.groovy @@ -0,0 +1,31 @@ + +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// The cases is copied from https://github.com/trinodb/trino/tree/master +// /testing/trino-product-tests/src/main/resources/sql-tests/testcases +// and modified by Doris. 
+ +def tables=["empty"] + +for (String table in tables) { +sql """ DROP TABLE IF EXISTS $table """ +} + +for (String table in tables) { +sql new File("""${context.file.parent}/ddl/${table}.sql""").text +} diff --git a/regression-test/suites/empty_table/sql/avg_decimal.sql b/regression-test/suites/empty_table/sql/avg_decimal.sql new file mode 100644 index 000..d9cdcd2 --- /dev/null +++ b/regression-test/suites/empty_table/sql/avg_decimal.sql @@ -0,0 +1 @@ +SELECT avg(c3) from empty - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
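The zero-count guard changed in `aggregate_function_avg.h` above can be illustrated with a small standalone sketch. `AvgState` is a hypothetical stand-in rather than the Doris struct, and a plain 64-bit integer is assumed in place of the decimal type; as the patch comment says, the caller is assumed to turn the empty case into NULL.

```cpp
#include <cstdint>

// Hypothetical stand-in for an avg aggregation state (not the Doris struct).
struct AvgState {
    int64_t sum = 0;
    uint64_t count = 0;

    void add(int64_t value) {
        sum += value;
        ++count;
    }

    // Mirrors the patched guard: with zero rows, return the sum (which is 0)
    // instead of throwing or dividing by zero. The caller is assumed to
    // replace this placeholder with NULL for an empty input.
    int64_t result() const {
        if (count == 0) {
            return sum;
        }
        return sum / static_cast<int64_t>(count);
    }
};
```

With this shape, an empty table yields a harmless placeholder value instead of an exception in the aggregation path.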
[GitHub] [incubator-doris] HappenLee commented on a change in pull request #8660: [Vectorized][refactor] refactor stddev/variance agg functions
HappenLee commented on a change in pull request #8660: URL: https://github.com/apache/incubator-doris/pull/8660#discussion_r836116818 ## File path: be/src/vec/aggregate_functions/aggregate_function_stddev.h ## @@ -296,4 +287,21 @@ class AggregateFunctionStddevSamp final } }; +//samp function it's always nullables, it's need to handle nullable column +//so return type and add function should processing null values +template +class AggregateFunctionSamp : public AggregateFunctionSampVariance { Review comment: use `final` inherit -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] yiguolei opened a new pull request #8694: Remove mem tablet from be
yiguolei opened a new pull request #8694: URL: https://github.com/apache/incubator-doris/pull/8694 # Proposed changes Issue Number: close #xxx ## Problem Summary: Describe the overview of changes. ## Checklist(Required) 1. Does it affect the original behavior: (Yes/No/I Don't know) 2. Has unit tests been added: (Yes/No/No Need) 3. Has document been added or modified: (Yes/No/No Need) 4. Does it need to update dependencies: (Yes/No) 5. Are there any changes that cannot be rolled back: (Yes/No) ## Further comments If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
[incubator-doris] branch stream-load-vec created (now 365eba0)
This is an automated email from the ASF dual-hosted git repository. morningman pushed a change to branch stream-load-vec in repository https://gitbox.apache.org/repos/asf/incubator-doris.git. at 365eba0 [fix] fix core dump when avg on not null decimal in empty table (#8681) No new revisions were added by this update.
[GitHub] [incubator-doris] HappenLee commented on a change in pull request #8572: [Vectorized][Bug] fix percentile_approx function to return always nullable
HappenLee commented on a change in pull request #8572: URL: https://github.com/apache/incubator-doris/pull/8572#discussion_r836114439 ## File path: be/src/vec/aggregate_functions/aggregate_function_percentile_approx.h ## @@ -152,32 +166,86 @@ class AggregateFunctionPercentileApproxMerge : public AggregateFunctionPercentil } }; +template class AggregateFunctionPercentileApproxTwoParams : public AggregateFunctionPercentileApprox { public: AggregateFunctionPercentileApproxTwoParams(const DataTypes& argument_types_) : AggregateFunctionPercentileApprox(argument_types_) {} void add(AggregateDataPtr __restrict place, const IColumn** columns, size_t row_num, Arena*) const override { -const auto& sources = static_cast&>(*columns[0]); -const auto& quantile = static_cast&>(*columns[1]); +if constexpr (is_nullable) { +double column_data[2] = {0}; + +for (int i = 0; i < 2; ++i) { +const auto* nullable_column = check_and_get_column(columns[i]); +if (nullable_column == nullptr) { //Not Nullable column +const auto& column = static_cast&>(*columns[i]); +column_data[i] = column.get_float64(row_num); + +} else if (!nullable_column->is_null_at(row_num)) { // Nullable column && Not null data +const auto& column = static_cast&>( +nullable_column->get_nested_column()); +column_data[i] = column.get_float64(row_num); + +} else { // Nullable column && null data +if (i == 0) { +return; +} +} +} + +this->data(place).init(); +this->data(place).add(column_data[0], column_data[1]); + +} else { +const auto& sources = static_cast&>(*columns[0]); +const auto& quantile = static_cast&>(*columns[1]); -this->data(place).init(); -this->data(place).add(sources.get_float64(row_num), quantile.get_float64(row_num)); +this->data(place).init(); +this->data(place).add(sources.get_float64(row_num), quantile.get_float64(row_num)); +} } }; +template class AggregateFunctionPercentileApproxThreeParams : public AggregateFunctionPercentileApprox { public: AggregateFunctionPercentileApproxThreeParams(const DataTypes& 
argument_types_) : AggregateFunctionPercentileApprox(argument_types_) {} void add(AggregateDataPtr __restrict place, const IColumn** columns, size_t row_num, Arena*) const override { -const auto& sources = static_cast&>(*columns[0]); -const auto& quantile = static_cast&>(*columns[1]); -const auto& compression = static_cast&>(*columns[2]); +if constexpr (is_nullable) { +double column_data[3] = {0}; Review comment: {0,0,0} ## File path: be/src/vec/aggregate_functions/aggregate_function_percentile_approx.h ## @@ -152,32 +166,86 @@ class AggregateFunctionPercentileApproxMerge : public AggregateFunctionPercentil } }; +template class AggregateFunctionPercentileApproxTwoParams : public AggregateFunctionPercentileApprox { public: AggregateFunctionPercentileApproxTwoParams(const DataTypes& argument_types_) : AggregateFunctionPercentileApprox(argument_types_) {} void add(AggregateDataPtr __restrict place, const IColumn** columns, size_t row_num, Arena*) const override { -const auto& sources = static_cast&>(*columns[0]); -const auto& quantile = static_cast&>(*columns[1]); +if constexpr (is_nullable) { +double column_data[2] = {0}; + +for (int i = 0; i < 2; ++i) { +const auto* nullable_column = check_and_get_column(columns[i]); +if (nullable_column == nullptr) { //Not Nullable column +const auto& column = static_cast&>(*columns[i]); +column_data[i] = column.get_float64(row_num); + +} else if (!nullable_column->is_null_at(row_num)) { // Nullable column && Not null data +const auto& column = static_cast&>( +nullable_column->get_nested_column()); +column_data[i] = column.get_float64(row_num); + +} else { // Nullable column && null data +if (i == 0) { +return; +} +} +} + +this->data(place).init(); +this->data(place).add(column_data[0], column_data[1]); + +} else { +const auto& sources = static_cast&>(*columns[0]); +const auto& quantile = static_cast&>(*columns[1]); -this->data(place).init(); -
[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8572: [Vectorized][Bug] fix percentile_approx function to return always nullable
github-actions[bot] commented on pull request #8572: URL: https://github.com/apache/incubator-doris/pull/8572#issuecomment-1080292911
[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8642: [Enhancement] add clang-tidy config && add C++ Code Diagnostic document
github-actions[bot] commented on pull request #8642: URL: https://github.com/apache/incubator-doris/pull/8642#issuecomment-1080301617 PR approved by at least one committer and no changes requested.
[GitHub] [incubator-doris] morrySnow opened a new pull request #8695: [enhancement] update broadcast join cost algorithm
morrySnow opened a new pull request #8695: URL: https://github.com/apache/incubator-doris/pull/8695

# Proposed changes

The broadcast join cost is currently computed from the compressed data size, so the amount of memory actually used may be significantly more than estimated. This patch:

1. adds a compression ratio to the broadcast join cost, set to 5 based on experience;
2. adds a new session variable `auto_broadcast_join_threshold` that limits the memory used by broadcast, in bytes. The default value is 1073741824 (1 GB).

## Problem Summary:

Describe the overview of changes.

## Checklist(Required)

1. Does it affect the original behavior: (Yes)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (Yes)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (No)
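To make the arithmetic in the PR description concrete, here is a minimal sketch. It assumes the planner multiplies the compressed size by the ratio and compares the product against the threshold; the description does not spell out that combination, and `estimate_broadcast_memory` and `can_broadcast` are hypothetical names, not Doris functions.

```cpp
#include <cstdint>

// Values quoted in the PR description.
constexpr int64_t kCompressionRatio = 5;
constexpr int64_t kAutoBroadcastJoinThreshold = 1073741824;  // bytes

// Hypothetical helper: scale the compressed size to an in-memory estimate.
constexpr int64_t estimate_broadcast_memory(int64_t compressed_bytes) {
    return compressed_bytes * kCompressionRatio;
}

// Hypothetical helper: allow broadcast only if the estimate fits the limit.
constexpr bool can_broadcast(int64_t compressed_bytes,
                             int64_t threshold = kAutoBroadcastJoinThreshold) {
    return estimate_broadcast_memory(compressed_bytes) <= threshold;
}
```

Under these assumptions, a 100 MiB compressed table is estimated at 500 MiB and may be broadcast, while a 300 MiB compressed table is estimated at 1500 MiB and may not.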
[GitHub] [incubator-doris] zbtzbtzbt commented on pull request #8694: Remove mem tablet from be
zbtzbtzbt commented on pull request #8694: URL: https://github.com/apache/incubator-doris/pull/8694#issuecomment-1080351400 6 6 6
[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8694: [refactor]Remove mem tablet from be
github-actions[bot] commented on pull request #8694: URL: https://github.com/apache/incubator-doris/pull/8694#issuecomment-1080377500
[GitHub] [incubator-doris] emerkfu opened a new pull request #8696: [doc] Update flink-doris-connector.md
emerkfu opened a new pull request #8696: URL: https://github.com/apache/incubator-doris/pull/8696

# Proposed changes

1. Added the required Flink dependencies for flink-doris-connector.
2. Fixed the document format and the letter case of names.

## Problem Summary:

1. The document did not describe the Flink dependencies that flink-doris-connector needs. Without them, a class-not-found error may occur at use, so a description of the dependencies is added.
2. Optimized the format of this document.

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (No)
3. Has document been added or modified: (Yes)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (No)
[GitHub] [incubator-doris] yiguolei commented on pull request #8693: [refactor] remove `atomic.h/cpp` use std::atomic instead
yiguolei commented on pull request #8693: URL: https://github.com/apache/incubator-doris/pull/8693#issuecomment-1080381807 LGTM
[GitHub] [incubator-doris] zbtzbtzbt commented on pull request #8693: [refactor] remove `atomic.h/cpp` use std::atomic instead
zbtzbtzbt commented on pull request #8693: URL: https://github.com/apache/incubator-doris/pull/8693#issuecomment-1080385301 I have a small question: why are some variables initialized to 0 and others not?
[GitHub] [incubator-doris] luzhijing commented on pull request #8696: [doc] Update flink-doris-connector.md
luzhijing commented on pull request #8696: URL: https://github.com/apache/incubator-doris/pull/8696#issuecomment-1080398613 LGTM
[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8696: [doc] Update flink-doris-connector.md
github-actions[bot] commented on pull request #8696: URL: https://github.com/apache/incubator-doris/pull/8696#issuecomment-1080398695
[GitHub] [incubator-doris] wangbo commented on a change in pull request #8627: [refactor][optimize] Code optimization and refactoring for low-cardinality columns in storage layer
wangbo commented on a change in pull request #8627: URL: https://github.com/apache/incubator-doris/pull/8627#discussion_r836220463 ## File path: be/src/olap/null_predicate.cpp ## @@ -29,6 +29,13 @@ namespace doris { NullPredicate::NullPredicate(uint32_t column_id, bool is_null, bool opposite) : ColumnPredicate(column_id), _is_null(opposite != is_null) {} +PredicateType NullPredicate::type() const { +if (_is_null) Review comment: ```suggestion return _is_null ? PredicateType::IsNull : PredicateType::NotIsNull; ```
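The suggested one-line form can be tried in isolation with a small sketch. `NullPredicateSketch` and this local `PredicateType` enum are hypothetical stand-ins that copy only the `opposite != is_null` initialization and the ternary from the review comment; the real class has more members.

```cpp
// Hypothetical stand-in for the enum values used in the suggestion.
enum class PredicateType { IsNull, NotIsNull };

// Hypothetical reduced version of the predicate class.
class NullPredicateSketch {
public:
    // `opposite` flips the meaning of `is_null`, as in the quoted constructor.
    NullPredicateSketch(bool is_null, bool opposite) : _is_null(opposite != is_null) {}

    // Single-expression form proposed in the review comment.
    PredicateType type() const {
        return _is_null ? PredicateType::IsNull : PredicateType::NotIsNull;
    }

private:
    bool _is_null;
};
```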
[GitHub] [incubator-doris] yangzhg commented on pull request #8693: [refactor] remove `atomic.h/cpp` use std::atomic instead
yangzhg commented on pull request #8693: URL: https://github.com/apache/incubator-doris/pull/8693#issuecomment-1080401312

> I have a small question: why some variables are initialized to 0 and some not

The previous `AtomicInt` initialized its value to `0`, but `std::atomic` does not. The variables that are not initialized to 0 at declaration are initialized in the constructor.
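The two initialization styles mentioned in this reply can be sketched as follows. `Counters` and its members are hypothetical names chosen for illustration. As background, before C++20 a default-constructed `std::atomic<T>` is left uninitialized, which is why an explicit initializer or a constructor initializer is needed when replacing a wrapper that zeroed itself.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical struct showing both styles discussed in the thread.
struct Counters {
    // Explicitly initialized to 0 at declaration, matching what the old
    // zero-initializing wrapper did implicitly.
    std::atomic<int64_t> hits{0};

    // No initializer at declaration: the value is supplied by the constructor.
    std::atomic<int64_t> capacity;

    explicit Counters(int64_t cap) : capacity(cap) {}
};
```

Either style avoids reading an indeterminate value; which one is used depends on whether the starting value is known at declaration.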
[GitHub] [incubator-doris] adonis0147 closed issue #8687: [Bug] Failed to build be.
adonis0147 closed issue #8687: URL: https://github.com/apache/incubator-doris/issues/8687
[GitHub] [incubator-doris] EmmyMiao87 commented on pull request #8695: [enhancement] update broadcast join cost algorithm
EmmyMiao87 commented on pull request #8695: URL: https://github.com/apache/incubator-doris/pull/8695#issuecomment-1080408707 Why add a memory control to limit the broadcast memory instead of using the mem limit uniformly?
[GitHub] [incubator-doris] wangbo commented on a change in pull request #8627: [refactor][optimize] Code optimization and refactoring for low-cardinality columns in storage layer
wangbo commented on a change in pull request #8627: URL: https://github.com/apache/incubator-doris/pull/8627#discussion_r836228572 ## File path: be/src/vec/columns/column_dictionary.h ## @@ -264,127 +261,121 @@ class ColumnDictionary final : public COWHelper> { ColumnPtr convert_to_predicate_column() { auto res = vectorized::PredicateColumnType::create(); -size_t size = codes.size(); +size_t size = _codes.size(); res->reserve(size); for (size_t i = 0; i < size; ++i) { -auto& code = reinterpret_cast(codes[i]); -auto value = dict.get_value(code); +auto& code = reinterpret_cast(_codes[i]); +auto value = _dict.get_value(code); res->insert_data(value.ptr, value.len); } -dict.clear(); +_dict.clear(); return res; } -void convert_dict_codes() { -if (!is_dict_sorted()) { -sort_dict(); -} - -if (!is_dict_code_converted()) { -for (size_t i = 0; i < size(); ++i) { -codes[i] = dict.convert_code(codes[i]); -} -_dict_code_converted = true; +ColumnPtr convert_to_predicate_column_if_dictionary() override { +auto res = vectorized::PredicateColumnType::create(); +size_t size = _codes.size(); +res->reserve(size); +for (size_t i = 0; i < size; ++i) { +auto& code = reinterpret_cast(_codes[i]); +auto value = _dict.get_value(code); +res->insert_data(value.ptr, value.len); } -} - -void sort_dict() { -dict.sort(); -_dict_sorted = true; +_dict.clear(); +return res; } class Dictionary { public: Dictionary() = default; void reserve(size_t n) { -dict_data.reserve(n); -inverted_index.reserve(n); +_dict_data.reserve(n); +_inverted_index.reserve(n); } inline void insert_value(StringValue& value) { -dict_data.push_back_without_reserve(value); -inverted_index[value] = inverted_index.size(); +_dict_data.push_back_without_reserve(value); +_inverted_index[value] = _inverted_index.size(); } -inline T find_code(const StringValue& value) const { -auto it = inverted_index.find(value); -if (it != inverted_index.end()) { +inline int32_t find_code(const StringValue& value) const { +auto it = 
_inverted_index.find(value); +if (it != _inverted_index.end()) { return it->second; } return -1; } -inline T find_bound_code(const StringValue& value, bool lower, bool eq) const { +inline int32_t find_code_by_bound(const StringValue& value, bool lower, bool eq) const { Review comment: why hard code ```int32_t``` here?
[GitHub] [incubator-doris] huangyuansheng opened a new issue #8697: [Bug] About full join and cross join
huangyuansheng opened a new issue #8697: URL: https://github.com/apache/incubator-doris/issues/8697

### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.

### Version

Master of 2022-02-15

### What's Wrong?

`create table test_a(a int) unique key(a) distributed by hash(a) buckets 1;`
`create table test_b(b int) unique key(b) distributed by hash(b) buckets 1;`
`create table test_c(c int) unique key(c) distributed by hash(c) buckets 1;`
`insert into test_a select 1;`
`insert into test_b select 2;`
`insert into test_c select 3;`

`select * from test_a full join test_b on (a = b) join test_c;`

| a | b | c |
|--|--|--|
| 1 | | 3 |
| | 2 | |

`select * from (select * from test_a full join test_b on (a = b) ) t join test_c;`

| a | b | c |
|--|--|--|
| 1 | | 3 |
| | 2 | 3 |

The results are not the same! The second result is right!

### What You Expected?

See above

### How to Reproduce?

_No response_

### Anything Else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
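The result the reporter calls correct can be reproduced with a small in-memory sketch. It assumes the usual SQL reading in which the full join is evaluated first and its output is then cross joined with the third table; `full_join_on_equal` and `cross_join` are hypothetical helpers, and `std::nullopt` stands in for SQL NULL.

```cpp
#include <cstddef>
#include <optional>
#include <tuple>
#include <utility>
#include <vector>

using Cell = std::optional<int>;  // std::nullopt models SQL NULL
using Pair = std::pair<Cell, Cell>;
using Triple = std::tuple<Cell, Cell, Cell>;

// Hypothetical helper: FULL JOIN of two single-column tables on equality.
std::vector<Pair> full_join_on_equal(const std::vector<int>& left,
                                     const std::vector<int>& right) {
    std::vector<Pair> out;
    std::vector<bool> right_matched(right.size(), false);
    for (int l : left) {
        bool matched = false;
        for (std::size_t i = 0; i < right.size(); ++i) {
            if (l == right[i]) {
                out.emplace_back(l, right[i]);
                right_matched[i] = true;
                matched = true;
            }
        }
        if (!matched) {
            out.emplace_back(l, std::nullopt);  // unmatched left row
        }
    }
    for (std::size_t i = 0; i < right.size(); ++i) {
        if (!right_matched[i]) {
            out.emplace_back(std::nullopt, right[i]);  // unmatched right row
        }
    }
    return out;
}

// Hypothetical helper: cross join every joined row with every value of `third`.
std::vector<Triple> cross_join(const std::vector<Pair>& rows,
                               const std::vector<int>& third) {
    std::vector<Triple> out;
    for (const Pair& r : rows) {
        for (int c : third) {
            out.emplace_back(r.first, r.second, c);
        }
    }
    return out;
}
```

Calling `cross_join(full_join_on_equal({1}, {2}), {3})` produces two rows, and both carry `c = 3`, which matches the second table in the report.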
[GitHub] [incubator-doris] huangyuansheng opened a new issue #8698: [Bug] About grouping sets
huangyuansheng opened a new issue #8698: URL: https://github.com/apache/incubator-doris/issues/8698

### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.

### Version

Master of 2022-02-15

### What's Wrong?

create table test_a (a int ) unique key(a) distributed by hash(a) buckets 1;
insert into test_a select 1;
insert into test_a select null;
select * from test_a;

select a, count(1) from test_a group by GROUPING sets((a), ())

| a | count(1) |
|--|--|
| 1 | 1 |
| | 1 |
| | 2 |

select b, count(1) from (select ifnull(a, 'XX') as b from test_a) x group by GROUPING sets((b), ())

| a | count(1) |
|--|--|
| 1 | 1 |
| xx | 1 |
| xx | 2 |

### What You Expected?

| a | count(1) |
|--|--|
| 1 | 1 |
| xx | 1 |
| | 2 |

### How to Reproduce?

_No response_

### Anything Else?

maybe like: https://github.com/apache/incubator-doris/issues/7012

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
[GitHub] [incubator-doris] zbtzbtzbt commented on pull request #8693: [refactor] remove `atomic.h/cpp` use std::atomic instead
zbtzbtzbt commented on pull request #8693: URL: https://github.com/apache/incubator-doris/pull/8693#issuecomment-1080430760

> the variables not initialize as 0 will be initialized at ctor

I get it, thanks.
[GitHub] [incubator-doris] zenoyang commented on a change in pull request #8627: [refactor][optimize] Code optimization and refactoring for low-cardinality columns in storage layer
zenoyang commented on a change in pull request #8627: URL: https://github.com/apache/incubator-doris/pull/8627#discussion_r836242857 ## File path: be/src/vec/columns/column_dictionary.h ## @@ -264,127 +261,121 @@ class ColumnDictionary final : public COWHelper> { ColumnPtr convert_to_predicate_column() { auto res = vectorized::PredicateColumnType::create(); -size_t size = codes.size(); +size_t size = _codes.size(); res->reserve(size); for (size_t i = 0; i < size; ++i) { -auto& code = reinterpret_cast(codes[i]); -auto value = dict.get_value(code); +auto& code = reinterpret_cast(_codes[i]); +auto value = _dict.get_value(code); res->insert_data(value.ptr, value.len); } -dict.clear(); +_dict.clear(); return res; } -void convert_dict_codes() { -if (!is_dict_sorted()) { -sort_dict(); -} - -if (!is_dict_code_converted()) { -for (size_t i = 0; i < size(); ++i) { -codes[i] = dict.convert_code(codes[i]); -} -_dict_code_converted = true; +ColumnPtr convert_to_predicate_column_if_dictionary() override { +auto res = vectorized::PredicateColumnType::create(); +size_t size = _codes.size(); +res->reserve(size); +for (size_t i = 0; i < size; ++i) { +auto& code = reinterpret_cast(_codes[i]); +auto value = _dict.get_value(code); +res->insert_data(value.ptr, value.len); } -} - -void sort_dict() { -dict.sort(); -_dict_sorted = true; +_dict.clear(); +return res; } class Dictionary { public: Dictionary() = default; void reserve(size_t n) { -dict_data.reserve(n); -inverted_index.reserve(n); +_dict_data.reserve(n); +_inverted_index.reserve(n); } inline void insert_value(StringValue& value) { -dict_data.push_back_without_reserve(value); -inverted_index[value] = inverted_index.size(); +_dict_data.push_back_without_reserve(value); +_inverted_index[value] = _inverted_index.size(); } -inline T find_code(const StringValue& value) const { -auto it = inverted_index.find(value); -if (it != inverted_index.end()) { +inline int32_t find_code(const StringValue& value) const { +auto it = 
_inverted_index.find(value); +if (it != _inverted_index.end()) { return it->second; } return -1; } -inline T find_bound_code(const StringValue& value, bool lower, bool eq) const { +inline int32_t find_code_by_bound(const StringValue& value, bool lower, bool eq) const { Review comment: The `dict_code` in `ColumnPredicate` is fixed to `int32_t`, and the code returned by the `find_code...` functions has to be compared with it. `T` may be `int8_t` or `int16_t`, so the result is cast when the find functions return. Without the cast there is a compilation error, similar to a `phmap::flat_hash_set` of one element type not being convertible to a `phmap::flat_hash_set` of another. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
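The point made in the review comment above can be sketched as follows. This is a minimal illustration, not the code from the pull request: `SimpleDictionary` and `matches` are hypothetical names, `std::string` stands in for `StringValue`, and the sketch assumes every inserted value is distinct. It shows codes stored in a narrow type `T` while the lookup always returns `int32_t`, so the caller can compare against a predicate-side code of one fixed type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for the dictionary discussed above.
// T is the narrow storage type for codes (for example int8_t or int16_t).
// Assumption: every inserted value is distinct.
template <typename T>
class SimpleDictionary {
public:
    void insert_value(const std::string& value) {
        // The next code is the current number of stored values.
        T code = static_cast<T>(_values.size());
        _values.push_back(value);
        _inverted_index.emplace(value, code);
    }

    // Always returns int32_t regardless of T, so the result can be compared
    // with a predicate-side code of a fixed type. -1 means "not found".
    int32_t find_code(const std::string& value) const {
        auto it = _inverted_index.find(value);
        if (it != _inverted_index.end()) {
            return static_cast<int32_t>(it->second);
        }
        return -1;
    }

private:
    std::vector<std::string> _values;
    std::unordered_map<std::string, T> _inverted_index;
};

// Predicate side: the code type is fixed to int32_t, independent of T.
inline bool matches(int32_t predicate_code, int32_t column_code) {
    return predicate_code == column_code;
}
```

With this shape, `SimpleDictionary<int8_t>` and `SimpleDictionary<int16_t>` expose the same `find_code` return type, so predicate code does not need to be templated on the storage type.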
[GitHub] [incubator-doris] huangyuansheng opened a new issue #8699: [Bug] non-partitioned need to check partition expr
huangyuansheng opened a new issue #8699: URL: https://github.com/apache/incubator-doris/issues/8699 ### Search before asking - [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues. ### Version Master of 2022-02-15 ### What's Wrong? create table test_a (a int) unique key(a) distributed by hash(a) buckets 1; select count(1) from test_a partition p_ The query executes successfully even though the table is non-partitioned and the query names a partition. ### What You Expected? An exception should be thrown. ### How to Reproduce? _No response_ ### Anything Else? _No response_ ### Are you willing to submit PR? - [ ] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
[GitHub] [incubator-doris] wangbo commented on a change in pull request #8627: [refactor][optimize] Code optimization and refactoring for low-cardinality columns in storage layer
wangbo commented on a change in pull request #8627: URL: https://github.com/apache/incubator-doris/pull/8627#discussion_r836248638 ## File path: be/src/vec/columns/column_dictionary.h ## @@ -264,127 +261,121 @@ class ColumnDictionary final : public COWHelper> { ColumnPtr convert_to_predicate_column() { auto res = vectorized::PredicateColumnType::create(); -size_t size = codes.size(); +size_t size = _codes.size(); res->reserve(size); for (size_t i = 0; i < size; ++i) { -auto& code = reinterpret_cast(codes[i]); -auto value = dict.get_value(code); +auto& code = reinterpret_cast(_codes[i]); +auto value = _dict.get_value(code); res->insert_data(value.ptr, value.len); } -dict.clear(); +_dict.clear(); return res; } -void convert_dict_codes() { -if (!is_dict_sorted()) { -sort_dict(); -} - -if (!is_dict_code_converted()) { -for (size_t i = 0; i < size(); ++i) { -codes[i] = dict.convert_code(codes[i]); -} -_dict_code_converted = true; +ColumnPtr convert_to_predicate_column_if_dictionary() override { +auto res = vectorized::PredicateColumnType::create(); Review comment: What's the difference between ```convert_to_predicate_column_if_dictionary``` and ```convert_to_predicate_column ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] dataalive opened a new issue #8700: [Enhancement] show brokers instant of show broker
dataalive opened a new issue #8700: URL: https://github.com/apache/incubator-doris/issues/8700

### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.

### Description
Why is the command ```show broker``` and not ```show brokers```?

```
mysql> show broker;
+----------+------------+-------+-------+---------------------+---------------------+--------+
| Name     | IP         | Port  | Alive | LastStartTime       | LastUpdateTime      | ErrMsg |
+----------+------------+-------+-------+---------------------+---------------------+--------+
| mybroker | 172.21.0.5 | 48000 | true  | 2022-03-28 15:53:49 | 2022-03-28 17:46:26 |        |
+----------+------------+-------+-------+---------------------+---------------------+--------+
1 row in set (0.00 sec)

mysql> show brokers;
ERROR 1105 (HY000): errCode = 2, detailMessage = Syntax error in line 1:
show brokers
^
Encountered: IDENTIFIER
Expected

mysql> show proc '/brokers'
    -> ;
+----------+------------+----------+-------+-------+---------------------+---------------------+--------+
| Name     | IP         | HostName | Port  | Alive | LastStartTime       | LastUpdateTime      | ErrMsg |
+----------+------------+----------+-------+-------+---------------------+---------------------+--------+
| mybroker | 172.21.0.5 | dev-bj01 | 48000 | true  | 2022-03-28 15:53:49 | 2022-03-28 17:47:21 |        |
+----------+------------+----------+-------+-------+---------------------+---------------------+--------+
1 row in set (0.13 sec)
```

The show commands for the other roles are plural: ```show backends;``` and ```show frontends;```.

### Solution
Support ```show brokers```, or rename the command to it.

### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!

### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
[GitHub] [incubator-doris] zenoyang commented on a change in pull request #8627: [refactor][optimize] Code optimization and refactoring for low-cardinality columns in storage layer
zenoyang commented on a change in pull request #8627: URL: https://github.com/apache/incubator-doris/pull/8627#discussion_r836253005 ## File path: be/src/vec/columns/column_dictionary.h ## @@ -264,127 +261,121 @@ class ColumnDictionary final : public COWHelper> { ColumnPtr convert_to_predicate_column() { auto res = vectorized::PredicateColumnType::create(); -size_t size = codes.size(); +size_t size = _codes.size(); res->reserve(size); for (size_t i = 0; i < size; ++i) { -auto& code = reinterpret_cast(codes[i]); -auto value = dict.get_value(code); +auto& code = reinterpret_cast(_codes[i]); +auto value = _dict.get_value(code); res->insert_data(value.ptr, value.len); } -dict.clear(); +_dict.clear(); return res; } -void convert_dict_codes() { -if (!is_dict_sorted()) { -sort_dict(); -} - -if (!is_dict_code_converted()) { -for (size_t i = 0; i < size(); ++i) { -codes[i] = dict.convert_code(codes[i]); -} -_dict_code_converted = true; +ColumnPtr convert_to_predicate_column_if_dictionary() override { +auto res = vectorized::PredicateColumnType::create(); Review comment: No difference, `convert_to_predicate_column` is useless, I'll delete it right away -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] wangbo commented on a change in pull request #8627: [refactor][optimize] Code optimization and refactoring for low-cardinality columns in storage layer
wangbo commented on a change in pull request #8627: URL: https://github.com/apache/incubator-doris/pull/8627#discussion_r836253588 ## File path: be/src/olap/column_predicate.h ## @@ -33,13 +33,30 @@ class VectorizedRowBatch; class Schema; class RowBlockV2; +enum class PredicateType { +Unknown = 0, +EQ = 1, +NE = 2, +LT = 3, +LE = 4, +GT = 5, +GE = 6, +InList = 7, Review comment: ```suggestion IN_LIST = 7, ``` You can refer to other ```enum class``` definitions in the BE code.
[GitHub] [incubator-doris] hf200012 opened a new pull request #8701: [Improve]Remove http v1 code and disable mini load by default
hf200012 opened a new pull request #8701: URL: https://github.com/apache/incubator-doris/pull/8701 1. http v2 has been tested in production and can completely replace the old http code. To simplify code maintenance, the previous http code is removed. 2. Stream load fully covers the mini load function. As a transition, mini load is disabled through a switch in the next version, and the code is then removed in a subsequent version. Users who still need mini load can add `disable_mini_load=false` to fe.conf to enable it; it is disabled by default. # Proposed changes Issue Number: close #xxx ## Problem Summary: Describe the overview of changes. ## Checklist(Required) 1. Does it affect the original behavior: (Yes) 2. Has unit tests been added: (No Need) 3. Has document been added or modified: (Yes) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (No) ## Further comments If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
[GitHub] [incubator-doris] HappenLee commented on a change in pull request #8597: [feature-wip](array-type)Add element_at and subscript functions
HappenLee commented on a change in pull request #8597: URL: https://github.com/apache/incubator-doris/pull/8597#discussion_r836141107 ## File path: be/src/vec/functions/array/function_array_index.h ## @@ -126,21 +139,37 @@ class FunctionArrayIndex : public IFunction return true; } -#define INTEGRAL_TPL_PACK UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, Float64 +#define INTEGRAL_TPL_PACK Int8, Int16, Int32, Int64, Float32, Float64 Review comment: Please change the name: floating-point types in C++ are not `integral`.
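The naming concern above can be checked with standard type traits. The sketch below is only an illustration: `numeric_kind` is a hypothetical helper, and `float`/`double` are assumed to correspond to the `Float32`/`Float64` entries in the macro. It shows that floating-point types are arithmetic but not integral.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Floating-point types are arithmetic, but they are not integral.
static_assert(std::is_integral_v<int32_t>);
static_assert(!std::is_integral_v<float>);
static_assert(!std::is_integral_v<double>);
static_assert(std::is_arithmetic_v<float>);
static_assert(std::is_arithmetic_v<double>);

// Hypothetical helper that names the category of a type at compile time.
template <typename T>
constexpr const char* numeric_kind() {
    if constexpr (std::is_integral_v<T>) {
        return "integral";
    } else if constexpr (std::is_floating_point_v<T>) {
        return "floating-point";
    } else {
        return "other";
    }
}
```

A type pack that mixes `Int8` through `Int64` with `Float32` and `Float64` is therefore closer to "numeric" or "arithmetic" than to "integral".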
[GitHub] [incubator-doris] zenoyang commented on a change in pull request #8627: [refactor][optimize] Code optimization and refactoring for low-cardinality columns in storage layer
zenoyang commented on a change in pull request #8627: URL: https://github.com/apache/incubator-doris/pull/8627#discussion_r836261018 ## File path: be/src/olap/column_predicate.h ## @@ -33,13 +33,30 @@ class VectorizedRowBatch; class Schema; class RowBlockV2; +enum class PredicateType { +Unknown = 0, +EQ = 1, +NE = 2, +LT = 3, +LE = 4, +GT = 5, +GE = 6, +InList = 7, Review comment: done ## File path: be/src/olap/null_predicate.cpp ## @@ -29,6 +29,13 @@ namespace doris { NullPredicate::NullPredicate(uint32_t column_id, bool is_null, bool opposite) : ColumnPredicate(column_id), _is_null(opposite != is_null) {} +PredicateType NullPredicate::type() const { +if (_is_null) Review comment: done
[GitHub] [incubator-doris] zenoyang commented on a change in pull request #8627: [refactor][optimize] Code optimization and refactoring for low-cardinality columns in storage layer
zenoyang commented on a change in pull request #8627: URL: https://github.com/apache/incubator-doris/pull/8627#discussion_r836261369 ## File path: be/src/vec/columns/column_dictionary.h ## @@ -264,127 +261,121 @@ class ColumnDictionary final : public COWHelper> { ColumnPtr convert_to_predicate_column() { auto res = vectorized::PredicateColumnType::create(); -size_t size = codes.size(); +size_t size = _codes.size(); res->reserve(size); for (size_t i = 0; i < size; ++i) { -auto& code = reinterpret_cast(codes[i]); -auto value = dict.get_value(code); +auto& code = reinterpret_cast(_codes[i]); +auto value = _dict.get_value(code); res->insert_data(value.ptr, value.len); } -dict.clear(); +_dict.clear(); return res; } -void convert_dict_codes() { -if (!is_dict_sorted()) { -sort_dict(); -} - -if (!is_dict_code_converted()) { -for (size_t i = 0; i < size(); ++i) { -codes[i] = dict.convert_code(codes[i]); -} -_dict_code_converted = true; +ColumnPtr convert_to_predicate_column_if_dictionary() override { +auto res = vectorized::PredicateColumnType::create(); Review comment: > No difference, `convert_to_predicate_column` is useless, I'll delete it right away done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] EmmyMiao87 opened a new pull request #8702: [typo] Fix runtime filter docs
EmmyMiao87 opened a new pull request #8702: URL: https://github.com/apache/incubator-doris/pull/8702 # Proposed changes Issue Number: close #xxx ## Problem Summary: Describe the overview of changes. ## Checklist(Required) 1. Does it affect the original behavior: (Yes/No/I Don't know) 2. Has unit tests been added: (Yes/No/No Need) 3. Has document been added or modified: (Yes/No/No Need) 4. Does it need to update dependencies: (Yes/No) 5. Are there any changes that cannot be rolled back: (Yes/No) ## Further comments If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8702: [typo] Fix runtime filter docs
github-actions[bot] commented on pull request #8702: URL: https://github.com/apache/incubator-doris/pull/8702#issuecomment-1080466337
[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8532: [feature](cold-hot) support remote storage
github-actions[bot] commented on pull request #8532: URL: https://github.com/apache/incubator-doris/pull/8532#issuecomment-1080472729 PR approved by anyone and no changes requested.
[GitHub] [incubator-doris] emerkfu opened a new pull request #8703: [doc] Update VARCHAR.md
emerkfu opened a new pull request #8703: URL: https://github.com/apache/incubator-doris/pull/8703 # Proposed changes Added an explicit description of 'M' in VARCHAR(M). ## Problem Summary: It is easy to confuse whether 'M' counts bytes or characters. This change clarifies that 'M' is the number of bytes. ## Checklist(Required) 1. Does it affect the original behavior: (Yes/No/I Don't know) 2. Has unit tests been added: (Yes/No/No Need) 3. Has document been added or modified: (Yes/No/No Need) 4. Does it need to update dependencies: (Yes/No) 5. Are there any changes that cannot be rolled back: (Yes/No) ## Further comments If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
[GitHub] [incubator-doris] morningman commented on a change in pull request #8663: Support remote storage, step2, only for be, add env_remote_mgr
morningman commented on a change in pull request #8663: URL: https://github.com/apache/incubator-doris/pull/8663#discussion_r836164163 ## File path: gensrc/thrift/HeartbeatService.thrift ## @@ -32,6 +33,7 @@ struct TMasterInfo { 6: optional Types.TPort http_port 7: optional i64 heartbeat_flags 8: optional i64 backend_id +9: optional list remote_storage_params Review comment: Don't use the heartbeat to pass this info. ## File path: gensrc/proto/olap_file.proto ## @@ -213,6 +214,24 @@ enum StorageMediumPB { HDD = 0; SSD = 1; S3 = 2; +REMOTE_CACHE = 99; +} + +message S3StorageParamPB { +required string s3_endpoint = 1; +required string s3_region = 2; +optional string s3_ak = 3; +optional string s3_sk = 4; +optional int32 s3_max_conn = 5 [default = 50]; +optional int32 s3_request_timeout_ms = 6 [default = 3000]; +optional int32 s3_conn_timeout_ms = 7 [default = 1000]; +optional string root_path = 8; +} + +message StorageParamPB { +required StorageMediumPB storage_medium = 1 [default = HDD]; Review comment: Use `optional`. ## File path: gensrc/proto/olap_file.proto ## @@ -213,6 +214,24 @@ enum StorageMediumPB { HDD = 0; SSD = 1; S3 = 2; +REMOTE_CACHE = 99; +} + +message S3StorageParamPB { +required string s3_endpoint = 1; Review comment: Use `optional`. ## File path: be/src/olap/snapshot_manager.cpp ## @@ -98,13 +98,13 @@ OLAPStatus SnapshotManager::release_snapshot(const string& snapshot_path) { continue; } std::string abs_path; -RETURN_WITH_WARN_IF_ERROR(store->env()->canonicalize(store->path(), &abs_path), +RETURN_WITH_WARN_IF_ERROR(Env::Default()->canonicalize(store->path(), &abs_path), Review comment: Why not keep using `store->env()`?
## File path: be/src/olap/data_dir.cpp ## @@ -384,6 +358,10 @@ OLAPStatus DataDir::_check_incompatible_old_format_tablet() { // TODO(ygl): deal with rowsets and tablets when load failed OLAPStatus DataDir::load() { LOG(INFO) << "start to load tablets from " << _path_desc.filepath; +if (is_remote()) { + RETURN_WITH_WARN_IF_ERROR(Env::get_remote_mgr()->init(_path_desc.filepath + STORAGE_PARAM_PREFIX), + OLAP_ERR_INIT_FAILED, "DataDir init failed."); +} Review comment: Do we still need to go on if this dir is a remote dir? ## File path: be/src/olap/olap_define.h ## @@ -82,7 +82,8 @@ enum OLAPDataVersion { static const std::string MINI_PREFIX = "/mini_download"; static const std::string CLUSTER_ID_PREFIX = "/cluster_id"; static const std::string DATA_PREFIX = "/data"; -static const std::string TABLET_UID = "/tablet_uid"; +static const std::string STORAGE_PARAM_PREFIX = "/storage_param"; +static const std::string REMOTE_FILE_PARAM = "/remote_file_param"; Review comment: Not used? ## File path: be/src/olap/base_tablet.cpp ## @@ -63,14 +65,22 @@ OLAPStatus BaseTablet::set_tablet_state(TabletState state) { } void BaseTablet::_gen_tablet_path() { -if (_data_dir != nullptr) { +if (_data_dir != nullptr && _tablet_meta != nullptr) { +FilePathDesc root_path_desc; +root_path_desc.filepath = _data_dir->path_desc().filepath; +root_path_desc.storage_name = _storage_param.storage_name(); +root_path_desc.storage_medium = fs::fs_util::get_t_storage_medium(_storage_param.storage_medium()); +if (_data_dir->is_remote() && !Env::get_remote_mgr()->get_root_path( +_storage_param.storage_name(), &(root_path_desc.remote_path)).ok()) { +LOG(WARNING) << "get_root_path failed for storage_name: " << _storage_param.storage_name(); Review comment: Why can we continue even if we hit an error here? ## File path: be/src/env/env_remote_mgr.h ## @@ -0,0 +1,58 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.
See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include +#include +#include + +#include "env/env_remote.h" +#include "util/mutex.h" + +namespace doris { + +class RemoteEnvMgr { Review comment: Add some comm
[GitHub] [incubator-doris] wangbo commented on a change in pull request #8627: [refactor][optimize] Code optimization and refactoring for low-cardinality columns in storage layer
wangbo commented on a change in pull request #8627: URL: https://github.com/apache/incubator-doris/pull/8627#discussion_r836305968 ## File path: be/src/olap/comparison_predicate.cpp ## @@ -159,60 +159,54 @@ COMPARISON_PRED_COLUMN_BLOCK_EVALUATE(GreaterEqualPredicate, >=) .get_data(); \ auto& nested_col = nullable_col->get_nested_column(); \ if (nested_col.is_column_dictionary()) { \ -if constexpr (std::is_same_v) { \ +if constexpr (std::is_same_v) { \ Review comment: Is this ```std::is_same_v``` necessary? If a column is ColumnDict, then its TYPE in predicate can only be StringValue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
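As background for the question above, the following sketch shows what an `if constexpr (std::is_same_v<...>)` guard does inside templated code. It does not answer whether the guard is needed in the pull request, which depends on which types the macro is instantiated for. `StrVal`, `find_code`, and `code_for` are hypothetical names, not Doris APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a string value type.
struct StrVal {
    std::string data;
};

// Hypothetical dictionary lookup that only accepts string values.
inline int32_t find_code(const StrVal& v) {
    return v.data == "a" ? 0 : -1;
}

template <typename TYPE>
int32_t code_for(const TYPE& value, bool column_is_dictionary) {
    if (column_is_dictionary) {
        // The runtime check alone is not enough: without `if constexpr`,
        // find_code(value) would also have to compile for TYPE = int32_t,
        // even though that path is never taken for such a type.
        if constexpr (std::is_same_v<TYPE, StrVal>) {
            return find_code(value);
        } else {
            return -1;
        }
    }
    return -1;
}
```

If the template is only ever instantiated with the string type, the guard is redundant; if it is instantiated with other types as well, removing the guard turns the dictionary branch into a compile error for those types.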
[GitHub] [incubator-doris] morningman opened a new issue #8704: [Bug] routine load return too many task error
morningman opened a new issue #8704: URL: https://github.com/apache/incubator-doris/issues/8704 ### Search before asking - [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues. ### Version master ### What's Wrong? This bug may cause several errors: 1. Lots of `FragmentMgr cancel worker going to cancel timeout fragment` messages in be.INFO 2. Routine load returns a TOO_MANY_TASKS error 3. pstack shows lots of threads blocked in `doris::stream_load::NodeChannel::~NodeChannel` ### What You Expected? The above errors should not happen. ### How to Reproduce? It happens occasionally. ### Anything Else? _No response_ ### Are you willing to submit PR? - [X] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
[GitHub] [incubator-doris] morningman opened a new pull request #8705: [fix](load) fix bug that NodeChannel can not be destroyed ontime
morningman opened a new pull request #8705: URL: https://github.com/apache/incubator-doris/pull/8705 # Proposed changes Issue Number: close #8704 ## Problem Summary: After the ReusableClosure is reset, the join() method must not be called, or it will block forever. ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (No Need) 3. Has document been added or modified: (No Need) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (No) ## Further comments If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
[GitHub] [incubator-doris] morningman commented on a change in pull request #8705: [fix](load) fix bug that NodeChannel can not be destroyed ontime
morningman commented on a change in pull request #8705: URL: https://github.com/apache/incubator-doris/pull/8705#discussion_r836309355 ## File path: be/src/exec/tablet_sink.cpp ## @@ -504,7 +504,6 @@ void NodeChannel::try_send_batch(RuntimeState* state) { } } -_add_batch_closure->reset(); Review comment: For reviewers: if reset() is called here, cancel() may be called after it.
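The hazard described in this pull request can be modelled with a small single-threaded sketch. It is an assumption-laden illustration based only on the description "after reset, join() blocks forever"; `ClosureModel` and its methods are hypothetical and are not the real `ReusableClosure` interface. The sketch reports whether a join would wait instead of actually blocking.

```cpp
#include <cassert>

// Hypothetical, single-threaded model of a reusable RPC closure.
class ClosureModel {
public:
    // Marks a request as in flight; an RPC is expected to follow.
    void reset() { _in_flight = true; }

    // Called by the RPC layer when the response (or failure) arrives.
    void on_done() { _in_flight = false; }

    // A real join() would wait until the in-flight flag clears.
    // Here we only report whether it would have to wait.
    bool join_would_block() const { return _in_flight; }

private:
    bool _in_flight = false;
};
```

In this model, calling `reset()` and then cancelling before any RPC is issued means `on_done()` never runs, so `join_would_block()` stays true indefinitely. That matches the symptom in issue #8704 of threads stuck in the `NodeChannel` destructor.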
[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8705: [fix](load) fix bug that NodeChannel can not be destroyed ontime
github-actions[bot] commented on pull request #8705: URL: https://github.com/apache/incubator-doris/pull/8705#issuecomment-1080513523
[GitHub] [incubator-doris] caiconghui commented on a change in pull request #8680: [Refactor](type_info) use template and single instance to refactor get type info logic
caiconghui commented on a change in pull request #8680: URL: https://github.com/apache/incubator-doris/pull/8680#discussion_r836319606 ## File path: be/src/olap/rowset/segment_v2/indexed_column_reader.cpp ## @@ -31,12 +31,12 @@ Status IndexedColumnReader::load(bool use_page_cache, bool kept_in_memory) { _use_page_cache = use_page_cache; _kept_in_memory = kept_in_memory; -_type_info = get_type_info((FieldType)_meta.data_type()); +_type_info = get_scalar_type_info((FieldType)_meta.data_type()); Review comment: Because the type info for array, map and struct cannot be obtained from the FieldType alone.
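The distinction drawn in the comment above can be sketched as follows. All names here (`FieldKind`, `TypeInfoSketch`, `scalar_type_info`, `array_type_info`) are hypothetical and only mirror the idea: a scalar type is fully described by its enum value and can be served from a single shared instance, while an array also needs the type info of its items.

```cpp
#include <cassert>
#include <cstddef>

enum class FieldKind { TINYINT, INT, BIGINT, ARRAY };

// Hypothetical type descriptor. `item` is only set for arrays.
struct TypeInfoSketch {
    FieldKind kind;
    std::size_t size;            // element size in bytes; 0 for arrays here
    const TypeInfoSketch* item;  // nested item type, nullptr for scalars
};

// Scalars: the enum value alone is enough, and one shared instance per
// kind can be returned.
inline const TypeInfoSketch* scalar_type_info(FieldKind kind) {
    static const TypeInfoSketch tinyint_info{FieldKind::TINYINT, 1, nullptr};
    static const TypeInfoSketch int_info{FieldKind::INT, 4, nullptr};
    static const TypeInfoSketch bigint_info{FieldKind::BIGINT, 8, nullptr};
    switch (kind) {
        case FieldKind::TINYINT: return &tinyint_info;
        case FieldKind::INT:     return &int_info;
        case FieldKind::BIGINT:  return &bigint_info;
        default:                 return nullptr;  // ARRAY needs more than the enum
    }
}

// Arrays: the caller must also say what the items are.
inline TypeInfoSketch array_type_info(const TypeInfoSketch* item) {
    return TypeInfoSketch{FieldKind::ARRAY, 0, item};
}
```

Under these assumptions, a lookup keyed only by the field type can safely be named "scalar", since nested types need a different entry point that carries the extra information.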
[GitHub] [incubator-doris] caiconghui commented on a change in pull request #8680: [Refactor](type_info) use template and single instance to refactor get type info logic
caiconghui commented on a change in pull request #8680: URL: https://github.com/apache/incubator-doris/pull/8680#discussion_r836319780 ## File path: be/src/olap/tablet_schema.cpp ## @@ -286,7 +286,7 @@ TabletColumn::TabletColumn(FieldAggregationMethod agg, FieldType type) { TabletColumn::TabletColumn(FieldAggregationMethod agg, FieldType filed_type, bool is_nullable) { _aggregation = agg; _type = filed_type; -_length = get_type_info(filed_type)->size(); +_length = get_scalar_type_info(filed_type)->size(); Review comment: Because the type info for array, map and struct cannot be obtained from the FieldType alone.
[GitHub] [incubator-doris] caiconghui commented on a change in pull request #8680: [Refactor](type_info) use template and single instance to refactor get type info logic
caiconghui commented on a change in pull request #8680: URL: https://github.com/apache/incubator-doris/pull/8680#discussion_r836320044 ## File path: be/src/olap/column_vector.h ## @@ -150,7 +150,7 @@ class ScalarColumnVectorBatch : public ColumnVectorBatch { class ArrayNullColumnVectorBatch : public ColumnVectorBatch { public: explicit ArrayNullColumnVectorBatch(ColumnVectorBatch* array) -: ColumnVectorBatch(get_scalar_type_info(FieldType::OLAP_FIELD_TYPE_TINYINT), false), +: ColumnVectorBatch(get_scalar_type_info(), false), Review comment: ok
[GitHub] [incubator-doris] chenlinzhong commented on a change in pull request #8604: [Feature] CSV import and export support header
chenlinzhong commented on a change in pull request #8604:
URL: https://github.com/apache/incubator-doris/pull/8604#discussion_r836322875

## File path: gensrc/thrift/PlanNodes.thrift ##
@@ -108,6 +108,22 @@ enum TFileFormatType {
 FORMAT_ORC,
 FORMAT_JSON,
 FORMAT_PROTO,
+//csv withnames
+FORMAT_CSVWITHNAMES_PLAIN,

Review comment:
Yes, the more reasonable approach is to enumerate the file type and the compression type separately. However, the existing logic combines the two in one enum, which limits extensibility. Separating them would involve many changes and take a relatively long time, so this PR temporarily keeps the existing approach.
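Editorial sketch of the trade-off described in the comment above. This is not code from the PR: the enum and class names (`FileFormat`, `Compression`, `FileType`) and the constant lists are hypothetical and chosen only for illustration. It shows why one flat enum grows multiplicatively when a new format such as "CSV with names" is added, while two separate enums grow additively, assuming every format/compression pair were allowed.

```java
// Hypothetical names for illustration only; these are not the identifiers in PlanNodes.thrift.
enum FileFormat { CSV, CSV_WITH_NAMES, PARQUET, ORC, JSON }

enum Compression { PLAIN, GZ, BZ2, LZ4 }

// A file type modeled as the pair (format, compression) instead of one combined constant.
final class FileType {
    final FileFormat format;
    final Compression compression;

    FileType(FileFormat format, Compression compression) {
        this.format = format;
        this.compression = compression;
    }

    // True when the first row of the file holds column names.
    boolean hasHeader() {
        return format == FileFormat.CSV_WITH_NAMES;
    }

    // Constants a single flat enum would need if every pair were allowed.
    static int flatEnumSize() {
        return FileFormat.values().length * Compression.values().length;
    }

    // Constants needed when the two dimensions are declared separately.
    static int separateEnumSize() {
        return FileFormat.values().length + Compression.values().length;
    }
}
```

With the illustrative lists above, the flat layout needs 20 constants and the separate layout needs 9. In practice not every pair is meaningful, so the real gap is smaller, but each new format still adds one constant per supported compression type in the flat layout.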
[GitHub] [incubator-doris] spaces-X opened a new pull request #8706: [Enhancement] add switch of quantile_state column
spaces-X opened a new pull request #8706: URL: https://github.com/apache/incubator-doris/pull/8706 # Proposed changes Add switch for quantile_state column, default false. ## Problem Summary: ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (No Need) 3. Has document been added or modified: (No Need) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes) ## Further comments If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
[GitHub] [incubator-doris] morningman commented on pull request #8683: optimize create tpch table statments to achieve higher performance
morningman commented on pull request #8683:
URL: https://github.com/apache/incubator-doris/pull/8683#issuecomment-1080531532

> Can you provide some performance changes and conclusions before and after the table conversion?

I think this may trigger some bucket shuffle join
[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8668: [Bug][Vectorized] fix core dump with HLL and some refactor of Decompressor
github-actions[bot] commented on pull request #8668: URL: https://github.com/apache/incubator-doris/pull/8668#issuecomment-1080531428 PR approved by at least one committer and no changes requested.
[GitHub] [incubator-doris] zenoyang commented on a change in pull request #8627: [refactor][optimize] Code optimization and refactoring for low-cardinality columns in storage layer
zenoyang commented on a change in pull request #8627:
URL: https://github.com/apache/incubator-doris/pull/8627#discussion_r836325169

## File path: be/src/olap/comparison_predicate.cpp ##
@@ -159,60 +159,54 @@ COMPARISON_PRED_COLUMN_BLOCK_EVALUATE(GreaterEqualPredicate, >=)
 .get_data(); \
 auto& nested_col = nullable_col->get_nested_column(); \
 if (nested_col.is_column_dictionary()) { \
-if constexpr (std::is_same_v) { \
+if constexpr (std::is_same_v) { \

Review comment:
Yes, if I remove this check, compilation fails with an error.
[GitHub] [incubator-doris] spaces-X commented on pull request #8706: [Enhancement] add switch of quantile_state column
spaces-X commented on pull request #8706: URL: https://github.com/apache/incubator-doris/pull/8706#issuecomment-1080534118 @wangbo This adds a switch for the `quantile_state` column to ensure stability; it will be removed once subsequent tests are stable.
[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8683: optimize create tpch table statments to achieve higher performance
github-actions[bot] commented on pull request #8683: URL: https://github.com/apache/incubator-doris/pull/8683#issuecomment-1080536016 PR approved by at least one committer and no changes requested.
[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8627: [refactor][optimize] Code optimization and refactoring for low-cardinality columns in storage layer
github-actions[bot] commented on pull request #8627: URL: https://github.com/apache/incubator-doris/pull/8627#issuecomment-1080537851
[GitHub] [incubator-doris] morrySnow commented on pull request #8695: [enhancement] update broadcast join cost algorithm
morrySnow commented on pull request #8695:
URL: https://github.com/apache/incubator-doris/pull/8695#issuecomment-1080544250

> Why add a memory control to limit the broadcast memory? Instead of using mem limit uniformly?

There are two reasons:

1. Broadcast is not always faster than shuffle. The cost of building a hash table over the full table is not negligible when the broadcast table is large.
2. In BE, we allocate the hash table in the buffer pool, which is not limited by the mem limit.
[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8706: [Enhancement] add switch of quantile_state column
github-actions[bot] commented on pull request #8706: URL: https://github.com/apache/incubator-doris/pull/8706#issuecomment-1080546759
[GitHub] [incubator-doris] cambyzju commented on a change in pull request #8597: [feature-wip](array-type)Add element_at and subscript functions
cambyzju commented on a change in pull request #8597:
URL: https://github.com/apache/incubator-doris/pull/8597#discussion_r836337237

## File path: be/src/vec/functions/array/function_array_index.h ##
@@ -126,21 +139,37 @@ class FunctionArrayIndex : public IFunction
 return true;
 }
-#define INTEGRAL_TPL_PACK UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, Float64
+#define INTEGRAL_TPL_PACK Int8, Int16, Int32, Int64, Float32, Float64

Review comment:
1. Already changed from integral to number.
2. The right column data type has also been changed from Resulting to RightType.
[GitHub] [incubator-doris] HappenLee commented on pull request #8597: [feature-wip](array-type)Add element_at and subscript functions
HappenLee commented on pull request #8597:
URL: https://github.com/apache/incubator-doris/pull/8597#issuecomment-1080551003

Check the logic of Clickhouse

1. array_string call `array_position` with `int` param
2. array_float call `array_position` with `int` param

Confirm the behavior rules of different array types and different data types
[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8703: [doc] Update VARCHAR.md
github-actions[bot] commented on pull request #8703: URL: https://github.com/apache/incubator-doris/pull/8703#issuecomment-1080552101
[GitHub] [incubator-doris] EmmyMiao87 commented on a change in pull request #8695: [enhancement] update broadcast join cost algorithm
EmmyMiao87 commented on a change in pull request #8695:
URL: https://github.com/apache/incubator-doris/pull/8695#discussion_r836226913

## File path: fe/fe-core/src/main/java/org/apache/doris/planner/JoinCostEvaluation.java ##
@@ -147,7 +149,7 @@ public long constructHashTableSpace() {
 Math.pow(1.5, (int) ((Math.log((double) rhsTreeCardinality/4096) / Math.log(1.5)) + 1)) * 4096;
 double nodeOverheadSpace = nodeArrayLen * 16;
 double nodeTuplePointerSpace = nodeArrayLen * rhsTreeTupleIdNum * 8;
-return Math.round((bucketPointerSpace + (double) rhsTreeCardinality * rhsTreeAvgRowSize
+return Math.round((bucketPointerSpace + (double) rhsTreeCardinality * rhsTreeAvgRowSize * COMPRESSION_RATIO

Review comment:
Currently the average size of each row is indeed calculated from the compressed data. But simply multiplying by a fixed compression ratio is certainly inaccurate. Alternatively, can you cite actual test data showing that the estimate is closer to the real memory consumption after multiplying by a compression ratio of 5?
[GitHub] [incubator-doris] cambyzju commented on pull request #8597: [feature-wip](array-type)Add element_at and subscript functions
cambyzju commented on pull request #8597:
URL: https://github.com/apache/incubator-doris/pull/8597#issuecomment-1080555999

> Check the logic of Clickhouse
>
> 1. array_string call `array_position` with `int` param
> 2. array_float call `array_position` with `int` param
>
> Confirm the behavior rules of different array types and different data types

Got it. We will do this when we extend the array subtype to float and double.
[GitHub] [incubator-doris] dataroaring opened a new pull request #8707: remove useless code in DataTypeDecimal
dataroaring opened a new pull request #8707: URL: https://github.com/apache/incubator-doris/pull/8707 # Proposed changes Issue Number: close #xxx ## Problem Summary: Describe the overview of changes. ## Checklist(Required) 1. Does it affect the original behavior: (Yes/No/I Don't know) 2. Has unit tests been added: (Yes/No/No Need) 3. Has document been added or modified: (Yes/No/No Need) 4. Does it need to update dependencies: (Yes/No) 5. Are there any changes that cannot be rolled back: (Yes/No) ## Further comments If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
[GitHub] [incubator-doris] morrySnow commented on a change in pull request #8695: [enhancement] update broadcast join cost algorithm
morrySnow commented on a change in pull request #8695:
URL: https://github.com/apache/incubator-doris/pull/8695#discussion_r836352339

## File path: fe/fe-core/src/main/java/org/apache/doris/planner/JoinCostEvaluation.java ##
@@ -147,7 +149,7 @@ public long constructHashTableSpace() {
 Math.pow(1.5, (int) ((Math.log((double) rhsTreeCardinality/4096) / Math.log(1.5)) + 1)) * 4096;
 double nodeOverheadSpace = nodeArrayLen * 16;
 double nodeTuplePointerSpace = nodeArrayLen * rhsTreeTupleIdNum * 8;
-return Math.round((bucketPointerSpace + (double) rhsTreeCardinality * rhsTreeAvgRowSize
+return Math.round((bucketPointerSpace + (double) rhsTreeCardinality * rhsTreeAvgRowSize * COMPRESSION_RATIO

Review comment:
Yes, you are right: multiplying by a fixed ratio is not accurate. But the error in the data size is not introduced by the compression ratio alone. Since we do not have accurate statistics yet, the average row size and the cardinality are also inaccurate. So a compression ratio of 5 is used here, based on observations of data imported into Doris.
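Editorial sketch of the estimate being debated in this thread. The node array length, the 16-byte node overhead and the 8-byte tuple pointer terms follow the quoted diff. Everything else is an assumption made for illustration: the class name `HashTableSpaceSketch`, passing `bucketPointerSpace` and the ratio as parameters, and the plain sum of the four terms (the quoted hunk is cut off after `rhsTreeAvgRowSize`, so the actual tail of the expression in `JoinCostEvaluation` is not shown here).

```java
// Illustrative only; not the code in JoinCostEvaluation.java.
class HashTableSpaceSketch {

    // Node array length: 4096 scaled by a power of 1.5, as in the quoted diff.
    static double nodeArrayLen(long rhsTreeCardinality) {
        return Math.pow(1.5,
                (int) ((Math.log((double) rhsTreeCardinality / 4096) / Math.log(1.5)) + 1)) * 4096;
    }

    // Estimated bytes for the hash table built on the right-hand side of the join.
    // bucketPointerSpace is taken as an input because its computation is not in the quoted hunk.
    // compressionRatio = 1.0 models the estimate before the change, 5.0 the proposed one.
    static long estimate(long rhsTreeCardinality, double rhsTreeAvgRowSize, int rhsTreeTupleIdNum,
                         double bucketPointerSpace, double compressionRatio) {
        double len = nodeArrayLen(rhsTreeCardinality);
        double nodeOverheadSpace = len * 16;
        double nodeTuplePointerSpace = len * rhsTreeTupleIdNum * 8;
        // Only the row-data term is scaled by the compression ratio.
        double rowDataSpace = (double) rhsTreeCardinality * rhsTreeAvgRowSize * compressionRatio;
        // Assumption: the terms are simply summed.
        return Math.round(bucketPointerSpace + rowDataSpace + nodeOverheadSpace + nodeTuplePointerSpace);
    }
}
```

For example, with 4096 rows of 10 bytes each, one tuple id and `bucketPointerSpace` set to 0, the sketch gives 188416 bytes at ratio 1 and 352256 bytes at ratio 5. The difference is entirely in the row-data term, which is why the reviewers ask whether the correction belongs in the row-size statistics rather than in the cost formula.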
[GitHub] [incubator-doris] dataroaring opened a new pull request #8708: comment code convering decimal format
dataroaring opened a new pull request #8708: URL: https://github.com/apache/incubator-doris/pull/8708 The comment can help newbies read the code much more quickly. # Proposed changes Issue Number: close #xxx ## Problem Summary: Describe the overview of changes. ## Checklist(Required) 1. Does it affect the original behavior: (Yes/No/I Don't know) 2. Has unit tests been added: (Yes/No/No Need) 3. Has document been added or modified: (Yes/No/No Need) 4. Does it need to update dependencies: (Yes/No) 5. Are there any changes that cannot be rolled back: (Yes/No) ## Further comments If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
[GitHub] [incubator-doris] EmmyMiao87 commented on pull request #8695: [enhancement] update broadcast join cost algorithm
EmmyMiao87 commented on pull request #8695: URL: https://github.com/apache/incubator-doris/pull/8695#issuecomment-1080575928 Adding a session variable is more difficult for users to understand. The more intuitive way is to add a hint after the join.
[GitHub] [incubator-doris] EmmyMiao87 commented on a change in pull request #8695: [enhancement] update broadcast join cost algorithm
EmmyMiao87 commented on a change in pull request #8695:
URL: https://github.com/apache/incubator-doris/pull/8695#discussion_r836367645

## File path: fe/fe-core/src/main/java/org/apache/doris/planner/JoinCostEvaluation.java ##
@@ -147,7 +149,7 @@ public long constructHashTableSpace() {
 Math.pow(1.5, (int) ((Math.log((double) rhsTreeCardinality/4096) / Math.log(1.5)) + 1)) * 4096;
 double nodeOverheadSpace = nodeArrayLen * 16;
 double nodeTuplePointerSpace = nodeArrayLen * rhsTreeTupleIdNum * 8;
-return Math.round((bucketPointerSpace + (double) rhsTreeCardinality * rhsTreeAvgRowSize
+return Math.round((bucketPointerSpace + (double) rhsTreeCardinality * rhsTreeAvgRowSize * COMPRESSION_RATIO

Review comment:
In fact, when we wrote this cost formula, we tested it against real memory consumption. According to that test, the current formula is relatively accurate. If you have actually measured the gap between the cost and the real memory consumption, can you provide the test data? Let's analyze where the specific gap is, instead of simply multiplying by a fixed value.
[GitHub] [incubator-doris] xinyiZzz commented on pull request #8695: [enhancement] update broadcast join cost algorithm
xinyiZzz commented on pull request #8695:
URL: https://github.com/apache/incubator-doris/pull/8695#issuecomment-1080590449

> > Why add a memory control to limit the broadcast memory? Instead of using mem limit uniformly?
>
> there are 2 reason:
>
> 1. broadcast is not always fast than shuffle. The cost of creating a FULL TABLE hash table is not negligible when broadcast table is large.
> 2. In be, we allocate hash table in buffer pool, and it' is not limited by mem limit.

1. Adding a new memory parameter will make things more difficult for users to understand and debug. My understanding is that broadcast is faster than shuffle in most cases. When shuffle is faster than broadcast, that is not directly related to the size of the hash table but to the gap between the data sizes of the left and right tables. In that case, users can specify the join method manually with a hint.
2. From what I see, HashJoinNode currently uses MemPool to allocate the memory of its HashTable, and the BufferPool is only used for the HashTable of the Partitioned Agg. If the remaining 1G is meant to reserve memory for the rest of a query apart from the hash join, we should try to estimate the memory consumption of all nodes in a fragment, and do so by collecting statistics.
[GitHub] [incubator-doris] jacktengg opened a new pull request #8709: Stream load
jacktengg opened a new pull request #8709: URL: https://github.com/apache/incubator-doris/pull/8709 # Proposed changes Issue Number: close #xxx ## Problem Summary: Implement vectorized stream load. Added fe and be configuration option enable_vectorized_load to enable vectorized stream load. ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (No) ## Further comments If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
[GitHub] [incubator-doris] morningman opened a new pull request #8710: [fix](user-property) Fix bug that can not set exec_mem_limit at user level
morningman opened a new pull request #8710: URL: https://github.com/apache/incubator-doris/pull/8710 # Proposed changes Issue Number: close #xxx ## Problem Summary: Describe the overview of changes. ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No Need) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (No) ## Further comments If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #8695: [enhancement] update broadcast join cost algorithm
xinyiZzz commented on a change in pull request #8695:
URL: https://github.com/apache/incubator-doris/pull/8695#discussion_r836383494

## File path: fe/fe-core/src/main/java/org/apache/doris/planner/JoinCostEvaluation.java ##
@@ -147,7 +149,7 @@ public long constructHashTableSpace() {
 Math.pow(1.5, (int) ((Math.log((double) rhsTreeCardinality/4096) / Math.log(1.5)) + 1)) * 4096;
 double nodeOverheadSpace = nodeArrayLen * 16;
 double nodeTuplePointerSpace = nodeArrayLen * rhsTreeTupleIdNum * 8;
-return Math.round((bucketPointerSpace + (double) rhsTreeCardinality * rhsTreeAvgRowSize
+return Math.round((bucketPointerSpace + (double) rhsTreeCardinality * rhsTreeAvgRowSize * COMPRESSION_RATIO

Review comment:
If the `avgRowSize` statistic is inaccurate because of the compression ratio, my understanding is that the correction should be made in `computeStats` of OlapScanNode, not in JoinCostEvaluation, until more accurate statistics collection is available.
[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #8695: [enhancement] update broadcast join cost algorithm
xinyiZzz commented on a change in pull request #8695:
URL: https://github.com/apache/incubator-doris/pull/8695#discussion_r836383494

## File path: fe/fe-core/src/main/java/org/apache/doris/planner/JoinCostEvaluation.java ##
@@ -147,7 +149,7 @@ public long constructHashTableSpace() {
 Math.pow(1.5, (int) ((Math.log((double) rhsTreeCardinality/4096) / Math.log(1.5)) + 1)) * 4096;
 double nodeOverheadSpace = nodeArrayLen * 16;
 double nodeTuplePointerSpace = nodeArrayLen * rhsTreeTupleIdNum * 8;
-return Math.round((bucketPointerSpace + (double) rhsTreeCardinality * rhsTreeAvgRowSize
+return Math.round((bucketPointerSpace + (double) rhsTreeCardinality * rhsTreeAvgRowSize * COMPRESSION_RATIO

Review comment:
If the `avgRowSize` statistic is inaccurate because of the compression ratio, my understanding is that the correction should be made in `computeStats` of OlapScanNode, not in JoinCostEvaluation, until more accurate statistics collection is available. `totalBytes` is the compressed size.
[GitHub] [incubator-doris] caiconghui commented on pull request #8680: [Refactor](type_info) use template and single instance to refactor get type info logic
caiconghui commented on pull request #8680:
URL: https://github.com/apache/incubator-doris/pull/8680#issuecomment-1080601888

> for some certainly types, I think all use unified `_typeinfo` do not reflect semantics, such as `bf_typeinfo` is better than `_typeinfo`. we may can modify some certainly types variables names.

Actually, `_type_info` is a private member of the parent object, so I think it is clear as it is; its usage is easy to find in the code.
[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8710: [fix](user-property) Fix bug that can not set exec_mem_limit at user level
github-actions[bot] commented on pull request #8710: URL: https://github.com/apache/incubator-doris/pull/8710#issuecomment-1080602840
[GitHub] [incubator-doris] tianhui5 opened a new pull request #8711: [Feature] Support load binlog from MySQL directly instead of Canal (#8025)
tianhui5 opened a new pull request #8711:
URL: https://github.com/apache/incubator-doris/pull/8711

# Proposed changes

Issue Number: close #8025

## Problem Summary:

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (Yes/No/No Need)
4. Does it need to update dependencies: (Yes)
5. Are there any changes that cannot be rolled back: (No)

## Further comments
[GitHub] [incubator-doris] xinyiZzz commented on pull request #8695: [enhancement] update broadcast join cost algorithm
xinyiZzz commented on pull request #8695:
URL: https://github.com/apache/incubator-doris/pull/8695#issuecomment-1080610064

Increasing the compression ratio is a good attempt. Have you verified its accuracy?

In #6274, I compared the calculated broadcast cost with the actual memory used by BE and multiplied it by a penalty factor of 1.1 (`HASH_TBL_SPACE_OVERHEAD`). However, I did not test the effect of the compression ratio in isolation, and I tested only a limited number of cases.
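For readers following the discussion, the estimate being debated can be sketched as a standalone method. The `nodeArrayLen`, `nodeOverheadSpace`, and `nodeTuplePointerSpace` terms and the 1.1 factor come from the diff hunk and the comment above. The `bucketPointerSpace` expression and the tail of the `return` statement are not visible in the quoted hunk, so both are assumptions here, as is the helper `relativeError`.

```java
/** Sketch of the hash table space estimate; parts marked "assumption" are not in the quoted diff. */
final class HashTableSpaceSketch {
    // Penalty factor of 1.1 mentioned in the comment above.
    static final double HASH_TBL_SPACE_OVERHEAD = 1.1;

    static long constructHashTableSpace(long rhsTreeCardinality, double rhsTreeAvgRowSize,
                                        int rhsTreeTupleIdNum) {
        // Assumption: 8-byte bucket pointers at a 0.75 load factor.
        double bucketPointerSpace = ((double) rhsTreeCardinality / 0.75) * 8;
        // From the diff: the node array grows in 1.5x steps starting at 4096 entries.
        double nodeArrayLen =
                Math.pow(1.5, (int) ((Math.log((double) rhsTreeCardinality / 4096) / Math.log(1.5)) + 1)) * 4096;
        double nodeOverheadSpace = nodeArrayLen * 16;
        double nodeTuplePointerSpace = nodeArrayLen * rhsTreeTupleIdNum * 8;
        // Assumption: the truncated return statement sums all terms and applies the overhead factor.
        return Math.round((bucketPointerSpace + (double) rhsTreeCardinality * rhsTreeAvgRowSize
                + nodeOverheadSpace + nodeTuplePointerSpace) * HASH_TBL_SPACE_OVERHEAD);
    }

    /** Hypothetical helper: signed relative error of an estimate against measured BE memory. */
    static double relativeError(long estimatedBytes, long measuredBytes) {
        return (double) (estimatedBytes - measuredBytes) / measuredBytes;
    }

    public static void main(String[] args) {
        long estimate = constructHashTableSpace(4096L, 100.0, 1);
        System.out.println("estimated bytes: " + estimate);
    }
}
```

Comparing `relativeError` across several queries, with and without a compression ratio, is one way the two effects (fixed overhead versus row-size inaccuracy) could be separated.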
[GitHub] [incubator-doris] morrySnow commented on a change in pull request #8695: [enhancement] update broadcast join cost algorithm
morrySnow commented on a change in pull request #8695:
URL: https://github.com/apache/incubator-doris/pull/8695#discussion_r836413449

## File path: fe/fe-core/src/main/java/org/apache/doris/planner/JoinCostEvaluation.java ##

@@ -147,7 +149,7 @@ public long constructHashTableSpace() {
                 Math.pow(1.5, (int) ((Math.log((double) rhsTreeCardinality/4096) / Math.log(1.5)) + 1)) * 4096;
         double nodeOverheadSpace = nodeArrayLen * 16;
         double nodeTuplePointerSpace = nodeArrayLen * rhsTreeTupleIdNum * 8;
-        return Math.round((bucketPointerSpace + (double) rhsTreeCardinality * rhsTreeAvgRowSize
+        return Math.round((bucketPointerSpace + (double) rhsTreeCardinality * rhsTreeAvgRowSize * COMPRESSION_RATIO

Review comment:
> In fact, when we wrote this cost formula, we tested it against real memory consumption. According to that earlier test, the current formula is relatively accurate.
>
> If you have actually measured the gap between the cost and the real memory consumption, can you provide the test data? Let's analyze where the specific gap is, instead of simply multiplying by a fixed value.

We tested on TPC-H with sf = 100 and 3 BEs with 64GB memory. q9, q12, and q21 failed with the previous cost formula.
[GitHub] [incubator-doris] morrySnow commented on a change in pull request #8695: [enhancement] update broadcast join cost algorithm
morrySnow commented on a change in pull request #8695:
URL: https://github.com/apache/incubator-doris/pull/8695#discussion_r836413666

## File path: fe/fe-core/src/main/java/org/apache/doris/planner/JoinCostEvaluation.java ##

@@ -147,7 +149,7 @@ public long constructHashTableSpace() {
                 Math.pow(1.5, (int) ((Math.log((double) rhsTreeCardinality/4096) / Math.log(1.5)) + 1)) * 4096;
         double nodeOverheadSpace = nodeArrayLen * 16;
         double nodeTuplePointerSpace = nodeArrayLen * rhsTreeTupleIdNum * 8;
-        return Math.round((bucketPointerSpace + (double) rhsTreeCardinality * rhsTreeAvgRowSize
+        return Math.round((bucketPointerSpace + (double) rhsTreeCardinality * rhsTreeAvgRowSize * COMPRESSION_RATIO

Review comment:
> The `avgRowSize` statistics caused by the compression ratio are inaccurate, I understand that corrections should be made in `computeStats` of OlapScanNode, not in JoinCostEvaluation, until more accurate statistics collection is achieved.
>
> `totalBytes` is the compressed size.

I agree with that.
[GitHub] [incubator-doris] morrySnow commented on pull request #8695: [enhancement] update broadcast join cost algorithm
morrySnow commented on pull request #8695:
URL: https://github.com/apache/incubator-doris/pull/8695#issuecomment-1080633638

> Adding a session variable is more difficult for users to understand. The intuitive way is to add a hint after the join.

I don't think so. Queries are commonly submitted by programs, and programs cannot add hints by themselves. DBAs, however, can set global variables according to their cluster status.
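The variable-based approach argued for above might look roughly like this on the planner side. All names here (`BroadcastLimitSketch`, `SessionVariables`, `broadcastMemLimitBytes`, `choose`) and the 1 GiB default are hypothetical and do not correspond to actual Doris session variables; the sketch only shows that a limit set once by a DBA applies to every query, including those submitted by programs that cannot add hints.

```java
/** Hypothetical sketch of a variable-controlled broadcast decision; not Doris planner code. */
final class BroadcastLimitSketch {
    enum JoinDistribution { BROADCAST, SHUFFLE }

    /** Stand-in for session/global variables that a DBA could tune per cluster. */
    static final class SessionVariables {
        long broadcastMemLimitBytes = 1L << 30; // hypothetical default of 1 GiB
    }

    /** Chooses broadcast only when the estimated hash table fits under the configured limit. */
    static JoinDistribution choose(long estimatedHashTableBytes, SessionVariables vars) {
        return estimatedHashTableBytes <= vars.broadcastMemLimitBytes
                ? JoinDistribution.BROADCAST
                : JoinDistribution.SHUFFLE;
    }

    public static void main(String[] args) {
        SessionVariables vars = new SessionVariables();
        System.out.println(choose(512L << 20, vars)); // 512 MiB estimate
        System.out.println(choose(2L << 30, vars));   // 2 GiB estimate
    }
}
```

A per-query hint could still override this decision; the two mechanisms are not mutually exclusive.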
[GitHub] [incubator-doris] morrySnow commented on pull request #8695: [enhancement] update broadcast join cost algorithm
morrySnow commented on pull request #8695:
URL: https://github.com/apache/incubator-doris/pull/8695#issuecomment-1080649337

> > > Why add a memory control to limit the broadcast memory, instead of using the mem limit uniformly?
> >
> > There are two reasons:
> >
> > 1. Broadcast is not always faster than shuffle. The cost of creating a FULL TABLE hash table is not negligible when the broadcast table is large.
> > 2. In BE, we allocate the hash table in the buffer pool, and it is not limited by the mem limit.
>
> 1. Adding a new memory parameter will make it more difficult for users to understand and debug.
>    I understand that broadcast is faster than shuffle in most cases. When shuffle is faster than broadcast, that is not directly related to the size of the hash table, but to the gap between the data sizes of the left and right tables.
>    In this case, the join method can be specified manually with a hint.

Creating a hash table is expensive when the hash table has to be expanded. If we need an accurate cost model, it cannot include only the network overhead.

> 2. From what I see, the MemPool currently used by HashJoinNode allocates the memory of the HashTable, and the BufferPool is only used in the HashTable of the Partitioned Agg.
>
> If the remaining 1G is meant to reserve memory for the parts of a query other than the hash join, we should try to estimate the memory consumption of all nodes in a fragment, based on collected statistics.

I will recheck it, thanks.
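The point that the broadcast-versus-shuffle trade-off depends on the gap between the left and right data sizes can be illustrated with a network-only cost comparison. This is a simplified textbook-style model and an assumption on my part, not the Doris planner's actual formula; all names are hypothetical. As the reply above notes, such a model leaves out the cost of building and expanding the hash table.

```java
/** Hypothetical network-only cost model for choosing a join distribution. */
final class JoinNetworkCostSketch {
    /** Broadcast sends the whole right side to every node that runs the join. */
    static long broadcastCost(long rhsBytes, int numNodes) {
        return rhsBytes * numNodes;
    }

    /** Shuffle repartitions both sides once across the network. */
    static long shuffleCost(long lhsBytes, long rhsBytes) {
        return lhsBytes + rhsBytes;
    }

    static boolean broadcastIsCheaper(long lhsBytes, long rhsBytes, int numNodes) {
        return broadcastCost(rhsBytes, numNodes) <= shuffleCost(lhsBytes, rhsBytes);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // Large left side, small right side, 3 nodes.
        System.out.println(broadcastIsCheaper(100 * gib, gib, 3));
        // Sides of similar size, 3 nodes.
        System.out.println(broadcastIsCheaper(2 * gib, 2 * gib, 3));
    }
}
```

Under this model the decision is driven by the ratio of the two input sizes and the node count, which matches the quoted observation; adding a hash table build term would shift the break-even point toward shuffle for large right sides.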