[GitHub] [incubator-doris] morningman opened a new pull request #7939: [improvement][refactor](vec) Refactor serde of vec block and using brpc attachment
morningman opened a new pull request #7939: URL: https://github.com/apache/incubator-doris/pull/7939
# Proposed changes
Issue Number: close #xxx
## Problem Summary:
This PR mainly changes:
1. Change the definition of PBlock
The new PBlock consists of a set of PColumnMeta messages and a binary buffer. The PColumnMeta entries record the metadata of all columns in the Block, while the buffer stores the serialized binary data of all columns.
2. Refactor the serialize/deserialize methods of data types
Rewrite `serialize()`/`deserialize()` of IDataType, and add a new method `get_uncompressed_serialized_bytes()` that returns the total length of a column's uncompressed serialized data.
3. Rewrite the serialize/deserialize methods of Block
When serializing a Block to a PBlock, it first computes the total length of the uncompressed serialized data of all columns in the Block, then allocates the memory and writes the serialized data into the buffer.
4. Use brpc attachment to transmit the serialized column data
## Checklist(Required)
1. Does it affect the original behavior: (Yes) The way serialized blocks are transmitted has changed
2. Have unit tests been added: (Yes)
3. Has documentation been added or modified: (No Need)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
## Further comments
If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
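The two-pass Block serialization described in points 2 and 3 of PR #7939 (first sum each column's uncompressed size, then allocate and write) can be sketched as follows. This is a simplified illustration under stated assumptions, not the Doris implementation: `Int32ColumnSketch`, `ColumnMetaSketch`, `BlockSketch` and `SerializedBlockSketch` are hypothetical stand-ins for IColumn, PColumnMeta, Block and PBlock, only non-nullable 32-bit integer columns are handled, and the per-column layout `column num | value1 | value2 | ...` is the one quoted in the review of `DataTypeNumberBase`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real types are IColumn, PColumnMeta, Block and PBlock.
struct Int32ColumnSketch {
    std::string name;
    std::vector<int32_t> values;
};
struct ColumnMetaSketch {
    std::string name;
    int64_t byte_size;  // uncompressed serialized size of this column
};
struct BlockSketch {
    std::vector<Int32ColumnSketch> columns;
};
struct SerializedBlockSketch {
    std::vector<ColumnMetaSketch> metas;  // one entry per column
    std::string buffer;                   // all columns, back to back
};

// Layout per column: column num (uint32_t) | value1 | value2 | ...
int64_t uncompressed_serialized_bytes(const Int32ColumnSketch& col) {
    return static_cast<int64_t>(sizeof(uint32_t) + col.values.size() * sizeof(int32_t));
}

// Writes one column at `buf` and returns the position just past it.
char* serialize_column(const Int32ColumnSketch& col, char* buf) {
    const uint32_t n = static_cast<uint32_t>(col.values.size());
    std::memcpy(buf, &n, sizeof(n));
    buf += sizeof(n);
    if (n > 0) {
        std::memcpy(buf, col.values.data(), n * sizeof(int32_t));
        buf += n * sizeof(int32_t);
    }
    return buf;
}

// Reads one column from `buf` and returns the position just past it.
const char* deserialize_column(const char* buf, Int32ColumnSketch* col) {
    uint32_t n = 0;
    std::memcpy(&n, buf, sizeof(n));
    buf += sizeof(n);
    col->values.resize(n);
    if (n > 0) {
        std::memcpy(col->values.data(), buf, n * sizeof(int32_t));
        buf += n * sizeof(int32_t);
    }
    return buf;
}

SerializedBlockSketch serialize_block(const BlockSketch& block) {
    SerializedBlockSketch out;
    // Pass 1: record per-column sizes and the total length.
    int64_t total = 0;
    for (const auto& col : block.columns) {
        const int64_t bytes = uncompressed_serialized_bytes(col);
        out.metas.push_back({col.name, bytes});
        total += bytes;
    }
    // Allocate the whole buffer up front, then pass 2: write every column.
    out.buffer.resize(static_cast<size_t>(total));
    char* pos = out.buffer.data();  // non-const data() requires C++17
    for (const auto& col : block.columns) {
        pos = serialize_column(col, pos);
    }
    return out;
}

BlockSketch deserialize_block(const SerializedBlockSketch& in) {
    BlockSketch block;
    const char* pos = in.buffer.data();
    for (const auto& meta : in.metas) {
        Int32ColumnSketch col;
        col.name = meta.name;
        const char* start = pos;
        pos = deserialize_column(pos, &col);
        assert(pos - start == meta.byte_size);  // consumed exactly the recorded size
        block.columns.push_back(std::move(col));
    }
    return block;
}
```

Point 4 of the PR then sends such a buffer as a brpc attachment instead of embedding it in the protobuf message; the sketch stops at producing the buffer. Because `memcpy` writes native byte order, it also assumes sender and receiver share the same endianness.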
[GitHub] [incubator-doris] github-actions[bot] commented on pull request #7817: [fix](load priv) modify error msg of checking table priv
github-actions[bot] commented on pull request #7817: URL: https://github.com/apache/incubator-doris/pull/7817#issuecomment-1028882434
[GitHub] [incubator-doris] github-actions[bot] commented on pull request #7909: [chore] Set the full path of make program to CMake.
github-actions[bot] commented on pull request #7909: URL: https://github.com/apache/incubator-doris/pull/7909#issuecomment-1028883131
[GitHub] [incubator-doris] morningman closed issue #7869: [Vectorized][Bug] Mem Leak in agg/unique table
morningman closed issue #7869: URL: https://github.com/apache/incubator-doris/issues/7869
[GitHub] [incubator-doris] morningman merged pull request #7884: [Vectorized][Bug] This pr main fix the bug:
morningman merged pull request #7884: URL: https://github.com/apache/incubator-doris/pull/7884
[incubator-doris] branch master updated: [fix](vec) Fix some bugs about vec engine (#7884)
This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-doris.git
The following commit(s) were added to refs/heads/master by this push:
new 51abaa8 [fix](vec) Fix some bugs about vec engine (#7884)
51abaa8 is described below
commit 51abaa89f3d828dfdb8e6dfeef0d1424e28cdf05
Author: HappenLee
AuthorDate: Thu Feb 3 19:21:17 2022 +0800
[fix](vec) Fix some bugs about vec engine (#7884)
1. mem leak in vcollector iter
2. query slow in agg table limit 10
3. query slow in SSB q4,q5,q6
---
be/src/exec/olap_scan_node.h | 3 ++-
be/src/exec/olap_scanner.cpp | 3 +++
be/src/olap/reader.cpp | 1 +
be/src/olap/reader.h | 5
be/src/olap/rowset/beta_rowset_reader.cpp | 33 ---
be/src/olap/rowset/rowset_reader_context.h | 2 ++
be/src/olap/storage_engine.cpp | 10 +++
be/src/olap/tablet_schema.cpp | 3 ++-
be/src/vec/columns/column_string.cpp | 1 +
be/src/vec/exec/volap_scan_node.cpp | 42 ++
be/src/vec/exec/volap_scanner.cpp | 6 +
be/src/vec/exec/volap_scanner.h | 10 +--
be/src/vec/olap/block_reader.cpp | 9 ---
be/src/vec/olap/block_reader.h | 3 ---
be/src/vec/olap/vcollect_iterator.cpp | 1 +
15 files changed, 85 insertions(+), 47 deletions(-)
diff --git a/be/src/exec/olap_scan_node.h b/be/src/exec/olap_scan_node.h index d57a92d..82e98d5 100644 --- a/be/src/exec/olap_scan_node.h +++ b/be/src/exec/olap_scan_node.h @@ -160,7 +160,7 @@ protected: RuntimeProfile* profile); friend class OlapScanner; -friend class doris::vectorized::VOlapScanner; +friend class vectorized::VOlapScanner; // Tuple id resolved in prepare() to set _tuple_desc; TupleId _tuple_id; @@ -239,6 +239,7 @@ protected: SpinLock _status_mutex; Status _status; RuntimeState* _runtime_state; + RuntimeProfile::Counter* _scan_timer; RuntimeProfile::Counter* _scan_cpu_timer = nullptr; RuntimeProfile::Counter* _tablet_counter; diff --git a/be/src/exec/olap_scanner.cpp b/be/src/exec/olap_scanner.cpp index a1efc1d..d7dc839 
100644 --- a/be/src/exec/olap_scanner.cpp +++ b/be/src/exec/olap_scanner.cpp @@ -59,6 +59,9 @@ Status OlapScanner::prepare( const std::vector>>& bloom_filters) { set_tablet_reader(); +// set limit to reduce end of rowset and segment mem use +_tablet_reader->set_batch_size(_parent->limit() == -1 ? _parent->_runtime_state->batch_size() : std::min( +static_cast(_parent->_runtime_state->batch_size()), _parent->limit())); // Get olap table TTabletId tablet_id = scan_range.tablet_id; diff --git a/be/src/olap/reader.cpp b/be/src/olap/reader.cpp index 13e50b8..4deda90 100644 --- a/be/src/olap/reader.cpp +++ b/be/src/olap/reader.cpp @@ -222,6 +222,7 @@ OLAPStatus TabletReader::_capture_rs_readers(const ReaderParams& read_params, _reader_context.runtime_state = read_params.runtime_state; _reader_context.use_page_cache = read_params.use_page_cache; _reader_context.sequence_id_idx = _sequence_col_idx; +_reader_context.batch_size = _batch_size; *valid_rs_readers = *rs_readers; diff --git a/be/src/olap/reader.h b/be/src/olap/reader.h index 82cd7ff..3137e06 100644 --- a/be/src/olap/reader.h +++ b/be/src/olap/reader.h @@ -133,6 +133,10 @@ public: _stats.rows_vec_del_cond_filtered; } +void set_batch_size(int batch_size) { +_batch_size = batch_size; +} + const OlapReaderStatistics& stats() const { return _stats; } OlapReaderStatistics* mutable_stats() { return &_stats; } @@ -210,6 +214,7 @@ protected: bool _filter_delete = false; int32_t _sequence_col_idx = -1; bool _direct_mode = false; +int _batch_size = 1024; CollectIterator _collect_iter; std::vector _key_cids; diff --git a/be/src/olap/rowset/beta_rowset_reader.cpp b/be/src/olap/rowset/beta_rowset_reader.cpp index 4d35f2f..263a4cc 100644 --- a/be/src/olap/rowset/beta_rowset_reader.cpp +++ b/be/src/olap/rowset/beta_rowset_reader.cpp @@ -131,20 +131,23 @@ OLAPStatus BetaRowsetReader::init(RowsetReaderContext* read_context) { _iterator.reset(final_iterator); // init input block -_input_block.reset(new RowBlockV2(schema, 1024, 
_parent_tracker)); - -// init input/output block and row -_output_block.reset(new RowBlock(read_context->tablet_schema, _parent_tracker)); - -RowBlockInfo output_block_info; -output_block_info.row_num = 1024; -output_block_info.null_supported = true; -// the output block's schema should be seek_columns to conform to v1 -// TODO(hkp): this should be optimized to use return_columns -output_block_info.column
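The `set_batch_size()` call added to `OlapScanner::prepare()` in this commit caps the tablet reader's batch size at the query's LIMIT (the code comment says this is to "reduce end of rowset and segment mem use"), and `reader.h` gains a default `_batch_size = 1024`. The expression can be restated as a standalone function. `effective_batch_size` is a hypothetical name; the sketch assumes `limit() == -1` means "no LIMIT", as the diff's check implies, and uses `int64_t` for the limit as an assumption, since the archive dropped the `static_cast` target type.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper restating the set_batch_size() argument from the diff.
// limit == -1 means the query has no LIMIT clause.
constexpr int effective_batch_size(int runtime_batch_size, int64_t limit) {
    return limit == -1
               ? runtime_batch_size
               // Never fetch more rows per batch than the query can return.
               : static_cast<int>(std::min<int64_t>(runtime_batch_size, limit));
}
```

With a runtime batch size of 1024 and `LIMIT 10`, the reader requests batches of 10 rows instead of 1024; without a LIMIT the runtime batch size is used unchanged.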
[GitHub] [incubator-doris] morningman commented on a change in pull request #7828: [Vectorized][Feature] add ColumnHLL to support hll type
morningman commented on a change in pull request #7828: URL: https://github.com/apache/incubator-doris/pull/7828#discussion_r798466767 ## File path: be/src/vec/data_types/data_type_bitmap.h ## @@ -83,6 +84,9 @@ struct is_complex : std::false_type {}; template <> struct is_complex : std::true_type {}; +template <> +struct is_complex : std::true_type {}; Review comment: Why define HLL in `data_type_bitmap`? ## File path: be/src/vec/data_types/data_type_hll.cpp ## @@ -0,0 +1,103 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "vec/data_types/data_type_hll.h" + +#include "vec/columns/column_complex.h" +#include "vec/common/assert_cast.h" +#include "vec/io/io_helper.h" + +namespace doris::vectorized { + +size_t DataTypeHLL::serialize(const IColumn& column, PColumn* pcolumn) const { Review comment: The serialize() method has been refactored in #7939; please update it after that PR is merged.
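The reviewer's question concerns the `is_complex` trait specializations quoted from `data_type_bitmap.h` (the archive dropped their template arguments). A minimal sketch of the trait pattern, assuming hypothetical stand-in types `BitmapValueSketch` and `HyperLogLogSketch`, shows that each full specialization is independent of the others, so the HLL one does not need to live beside the bitmap one; this is one possible answer to the question, not what the PR author did.

```cpp
#include <type_traits>

// Primary template: a type is not "complex" unless specialized.
template <typename T>
struct is_complex : std::false_type {};

// Hypothetical stand-ins for the bitmap and HLL value types.
struct BitmapValueSketch {};
struct HyperLogLogSketch {};

// This specialization could stay in the bitmap data-type header...
template <>
struct is_complex<BitmapValueSketch> : std::true_type {};

// ...while this one could move to the HLL data-type header.
template <>
struct is_complex<HyperLogLogSketch> : std::true_type {};

static_assert(is_complex<BitmapValueSketch>::value, "bitmap is complex");
static_assert(is_complex<HyperLogLogSketch>::value, "HLL is complex");
static_assert(!is_complex<int>::value, "int is not complex");
```

The only constraint is that a specialization must be visible before any code that uses `is_complex` with that type is instantiated, which is naturally satisfied if it sits in the header that defines the type.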
[GitHub] [incubator-doris] HappenLee commented on a change in pull request #7939: [improvement][refactor](vec) Refactor serde of vec block and using brpc attachment
HappenLee commented on a change in pull request #7939: URL: https://github.com/apache/incubator-doris/pull/7939#discussion_r799195842 ## File path: be/src/vec/data_types/data_type_nullable.cpp ## @@ -53,33 +53,52 @@ std::string DataTypeNullable::to_string(const IColumn& column, size_t row_num) c } } -size_t DataTypeNullable::serialize(const IColumn& column, PColumn* pcolumn) const { +// binary: column num | | +// : is_null1 | is_null2 | ... +// : value1 | value2 | ...> +int64_t DataTypeNullable::get_uncompressed_serialized_bytes(const IColumn& column) const { +int64_t size = sizeof(uint32_t); +size += sizeof(bool) * column.size(); +size += nested_data_type->get_uncompressed_serialized_bytes(assert_cast(column).get_nested_column()); +return size; +} + +char* DataTypeNullable::serialize(const IColumn& column, char* buf) const { auto ptr = column.convert_to_full_column_if_const(); const ColumnNullable& col = assert_cast(*ptr.get()); -pcolumn->mutable_is_null()->Reserve(column.size()); +// column num +*reinterpret_cast(buf) = column.size(); +buf += sizeof(uint32_t); +// null flags for (size_t i = 0; i < column.size(); ++i) { -bool is_null = col.is_null_at(i); -pcolumn->add_is_null(is_null); +*reinterpret_cast(buf) = col.is_null_at(i); +buf += sizeof(bool); } - -return nested_data_type->serialize(col.get_nested_column(), pcolumn) + - sizeof(bool) * column.size(); +// data values +return nested_data_type->serialize(col.get_nested_column(), buf); } -void DataTypeNullable::deserialize(const PColumn& pcolumn, IColumn* column) const { +const char* DataTypeNullable::deserialize(const char* buf, IColumn* column) const { ColumnNullable* col = assert_cast(column); -col->get_null_map_data().reserve(pcolumn.is_null_size()); - -for (int i = 0; i < pcolumn.is_null_size(); ++i) { -if (pcolumn.is_null(i)) { -col->get_null_map_data().push_back(1); -} else { -col->get_null_map_data().push_back(0); -} +// column num +uint32_t column_num = *reinterpret_cast(buf); +buf += sizeof(uint32_t); 
+// null flags +col->get_null_map_data().reserve(column_num); +for (int i = 0; i < column_num; ++i) { Review comment: memcpy to speed up ## File path: be/src/vec/data_types/data_type_number_base.cpp ## @@ -65,30 +65,40 @@ std::string DataTypeNumberBase::to_string(const IColumn& column, size_t row_n } } +// binary: column num | value1 | value2 | ... template -size_t DataTypeNumberBase::serialize(const IColumn& column, PColumn* pcolumn) const { -const auto column_len = column.size(); -pcolumn->mutable_binary()->resize(column_len * sizeof(FieldType)); -auto* data = pcolumn->mutable_binary()->data(); +int64_t DataTypeNumberBase::get_uncompressed_serialized_bytes(const IColumn& column) const { +return sizeof(uint32_t) + column.size() * sizeof(FieldType); +} -// copy the data +template +char* DataTypeNumberBase::serialize(const IColumn& column, char* buf) const { +// column num +const auto column_num = column.size(); Review comment: Why does each column need to serialize the column size? One block only needs to register block->rows() once. ## File path: be/src/vec/data_types/data_type_nullable.cpp ## @@ -53,33 +53,52 @@ std::string DataTypeNullable::to_string(const IColumn& column, size_t row_num) c } } -size_t DataTypeNullable::serialize(const IColumn& column, PColumn* pcolumn) const { +// binary: column num | | +// : is_null1 | is_null2 | ... 
+// : value1 | value2 | ...> +int64_t DataTypeNullable::get_uncompressed_serialized_bytes(const IColumn& column) const { +int64_t size = sizeof(uint32_t); +size += sizeof(bool) * column.size(); +size += nested_data_type->get_uncompressed_serialized_bytes(assert_cast(column).get_nested_column()); +return size; +} + +char* DataTypeNullable::serialize(const IColumn& column, char* buf) const { auto ptr = column.convert_to_full_column_if_const(); const ColumnNullable& col = assert_cast(*ptr.get()); -pcolumn->mutable_is_null()->Reserve(column.size()); +// column num +*reinterpret_cast(buf) = column.size(); +buf += sizeof(uint32_t); +// null flags for (size_t i = 0; i < column.size(); ++i) { -bool is_null = col.is_null_at(i); -pcolumn->add_is_null(is_null); +*reinterpret_cast(buf) = col.is_null_at(i); Review comment: TODO: maybe here should use memcpy to speed up
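HappenLee's suggestion to replace the per-row null-flag loop in `DataTypeNullable::serialize()`/`deserialize()` with `memcpy` can be sketched on a simplified nullable column. This is an illustration under assumptions, not the Doris code: `NullableInt32Sketch` and the functions below are hypothetical, the nested type is fixed to 32-bit integers, and flags are stored as `uint8_t` (the diff itself pushes 0/1 values into the null map) so the whole map can be copied in one call. The layout follows the quoted comment: `column num | null flags | nested column`, where the nested numeric column again starts with its own row count.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a nullable Int32 column: null_map[i] == 1 marks
// row i as NULL, and `values` holds a placeholder for NULL rows.
struct NullableInt32Sketch {
    std::vector<uint8_t> null_map;
    std::vector<int32_t> values;
};

// Layout: column num | null flags | nested column (column num | values)
int64_t nullable_serialized_bytes(const NullableInt32Sketch& col) {
    return static_cast<int64_t>(sizeof(uint32_t) + col.null_map.size() +
                                sizeof(uint32_t) + col.values.size() * sizeof(int32_t));
}

char* serialize_nullable(const NullableInt32Sketch& col, char* buf) {
    assert(col.null_map.size() == col.values.size());
    const uint32_t n = static_cast<uint32_t>(col.null_map.size());
    std::memcpy(buf, &n, sizeof(n));
    buf += sizeof(n);
    if (n > 0) {
        // One bulk copy of the null map instead of a per-row loop.
        std::memcpy(buf, col.null_map.data(), n);
        buf += n;
    }
    // The nested column repeats the row count, mirroring the quoted layout.
    std::memcpy(buf, &n, sizeof(n));
    buf += sizeof(n);
    if (n > 0) {
        std::memcpy(buf, col.values.data(), n * sizeof(int32_t));
        buf += n * sizeof(int32_t);
    }
    return buf;
}

const char* deserialize_nullable(const char* buf, NullableInt32Sketch* col) {
    uint32_t n = 0;
    std::memcpy(&n, buf, sizeof(n));
    buf += sizeof(n);
    col->null_map.resize(n);
    if (n > 0) {
        std::memcpy(col->null_map.data(), buf, n);  // bulk copy back
        buf += n;
    }
    uint32_t nested_n = 0;
    std::memcpy(&nested_n, buf, sizeof(nested_n));
    buf += sizeof(nested_n);
    col->values.resize(nested_n);
    if (nested_n > 0) {
        std::memcpy(col->values.data(), buf, nested_n * sizeof(int32_t));
        buf += nested_n * sizeof(int32_t);
    }
    return buf;
}
```

The row count is written twice here, once by the nullable wrapper and once by the nested column, which is the redundancy raised in the other review comment; recording `block->rows()` once per block would make both per-column counts unnecessary.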
[GitHub] [incubator-doris] yiguolei opened a new pull request #7940: [Feature] Add table function framework and add numbers table function
yiguolei opened a new pull request #7940: URL: https://github.com/apache/incubator-doris/pull/7940
- Add a table function framework; developers can define new table functions under this framework.
- Add a demo table function, numbers. Users can run `select sum(number) from numbers("1")`, and we can use this table function to test the performance of our vectorized engine.
- The numbers table function is only supported by the vectorized engine.
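A table function such as numbers produces rows rather than consuming them. A minimal generator-style sketch that streams rows in batches to an aggregation like `sum(number)` is shown below. `TableFunctionSketch`, `NumbersSketch` and `get_next_batch` are hypothetical names, not the framework's actual interface, and the sketch assumes that numbers("n") yields the values 0 through n-1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified table-function interface (not the one added in the PR):
// each call fills `out` with the next batch of rows and returns false once exhausted.
class TableFunctionSketch {
public:
    virtual ~TableFunctionSketch() = default;
    virtual bool get_next_batch(std::size_t max_rows, std::vector<int64_t>* out) = 0;
};

// Assumption: numbers("n") produces the single column `number` = 0, 1, ..., n - 1.
class NumbersSketch : public TableFunctionSketch {
public:
    explicit NumbersSketch(const std::string& arg) : _total(std::stoll(arg)) {
        if (_total < 0) {
            throw std::invalid_argument("numbers(): row count must be non-negative");
        }
    }

    bool get_next_batch(std::size_t max_rows, std::vector<int64_t>* out) override {
        assert(out != nullptr);
        out->clear();
        while (_next < _total && out->size() < max_rows) {
            out->push_back(_next++);
        }
        return !out->empty();
    }

private:
    int64_t _total;
    int64_t _next = 0;
};
```

Because the generator holds no input data, the time spent by a query like `select sum(number) from numbers(...)` is dominated by the engine's own batch processing, which is presumably why the PR suggests it for measuring the vectorized engine.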