[GitHub] [incubator-doris] morningman opened a new pull request #7939: [improvement][refactor](vec) Refactor serde of vec block and using brpc attachment

2022-02-03 Thread GitBox


morningman opened a new pull request #7939:
URL: https://github.com/apache/incubator-doris/pull/7939


   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   This PR mainly changes:
   
   1. Change the definition of PBlock
   
   The new PBlock consists of a set of PColumnMeta and a binary buffer.
   The PColumnMeta records the metadata information of all columns in the Block,
   while the buffer stores the serialized binary data of all columns.
   
   2. Refactor the serialize/deserialize methods of data types
   
   Rewrite `serialize()`/`deserialize()` of IDataType, and add
   a new method `get_uncompressed_serialized_bytes()` that returns the total
   length of the uncompressed serialized data of a column.
   
   3. Rewrite the serialize/deserialize methods of Block
   
   When serializing a Block to a PBlock, it first computes the total length
   of the uncompressed serialized data of all columns in the Block, then
   allocates a buffer of that size and writes the serialized data into it.
   
   4. Use brpc attachment to transmit the serialized column data
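
The two-pass scheme described in items 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Int32Column`, `uncompressed_serialized_bytes()`, `serialize_column()` and `serialize_block()` are hypothetical stand-ins for the real IColumn, IDataType and Block interfaces, and the per-column layout (a `uint32_t` row count followed by the raw values) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a numeric column; not the real Doris IColumn.
struct Int32Column {
    std::vector<int32_t> values;
};

// Pass 1: bytes needed for one column (row count header + raw values).
int64_t uncompressed_serialized_bytes(const Int32Column& col) {
    return sizeof(uint32_t) + col.values.size() * sizeof(int32_t);
}

// Pass 2: write one column at `buf` and return the position just past it.
char* serialize_column(const Int32Column& col, char* buf) {
    const uint32_t rows = static_cast<uint32_t>(col.values.size());
    std::memcpy(buf, &rows, sizeof(rows));
    buf += sizeof(rows);
    if (rows > 0) {
        std::memcpy(buf, col.values.data(), rows * sizeof(int32_t));
    }
    return buf + rows * sizeof(int32_t);
}

// Serialize a whole block: size everything first, allocate once, then write.
std::string serialize_block(const std::vector<Int32Column>& block) {
    int64_t total = 0;
    for (const auto& col : block) {
        total += uncompressed_serialized_bytes(col);
    }
    std::string buffer(static_cast<std::size_t>(total), '\0');
    char* pos = &buffer[0];
    for (const auto& col : block) {
        pos = serialize_column(col, pos);
    }
    // The write pass must consume exactly the bytes the sizing pass promised.
    assert(pos == &buffer[0] + total);
    return buffer;
}
```

Sizing first means the buffer is allocated exactly once and never grows while columns are written. In this sketch the resulting bytes would then be handed to the RPC layer as a single binary payload.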
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes)
   The way serialized blocks are transmitted has changed.
   2. Has unit tests been added: (Yes)
   3. Has document been added or modified: (No Need)
   4. Does it need to update dependencies: (No)
   5. Are there any changes that cannot be rolled back: (Yes)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
[d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you 
chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org



[GitHub] [incubator-doris] github-actions[bot] commented on pull request #7817: [fix](load priv) modify error msg of checking table priv

2022-02-03 Thread GitBox


github-actions[bot] commented on pull request #7817:
URL: https://github.com/apache/incubator-doris/pull/7817#issuecomment-1028882434









[GitHub] [incubator-doris] github-actions[bot] commented on pull request #7909: [chore] Set the full path of make program to CMake.

2022-02-03 Thread GitBox


github-actions[bot] commented on pull request #7909:
URL: https://github.com/apache/incubator-doris/pull/7909#issuecomment-1028883131









[GitHub] [incubator-doris] morningman closed issue #7869: [Vectorized][Bug] Mem Leak in agg/unique table

2022-02-03 Thread GitBox


morningman closed issue #7869:
URL: https://github.com/apache/incubator-doris/issues/7869


   





[GitHub] [incubator-doris] morningman merged pull request #7884: [Vectorized][Bug] This pr main fix the bug:

2022-02-03 Thread GitBox


morningman merged pull request #7884:
URL: https://github.com/apache/incubator-doris/pull/7884


   





[incubator-doris] branch master updated: [fix](vec) Fix some bugs about vec engine (#7884)

2022-02-03 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 51abaa8  [fix](vec) Fix some bugs about vec engine (#7884)
51abaa8 is described below

commit 51abaa89f3d828dfdb8e6dfeef0d1424e28cdf05
Author: HappenLee 
AuthorDate: Thu Feb 3 19:21:17 2022 +0800

[fix](vec) Fix some bugs about vec engine (#7884)

1. mem leak in vcollector iter
2. query slow in agg table limit 10
3. query slow in SSB q4,q5,q6
---
 be/src/exec/olap_scan_node.h   |  3 ++-
 be/src/exec/olap_scanner.cpp   |  3 +++
 be/src/olap/reader.cpp |  1 +
 be/src/olap/reader.h   |  5 
 be/src/olap/rowset/beta_rowset_reader.cpp  | 33 ---
 be/src/olap/rowset/rowset_reader_context.h |  2 ++
 be/src/olap/storage_engine.cpp | 10 +++
 be/src/olap/tablet_schema.cpp  |  3 ++-
 be/src/vec/columns/column_string.cpp   |  1 +
 be/src/vec/exec/volap_scan_node.cpp| 42 ++
 be/src/vec/exec/volap_scanner.cpp  |  6 +
 be/src/vec/exec/volap_scanner.h| 10 +--
 be/src/vec/olap/block_reader.cpp   |  9 ---
 be/src/vec/olap/block_reader.h |  3 ---
 be/src/vec/olap/vcollect_iterator.cpp  |  1 +
 15 files changed, 85 insertions(+), 47 deletions(-)

diff --git a/be/src/exec/olap_scan_node.h b/be/src/exec/olap_scan_node.h
index d57a92d..82e98d5 100644
--- a/be/src/exec/olap_scan_node.h
+++ b/be/src/exec/olap_scan_node.h
@@ -160,7 +160,7 @@ protected:
 RuntimeProfile* profile);
 
 friend class OlapScanner;
-friend class doris::vectorized::VOlapScanner;
+friend class vectorized::VOlapScanner;
 
 // Tuple id resolved in prepare() to set _tuple_desc;
 TupleId _tuple_id;
@@ -239,6 +239,7 @@ protected:
 SpinLock _status_mutex;
 Status _status;
 RuntimeState* _runtime_state;
+
 RuntimeProfile::Counter* _scan_timer;
 RuntimeProfile::Counter* _scan_cpu_timer = nullptr;
 RuntimeProfile::Counter* _tablet_counter;
diff --git a/be/src/exec/olap_scanner.cpp b/be/src/exec/olap_scanner.cpp
index a1efc1d..d7dc839 100644
--- a/be/src/exec/olap_scanner.cpp
+++ b/be/src/exec/olap_scanner.cpp
@@ -59,6 +59,9 @@ Status OlapScanner::prepare(
 const std::vector>>&
 bloom_filters) {
 set_tablet_reader();
+// set limit to reduce end of rowset and segment mem use
+_tablet_reader->set_batch_size(_parent->limit() == -1 ? _parent->_runtime_state->batch_size() : std::min(
+static_cast(_parent->_runtime_state->batch_size()), _parent->limit()));
 
 // Get olap table
 TTabletId tablet_id = scan_range.tablet_id;
diff --git a/be/src/olap/reader.cpp b/be/src/olap/reader.cpp
index 13e50b8..4deda90 100644
--- a/be/src/olap/reader.cpp
+++ b/be/src/olap/reader.cpp
@@ -222,6 +222,7 @@ OLAPStatus TabletReader::_capture_rs_readers(const ReaderParams& read_params,
 _reader_context.runtime_state = read_params.runtime_state;
 _reader_context.use_page_cache = read_params.use_page_cache;
 _reader_context.sequence_id_idx = _sequence_col_idx;
+_reader_context.batch_size = _batch_size;
 
 *valid_rs_readers = *rs_readers;
 
diff --git a/be/src/olap/reader.h b/be/src/olap/reader.h
index 82cd7ff..3137e06 100644
--- a/be/src/olap/reader.h
+++ b/be/src/olap/reader.h
@@ -133,6 +133,10 @@ public:
_stats.rows_vec_del_cond_filtered;
 }
 
+void set_batch_size(int batch_size) {
+_batch_size = batch_size;
+}
+
 const OlapReaderStatistics& stats() const { return _stats; }
 OlapReaderStatistics* mutable_stats() { return &_stats; }
 
@@ -210,6 +214,7 @@ protected:
 bool _filter_delete = false;
 int32_t _sequence_col_idx = -1;
 bool _direct_mode = false;
+int _batch_size = 1024;
 
 CollectIterator _collect_iter;
 std::vector _key_cids;
diff --git a/be/src/olap/rowset/beta_rowset_reader.cpp b/be/src/olap/rowset/beta_rowset_reader.cpp
index 4d35f2f..263a4cc 100644
--- a/be/src/olap/rowset/beta_rowset_reader.cpp
+++ b/be/src/olap/rowset/beta_rowset_reader.cpp
@@ -131,20 +131,23 @@ OLAPStatus BetaRowsetReader::init(RowsetReaderContext* read_context) {
 _iterator.reset(final_iterator);
 
 // init input block
-_input_block.reset(new RowBlockV2(schema, 1024, _parent_tracker));
-
-// init input/output block and row
-_output_block.reset(new RowBlock(read_context->tablet_schema, _parent_tracker));
-
-RowBlockInfo output_block_info;
-output_block_info.row_num = 1024;
-output_block_info.null_supported = true;
-// the output block's schema should be seek_columns to conform to v1
-// TODO(hkp): this should be optimized to use return_columns
-output_block_info.column

[GitHub] [incubator-doris] morningman commented on a change in pull request #7828: [Vectorized][Feature] add ColumnHLL to support hll type

2022-02-03 Thread GitBox


morningman commented on a change in pull request #7828:
URL: https://github.com/apache/incubator-doris/pull/7828#discussion_r798466767



##
File path: be/src/vec/data_types/data_type_bitmap.h
##
@@ -83,6 +84,9 @@ struct is_complex : std::false_type {};
 template <>
 struct is_complex : std::true_type {};
 
+template <>
+struct is_complex : std::true_type {};

Review comment:
   Why define HLL in `data_type_bitmap`?

##
File path: be/src/vec/data_types/data_type_hll.cpp
##
@@ -0,0 +1,103 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "vec/data_types/data_type_hll.h"
+
+#include "vec/columns/column_complex.h"
+#include "vec/common/assert_cast.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+size_t DataTypeHLL::serialize(const IColumn& column, PColumn* pcolumn) const {

Review comment:
   The serialize() method has been refactored in #7939; please update it after that PR is merged.







[GitHub] [incubator-doris] HappenLee commented on a change in pull request #7939: [improvement][refactor](vec) Refactor serde of vec block and using brpc attachment

2022-02-03 Thread GitBox


HappenLee commented on a change in pull request #7939:
URL: https://github.com/apache/incubator-doris/pull/7939#discussion_r799195842



##
File path: be/src/vec/data_types/data_type_nullable.cpp
##
@@ -53,33 +53,52 @@ std::string DataTypeNullable::to_string(const IColumn& column, size_t row_num) c
 }
 }
 
-size_t DataTypeNullable::serialize(const IColumn& column, PColumn* pcolumn) const {
+// binary: column num |  | 
+//  : is_null1 | is_null2 | ...
+//  : value1 | value2 | ...>
+int64_t DataTypeNullable::get_uncompressed_serialized_bytes(const IColumn& column) const {
+int64_t size = sizeof(uint32_t);
+size += sizeof(bool) * column.size();
+size += nested_data_type->get_uncompressed_serialized_bytes(assert_cast(column).get_nested_column());
+return size;
+}
+
+char* DataTypeNullable::serialize(const IColumn& column, char* buf) const {
 auto ptr = column.convert_to_full_column_if_const();
 const ColumnNullable& col = assert_cast(*ptr.get());
-pcolumn->mutable_is_null()->Reserve(column.size());
 
+// column num
+*reinterpret_cast(buf) = column.size();
+buf += sizeof(uint32_t);
+// null flags
 for (size_t i = 0; i < column.size(); ++i) {
-bool is_null = col.is_null_at(i);
-pcolumn->add_is_null(is_null);
+*reinterpret_cast(buf) = col.is_null_at(i);
+buf += sizeof(bool);
 }
-
-return nested_data_type->serialize(col.get_nested_column(), pcolumn) +
-   sizeof(bool) * column.size();
+// data values
+return nested_data_type->serialize(col.get_nested_column(), buf);
 }
 
-void DataTypeNullable::deserialize(const PColumn& pcolumn, IColumn* column) const {
+const char* DataTypeNullable::deserialize(const char* buf, IColumn* column) const {
 ColumnNullable* col = assert_cast(column);
-col->get_null_map_data().reserve(pcolumn.is_null_size());
-
-for (int i = 0; i < pcolumn.is_null_size(); ++i) {
-if (pcolumn.is_null(i)) {
-col->get_null_map_data().push_back(1);
-} else {
-col->get_null_map_data().push_back(0);
-}
+// column num
+uint32_t column_num = *reinterpret_cast(buf);
+buf += sizeof(uint32_t);
+// null flags
+col->get_null_map_data().reserve(column_num);
+for (int i = 0; i < column_num; ++i) {

Review comment:
   Use memcpy here to speed this up.
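
A rough sketch of what the memcpy suggestion could look like, for both the read and the write side. It assumes the null map is stored as a contiguous array of one-byte flags (modelled here as `std::vector<uint8_t>`) and that each serialized flag occupies one byte; `read_null_flags()` and `write_null_flags()` are hypothetical helpers, not functions from the Doris code base.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: append `row_count` one-byte null flags from `buf` to
// `null_map` with a single memcpy instead of a per-row loop, and return the
// read position just past the flags.
const char* read_null_flags(const char* buf, uint32_t row_count,
                            std::vector<uint8_t>& null_map) {
    const std::size_t old_size = null_map.size();
    null_map.resize(old_size + row_count);
    if (row_count > 0) {
        std::memcpy(null_map.data() + old_size, buf, row_count);
    }
    return buf + row_count;
}

// Hypothetical counterpart for the write side: copy all flags in one call and
// return the write position just past them.
char* write_null_flags(const std::vector<uint8_t>& null_map, char* buf) {
    if (!null_map.empty()) {
        std::memcpy(buf, null_map.data(), null_map.size());
    }
    return buf + null_map.size();
}
```

Both helpers return the advanced buffer position, matching the pointer-returning style of the new `serialize()`/`deserialize()` signatures in the diff.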

##
File path: be/src/vec/data_types/data_type_number_base.cpp
##
@@ -65,30 +65,40 @@ std::string DataTypeNumberBase::to_string(const IColumn& column, size_t row_n
 }
 }
 
+// binary: column num | value1 | value2 | ...
 template 
-size_t DataTypeNumberBase::serialize(const IColumn& column, PColumn* pcolumn) const {
-const auto column_len = column.size();
-pcolumn->mutable_binary()->resize(column_len * sizeof(FieldType));
-auto* data = pcolumn->mutable_binary()->data();
+int64_t DataTypeNumberBase::get_uncompressed_serialized_bytes(const IColumn& column) const {
+return sizeof(uint32_t) + column.size() * sizeof(FieldType);
+}
 
-// copy the data
+template 
+char* DataTypeNumberBase::serialize(const IColumn& column, char* buf) const {
+// column num
+const auto column_num = column.size();

Review comment:
   Why does each column need to serialize its own size? A block only needs to record block->rows() once.
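
To illustrate the alternative raised in this comment, here is a minimal sketch that writes the row count once per block and then only the raw values of each column. The type alias `Int32Columns` and both functions are hypothetical, and the layout is an assumption for illustration; the thread does not say whether the PR adopted this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical block of equally sized int32 columns; not the real Doris Block.
using Int32Columns = std::vector<std::vector<int32_t>>;

// Assumed layout: rows (uint32) | column 0 values | column 1 values | ...
std::string serialize_with_shared_row_count(const Int32Columns& columns) {
    const uint32_t rows =
            columns.empty() ? 0 : static_cast<uint32_t>(columns[0].size());
    std::string buffer(sizeof(rows) + columns.size() * rows * sizeof(int32_t), '\0');
    char* pos = &buffer[0];
    std::memcpy(pos, &rows, sizeof(rows));
    pos += sizeof(rows);
    for (const auto& col : columns) {
        assert(col.size() == rows);  // all columns of a block share one row count
        if (rows > 0) {
            std::memcpy(pos, col.data(), rows * sizeof(int32_t));
        }
        pos += rows * sizeof(int32_t);
    }
    return buffer;
}

// The reader needs the column count from elsewhere (e.g. block metadata).
Int32Columns deserialize_with_shared_row_count(const std::string& buffer,
                                               std::size_t column_count) {
    assert(buffer.size() >= sizeof(uint32_t));
    uint32_t rows = 0;
    const char* pos = buffer.data();
    std::memcpy(&rows, pos, sizeof(rows));
    pos += sizeof(rows);
    assert(buffer.size() == sizeof(rows) + column_count * rows * sizeof(int32_t));
    Int32Columns columns(column_count, std::vector<int32_t>(rows));
    for (auto& col : columns) {
        if (rows > 0) {
            std::memcpy(col.data(), pos, rows * sizeof(int32_t));
        }
        pos += rows * sizeof(int32_t);
    }
    return columns;
}
```

With n columns this saves (n - 1) * 4 bytes per block compared with a per-column count. The trade-off is that a column can no longer be decoded without the block-level header.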

##
File path: be/src/vec/data_types/data_type_nullable.cpp
##
@@ -53,33 +53,52 @@ std::string DataTypeNullable::to_string(const IColumn& column, size_t row_num) c
 }
 }
 
-size_t DataTypeNullable::serialize(const IColumn& column, PColumn* pcolumn) const {
+// binary: column num |  | 
+//  : is_null1 | is_null2 | ...
+//  : value1 | value2 | ...>
+int64_t DataTypeNullable::get_uncompressed_serialized_bytes(const IColumn& column) const {
+int64_t size = sizeof(uint32_t);
+size += sizeof(bool) * column.size();
+size += nested_data_type->get_uncompressed_serialized_bytes(assert_cast(column).get_nested_column());
+return size;
+}
+
+char* DataTypeNullable::serialize(const IColumn& column, char* buf) const {
 auto ptr = column.convert_to_full_column_if_const();
 const ColumnNullable& col = assert_cast(*ptr.get());
-pcolumn->mutable_is_null()->Reserve(column.size());
 
+// column num
+*reinterpret_cast(buf) = column.size();
+buf += sizeof(uint32_t);
+// null flags
 for (size_t i = 0; i < column.size(); ++i) {
-bool is_null = col.is_null_at(i);
-pcolumn->add_is_null(is_null);
+*reinterpret_cast(buf) = col.is_null_at(i);

Review comment:
   TODO: consider using memcpy here to speed this up.






---

[GitHub] [incubator-doris] yiguolei opened a new pull request #7940: [Feature] Add table function framework and add numbers table function

2022-02-03 Thread GitBox


yiguolei opened a new pull request #7940:
URL: https://github.com/apache/incubator-doris/pull/7940


   - Add a table function framework; developers can define new table functions under this framework.
   - Add a demo table function, numbers. Users can run select sum(number) from numbers("1"), and we can use this table function to test the performance of the vectorized engine.
   - The numbers table function is only supported under the vectorized engine.

