[GitHub] [incubator-doris] xiaokang commented on pull request #8451: [improvement](memory) fix olap table scan and sink memory usage problem

2022-03-13 Thread GitBox


xiaokang commented on pull request #8451:
URL: https://github.com/apache/incubator-doris/pull/8451#issuecomment-1066054539


   @morningman volap_scan_node.cpp is done. The test result is almost the same 
as for the non-vectorized version.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org



[GitHub] [incubator-doris] zhannngchen opened a new pull request #8458: [UT] add unit tests for min/max function, and cleaned up some unused …

2022-03-13 Thread GitBox


zhannngchen opened a new pull request #8458:
URL: https://github.com/apache/incubator-doris/pull/8458


   # Proposed changes
   
   Add unit tests for min/max function, with some code cleanup.
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (No)
   2. Has unit tests been added: (Yes)
   3. Has document been added or modified: (No)
   4. Does it need to update dependencies: (No)
   5. Are there any changes that cannot be rolled back: (No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
[d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you 
chose the solution you did and what alternatives you considered, etc...
   





[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8458: [UT] add unit tests for min/max function, and cleaned up some unused …

2022-03-13 Thread GitBox


github-actions[bot] commented on pull request #8458:
URL: https://github.com/apache/incubator-doris/pull/8458#issuecomment-1066080355


   PR approved by anyone and no changes requested.





[GitHub] [incubator-doris] dataroaring commented on issue #8382: [Bug] variance is different with trino

2022-03-13 Thread GitBox


dataroaring commented on issue #8382:
URL: https://github.com/apache/incubator-doris/issues/8382#issuecomment-1066080832


   
   https://github.com/apache/incubator-doris/blob/master/regression-test/suites/aggregate/aggregate.groovy can reproduce this issue.





[GitHub] [incubator-doris] zbtzbtzbt commented on issue #8435: [Enhancement] The bitmap_hash function can be implemented using murmur_hash3_128

2022-03-13 Thread GitBox


zbtzbtzbt commented on issue #8435:
URL: https://github.com/apache/incubator-doris/issues/8435#issuecomment-1066083773


   I think this modification will be incompatible with old data @syb853553110 
   
https://doris.apache.org/zh-CN/sql-reference/sql-functions/bitmap-functions/bitmap_hash.html#description





[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8457: [fix][routine-load] fix bug that routine load cannot cancel task when append_data return error

2022-03-13 Thread GitBox


github-actions[bot] commented on pull request #8457:
URL: https://github.com/apache/incubator-doris/pull/8457#issuecomment-1066084147









[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8451: [improvement](memory) fix olap table scan and sink memory usage problem

2022-03-13 Thread GitBox


github-actions[bot] commented on pull request #8451:
URL: https://github.com/apache/incubator-doris/pull/8451#issuecomment-1066084713









[GitHub] [incubator-doris] morningman merged pull request #8369: [docs] Update documentation configuration parameter `sink.batch.bytes…

2022-03-13 Thread GitBox


morningman merged pull request #8369:
URL: https://github.com/apache/incubator-doris/pull/8369


   





[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8369: [docs] Update documentation configuration parameter `sink.batch.bytes…

2022-03-13 Thread GitBox


github-actions[bot] commented on pull request #8369:
URL: https://github.com/apache/incubator-doris/pull/8369#issuecomment-1066095170


   PR approved by at least one committer and no changes requested.





[incubator-doris] branch master updated: [doc] Update documentation configuration parameter `sink.batch.bytes` in flink-doris-connector (#8369)

2022-03-13 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 392a977  [doc] Update documentation configuration parameter 
`sink.batch.bytes` in flink-doris-connector (#8369)
392a977 is described below

commit 392a9774af584a230f041a756eca293a79b89460
Author: Jiangqiao Xu <96433131+bridgedr...@users.noreply.github.com>
AuthorDate: Sun Mar 13 20:53:50 2022 +0800

[doc] Update documentation configuration parameter `sink.batch.bytes` in 
flink-doris-connector (#8369)
---
 docs/en/extending-doris/flink-doris-connector.md| 2 +-
 docs/zh-CN/extending-doris/flink-doris-connector.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en/extending-doris/flink-doris-connector.md b/docs/en/extending-doris/flink-doris-connector.md
index c52ea50..823cdbc 100644
--- a/docs/en/extending-doris/flink-doris-connector.md
+++ b/docs/en/extending-doris/flink-doris-connector.md
@@ -302,7 +302,7 @@ outputFormat.close();
 | sink.batch.interval | 10s   | The flush interval, after which the asynchronous thread will write the data in the cache to BE. The default value is 10 second, and the time units are ms, s, min, h, and d. Set to 0 to turn off periodic writing. |
 | sink.properties.* | --   | The stream load parameters. eg: sink.properties.column_separator' = ','   Setting 'sink.properties.escape_delimiters' = 'true' if you want to use a control char as a separator, so that such as '\\x01' will translate to binary 0x01  Support JSON format import, you need to enable both 'sink.properties.format' ='json' and 'sink.properties.strip_outer_array' ='true'|
 | sink.enable-delete | true   | Whether to enable deletion. This option requires Doris table to enable batch delete function (0.15+ version is enabled by default), and only supports Uniq model.|
-
+| sink.batch.bytes| 10485760  | Maximum bytes of batch in a single write to BE. When the data size in batch exceeds this threshold, cache data is written to BE. The default value is 10MB |
 
 ## Doris & Flink Column Type Mapping
 
diff --git a/docs/zh-CN/extending-doris/flink-doris-connector.md b/docs/zh-CN/extending-doris/flink-doris-connector.md
index 7549fcb..fd3aca7 100644
--- a/docs/zh-CN/extending-doris/flink-doris-connector.md
+++ b/docs/zh-CN/extending-doris/flink-doris-connector.md
@@ -306,7 +306,7 @@ outputFormat.close();
 | sink.batch.interval | 10s   | flush 间隔时间,超过该时间后异步线程将 
缓存中数据写入BE。 默认值为10秒,支持时间单位ms、s、min、h和d。设置为0表示关闭定期写入。 |
 | sink.properties.* | --   | Stream load 的导入参数例如:'sink.properties.column_separator' = ', '定义列分隔符'sink.properties.escape_delimiters' = 'true'特殊字符作为分隔符,'\\x01'会被转换为二进制的0x01 'sink.properties.format' = 
'json''sink.properties.strip_outer_array' = 'true' JSON格式导入|
 | sink.enable-delete | true   | 
是否启用删除。此选项需要Doris表开启批量删除功能(0.15+版本默认开启),只支持Uniq模型。|
-
+| sink.batch.bytes  | 10485760  | 单次写BE的最大数据量,当每个 batch 
中记录的数据量超过该阈值时,会将缓存数据写入 BE。默认值为 10MB|
 ## Doris 和 Flink 列类型映射关系
 
 | Doris Type | Flink Type   |




[incubator-doris] branch master updated: [improvement](VHashJoin) add probe timer (#8233)

2022-03-13 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 705989d  [improvement](VHashJoin) add probe timer (#8233)
705989d is described below

commit 705989d23916ce115b6ed269221f7be377b74a24
Author: awakeljw <993007...@qq.com>
AuthorDate: Sun Mar 13 20:54:44 2022 +0800

[improvement](VHashJoin) add probe timer (#8233)
---
 be/src/vec/exec/join/vhash_join_node.cpp | 217 ++-
 be/src/vec/exec/join/vhash_join_node.h   |   3 +
 2 files changed, 127 insertions(+), 93 deletions(-)

diff --git a/be/src/vec/exec/join/vhash_join_node.cpp b/be/src/vec/exec/join/vhash_join_node.cpp
index c33bcb2..a1af769 100644
--- a/be/src/vec/exec/join/vhash_join_node.cpp
+++ b/be/src/vec/exec/join/vhash_join_node.cpp
@@ -166,8 +166,56 @@ struct ProcessHashTableProbe {
   _items_counts(join_node->_items_counts),
   _build_block_offsets(join_node->_build_block_offsets),
   _build_block_rows(join_node->_build_block_rows),
-  _rows_returned_counter(join_node->_rows_returned_counter) {}
+  _rows_returned_counter(join_node->_rows_returned_counter),
+  _search_hashtable_timer(join_node->_search_hashtable_timer),
+  _build_side_output_timer(join_node->_build_side_output_timer),
+  _probe_side_output_timer(join_node->_probe_side_output_timer) {}
+
+    // output build side result column
+    void build_side_output_column(MutableColumns& mcol, int column_offset, int column_length, int size) {
+        constexpr auto is_semi_anti_join = JoinOpType::value == TJoinOp::RIGHT_ANTI_JOIN ||
+                                           JoinOpType::value == TJoinOp::RIGHT_SEMI_JOIN ||
+                                           JoinOpType::value == TJoinOp::LEFT_ANTI_JOIN ||
+                                           JoinOpType::value == TJoinOp::LEFT_SEMI_JOIN;
 
+        constexpr auto probe_all = JoinOpType::value == TJoinOp::LEFT_OUTER_JOIN ||
+                                   JoinOpType::value == TJoinOp::FULL_OUTER_JOIN;
+
+        if constexpr (!is_semi_anti_join) {
+            if (_build_blocks.size() == 1) {
+                for (int i = 0; i < column_length; i++) {
+                    auto& column = *_build_blocks[0].get_by_position(i).column;
+                    mcol[i + column_offset]->insert_indices_from(column,
+                            _build_block_rows.data(), _build_block_rows.data() + size);
+                }
+            } else {
+                for (int i = 0; i < column_length; i++) {
+                    for (int j = 0; j < size; j++) {
+                        if constexpr (probe_all) {
+                            if (_build_block_offsets[j] == -1) {
+                                DCHECK(mcol[i + column_offset]->is_nullable());
+                                assert_cast(mcol[i + column_offset].get())->insert_join_null_data();
+                            } else {
+                                auto& column = *_build_blocks[_build_block_offsets[j]].get_by_position(i).column;
+                                mcol[i + column_offset]->insert_from(column, _build_block_rows[j]);
+                            }
+                        } else {
+                            auto& column = *_build_blocks[_build_block_offsets[j]].get_by_position(i).column;
+                            mcol[i + column_offset]->insert_from(column, _build_block_rows[j]);
+                        }
+                    }
+                }
+            }
+        }
+    }
+
+    // output probe side result column
+    void probe_side_output_column(MutableColumns& mcol, int column_length, int size) {
+        for (int i = 0; i < column_length; ++i) {
+            auto& column = _probe_block.get_by_position(i).column;
+            column->replicate(&_items_counts[0], size, *mcol[i]);
+        }
+    }
     // Only process the join with no other join conjunt, because of no other join conjunt
     // the output block struct is same with mutable block. we can do more opt on it and simplify
     // the logic of probe
@@ -198,116 +246,93 @@ struct ProcessHashTableProbe {
         constexpr auto is_right_semi_anti_join = JoinOpType::value == TJoinOp::RIGHT_ANTI_JOIN ||
                                                  JoinOpType::value == TJoinOp::RIGHT_SEMI_JOIN;
 
-        constexpr auto is_semi_anti_join = is_right_semi_anti_join ||
-                                           JoinOpType::value == TJoinOp::LEFT_ANTI_JOIN ||
-                                           JoinOpType::value == TJoinOp::LEFT_SEMI_JOIN;
-
         constexpr auto probe_all = JoinOpType::value == TJoinOp::LEFT_OUTER_JOIN ||
                                    JoinOpType::value == TJoinOp::FULL_OUTER_JOIN;
 
-for (; _probe_index < _p

[GitHub] [incubator-doris] morningman merged pull request #8233: [Vectorized][HashJoin] add probe timer

2022-03-13 Thread GitBox


morningman merged pull request #8233:
URL: https://github.com/apache/incubator-doris/pull/8233


   





[GitHub] [incubator-doris] morningman opened a new issue #8459: [Bug] BE crash when doing left outer join with vec engine

2022-03-13 Thread GitBox


morningman opened a new issue #8459:
URL: https://github.com/apache/incubator-doris/issues/8459


   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Version
   
   dev-1.0.0
   
   ### What's Wrong?
   
   Works
   ```
   *** Aborted at 1647150517 (unix time) try "date -d @1647150517" if you are using GNU date ***
   PC: @ 0x7fb284945a9c doris::vectorized::ColumnVector<>::insert_from()
   *** SIGSEGV (@0x0) received by PID 747 (TID 0x7fb2571bd700) from PID 0; stack trace: ***
   @ 0x7fb285fce812 google::(anonymous namespace)::FailureSignalHandler()
   @ 0x7fb281d0a920 (unknown)
   @ 0x7fb284945a9c doris::vectorized::ColumnVector<>::insert_from()
   @ 0x7fb284928ea7 doris::vectorized::ColumnNullable::insert_from()
   @ 0x7fb285b60e10 doris::vectorized::BlockReader::_agg_key_next_block()
   @ 0x7fb284cfe21d doris::vectorized::VOlapScanner::get_block()
   @ 0x7fb284cf3d62 doris::vectorized::VOlapScanNode::scanner_thread()
   @ 0x7fb28435344a doris::PriorityWorkStealingThreadPool::work_thread()
   @ 0x7fb2881077b0 execute_native_thread_routine
   @ 0x7fb281ac2851 start_thread
   @ 0x7fb281dbf67d clone
   @0x0 (unknown)
   ```
   
   ### What You Expected?
   
   Works well
   
   ### How to Reproduce?
   
   Probably because the right table is an agg table, and the column is NOT 
NULL.
   But the right table's output slot should be nullable.
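
The nullability point above can be illustrated with a small standalone sketch. It is not Doris code: `NullableIntColumn` and `left_outer_join_right_column` are hypothetical names, and the null map is a simplified stand-in for the idea behind `ColumnNullable`. It only shows why the right side of a left outer join needs a nullable output column even when the source column is declared NOT NULL: unmatched left rows have no right value to emit.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical nullable column: a value vector plus a parallel null map.
struct NullableIntColumn {
    std::vector<int> values;
    std::vector<std::uint8_t> null_map;  // 1 means the row is NULL

    void insert_value(int v) {
        values.push_back(v);
        null_map.push_back(0);
    }
    // Insert a placeholder value and mark the row as NULL.
    void insert_null() {
        values.push_back(0);
        null_map.push_back(1);
    }
    bool is_null_at(std::size_t row) const { return null_map[row] != 0; }
};

// Left outer join on an integer key. Returns only the right-side output column.
// right_rows maps join key -> right-side value (a NOT NULL column in the source table).
NullableIntColumn left_outer_join_right_column(const std::vector<int>& left_keys,
                                               const std::unordered_map<int, int>& right_rows) {
    NullableIntColumn out;
    for (int key : left_keys) {
        auto it = right_rows.find(key);
        if (it == right_rows.end()) {
            out.insert_null();  // no match: the join itself produces a NULL
        } else {
            out.insert_value(it->second);
        }
    }
    return out;
}
```

If the output column had no null map, the unmatched row would have nowhere to record its NULL, which is the kind of mismatch the report suspects.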
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   





[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8458: [UT] add unit tests for min/max function, and cleaned up some unused …

2022-03-13 Thread GitBox


github-actions[bot] commented on pull request #8458:
URL: https://github.com/apache/incubator-doris/pull/8458#issuecomment-1066110440


   PR approved by at least one committer and no changes requested.





[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8456: [chore](dependency) fix build thirdparty errors

2022-03-13 Thread GitBox


github-actions[bot] commented on pull request #8456:
URL: https://github.com/apache/incubator-doris/pull/8456#issuecomment-1066110771


   PR approved by at least one committer and no changes requested.





[GitHub] [incubator-doris] morningman merged pull request #8456: [chore](dependency) fix build thirdparty errors

2022-03-13 Thread GitBox


morningman merged pull request #8456:
URL: https://github.com/apache/incubator-doris/pull/8456


   





[incubator-doris] branch master updated (705989d -> a4b710c)

2022-03-13 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git.


from 705989d  [improvement](VHashJoin) add probe timer (#8233)
 add a4b710c  [chore](dependency) fix build thirdparty errors (#8456)

No new revisions were added by this update.

Summary of changes:
 docs/.vuepress/sidebar/en.js   |  3 +-
 docs/.vuepress/sidebar/zh-CN.js|  3 +-
 .../sql-functions/bitwise-functions/bit_length.md  | 55 --
 .../sql-functions/bitwise-functions/bit_length.md  | 55 --
 .../doris/load/routineload/ScheduleRule.java   | 13 +
 thirdparty/download-thirdparty.sh  | 15 +-
 6 files changed, 17 insertions(+), 127 deletions(-)
 delete mode 100644 
docs/en/sql-reference/sql-functions/bitwise-functions/bit_length.md
 delete mode 100644 
docs/zh-CN/sql-reference/sql-functions/bitwise-functions/bit_length.md




[GitHub] [incubator-doris] morningman merged pull request #8451: [improvement](memory) fix olap table scan and sink memory usage problem

2022-03-13 Thread GitBox


morningman merged pull request #8451:
URL: https://github.com/apache/incubator-doris/pull/8451


   





[incubator-doris] branch master updated (a4b710c -> e807e8b)

2022-03-13 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git.


from a4b710c  [chore](dependency) fix build thirdparty errors (#8456)
 add e807e8b  [improvement](memory) fix olap table scan and sink memory 
usage problem (#8451)

No new revisions were added by this update.

Summary of changes:
 be/src/common/config.h  |  4 +++-
 be/src/exec/olap_scan_node.cpp  | 24 +++
 be/src/exec/olap_scan_node.h|  6 +
 be/src/exec/olap_scanner.cpp| 10 
 be/src/exec/tablet_sink.cpp | 15 ++--
 be/src/exec/tablet_sink.h   |  3 +++
 be/src/vec/exec/volap_scan_node.cpp | 48 +++--
 be/src/vec/exec/volap_scanner.cpp   |  8 ++-
 8 files changed, 98 insertions(+), 20 deletions(-)




[GitHub] [incubator-doris] dataroaring closed pull request #8433: add loggger to Suite to log in cases

2022-03-13 Thread GitBox


dataroaring closed pull request #8433:
URL: https://github.com/apache/incubator-doris/pull/8433


   





[GitHub] [incubator-doris] dataroaring opened a new pull request #8460: let framework support sql cases

2022-03-13 Thread GitBox


dataroaring opened a new pull request #8460:
URL: https://github.com/apache/incubator-doris/pull/8460


   We generate Groovy files from SQL cases and run the generated Groovy
   files. This way, we only need to add SQL cases, and the framework
   handles the rest.
   
   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
[d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you 
chose the solution you did and what alternatives you considered, etc...
   





[incubator-doris] branch dev-1.0.0 updated (32da525 -> 701fd4f)

2022-03-13 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a change to branch dev-1.0.0
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git.


from 32da525  [fix] BE crash when reporting tablet (#8453)
 new eb322f5  [improvement](vectorized) Support BetweenPredicate enable 
fold const expr (#8450)
 new ca05846  [improvement](memory) fix olap table scan and sink memory 
usage problem (#8451)
 new 701fd4f  [chore](dependency) fix build thirdparty errors (#8456)

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/src/common/config.h |  4 +-
 be/src/exec/olap_scan_node.cpp | 22 +++--
 be/src/exec/olap_scan_node.h   |  6 +++
 be/src/exec/olap_scanner.cpp   | 10 ++--
 be/src/exec/tablet_sink.cpp|  9 +++-
 be/src/exec/tablet_sink.h  |  3 ++
 be/src/runtime/mysql_result_writer.cpp |  6 ++-
 be/src/vec/columns/column.h|  4 +-
 be/src/vec/columns/column_nullable.cpp |  2 +-
 be/src/vec/columns/column_nullable.h   |  2 +-
 be/src/vec/columns/column_vector.cpp   |  4 +-
 be/src/vec/exec/volap_scan_node.cpp| 51 
 be/src/vec/exec/volap_scanner.cpp  |  8 +++-
 be/src/vec/exprs/vtuple_is_null_predicate.cpp  |  6 +--
 docs/.vuepress/sidebar/en.js   |  3 +-
 docs/.vuepress/sidebar/zh-CN.js|  3 +-
 .../sql-functions/bitwise-functions/bit_length.md  | 55 --
 .../sql-functions/bitwise-functions/bit_length.md  | 55 --
 .../doris/load/routineload/ScheduleRule.java   | 13 +
 .../apache/doris/rewrite/FoldConstantsRule.java|  6 ++-
 thirdparty/download-thirdparty.sh  | 15 +-
 21 files changed, 127 insertions(+), 160 deletions(-)
 delete mode 100644 
docs/en/sql-reference/sql-functions/bitwise-functions/bit_length.md
 delete mode 100644 
docs/zh-CN/sql-reference/sql-functions/bitwise-functions/bit_length.md




[incubator-doris] 01/03: [improvement](vectorized) Support BetweenPredicate enable fold const expr (#8450)

2022-03-13 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch dev-1.0.0
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit eb322f542cb37fa49d264b887bfe45fe7499046e
Author: HappenLee 
AuthorDate: Sun Mar 13 09:36:24 2022 +0800

[improvement](vectorized) Support BetweenPredicate enable fold const expr 
(#8450)
---
 be/src/runtime/mysql_result_writer.cpp  | 6 --
 be/src/vec/columns/column.h | 4 +++-
 be/src/vec/columns/column_nullable.cpp  | 2 +-
 be/src/vec/columns/column_nullable.h| 2 +-
 be/src/vec/columns/column_vector.cpp| 4 ++--
 be/src/vec/exprs/vtuple_is_null_predicate.cpp   | 6 ++
 .../src/main/java/org/apache/doris/rewrite/FoldConstantsRule.java   | 6 +-
 7 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/be/src/runtime/mysql_result_writer.cpp b/be/src/runtime/mysql_result_writer.cpp
index eaf1bd7..2a7de6c 100644
--- a/be/src/runtime/mysql_result_writer.cpp
+++ b/be/src/runtime/mysql_result_writer.cpp
@@ -159,8 +159,10 @@ int MysqlResultWriter::_add_row_value(int index, const TypeDescriptor& type, voi
 
 case TYPE_DECIMALV2: {
 DecimalV2Value decimal_val(reinterpret_cast(item)->value);
-int output_scale = _output_expr_ctxs[index]->root()->output_scale();
-buf_ret = _row_buffer->push_decimal(decimal_val, output_scale);
+// TODO: Support decimal output_scale after we support FE can sure
+// accuracy of output_scale
+// int output_scale = _output_expr_ctxs[index]->root()->output_scale();
+buf_ret = _row_buffer->push_decimal(decimal_val, -1);
 break;
 }
 
diff --git a/be/src/vec/columns/column.h b/be/src/vec/columns/column.h
index fdfd85b..07989f4 100644
--- a/be/src/vec/columns/column.h
+++ b/be/src/vec/columns/column.h
@@ -34,6 +34,8 @@ namespace doris::vectorized {
 
 class Arena;
 class Field;
+// TODO: Remove the trickly hint, after FE support better way to remove function tuple_is_null
+constexpr uint8_t JOIN_NULL_HINT = 2;
 
 /// Declares interface to store columns in memory.
 class IColumn : public COW {
@@ -164,7 +166,7 @@ public:
 /// indices_begin + indices_end represent the row indices of column src
 /// Warning:
 ///   if *indices == -1 means the row is null, only use in outer join, do not use in any other place
-///   insert -1 in null map to hint the null is produced by outer join
+///   insert JOIN_NULL_HINT in null map to hint the null is produced by outer join
 virtual void insert_indices_from(const IColumn& src, const int* indices_begin, const int* indices_end) = 0;
 
 /// Appends data located in specified memory chunk if it is possible (throws an exception if it cannot be implemented).
diff --git a/be/src/vec/columns/column_nullable.cpp b/be/src/vec/columns/column_nullable.cpp
index 9877903..69634ef 100644
--- a/be/src/vec/columns/column_nullable.cpp
+++ b/be/src/vec/columns/column_nullable.cpp
@@ -114,7 +114,7 @@ StringRef ColumnNullable::serialize_value_into_arena(size_t n, Arena& arena,
 
 void ColumnNullable::insert_join_null_data() {
 get_nested_column().insert_default();
-get_null_map_data().push_back(-1);
+get_null_map_data().push_back(JOIN_NULL_HINT);
 }
 
 const char* ColumnNullable::deserialize_and_insert_from_arena(const char* pos) {
diff --git a/be/src/vec/columns/column_nullable.h b/be/src/vec/columns/column_nullable.h
index 030ca13..1a792f7 100644
--- a/be/src/vec/columns/column_nullable.h
+++ b/be/src/vec/columns/column_nullable.h
@@ -80,7 +80,7 @@ public:
 
 /// Will insert null value if pos=nullptr
 void insert_data(const char* pos, size_t length) override;
-/// -1 in null map means null is generated by join, only use in tuple is null
+/// JOIN_NULL_HINT in null map means null is generated by join, only use in tuple is null
 void insert_join_null_data();
 
 StringRef serialize_value_into_arena(size_t n, Arena& arena, char const*& begin) const override;
diff --git a/be/src/vec/columns/column_vector.cpp b/be/src/vec/columns/column_vector.cpp
index 3188a93..dfe1bce 100644
--- a/be/src/vec/columns/column_vector.cpp
+++ b/be/src/vec/columns/column_vector.cpp
@@ -231,8 +231,8 @@ void ColumnVector::insert_indices_from(const IColumn& 
src, const int* indices
 // Now Uint8 use to identify null and non null
 // 1. nullable column : offset == -1 means is null at the here, 
set true here
 // 2. real data column : offset == -1 what at is meaningless
-// 3. -1 only use in outer join to hint the null is produced by 
outer join
-data[origin_size + i] = (offset == -1) ? UInt8(-1) : 
src_vec.get_element(offset);
+// 3. JOIN_NULL_HINT only use in outer join to
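
A minimal sketch of how a one-byte null map can carry the join hint. It assumes (the truncated diff does not confirm this) that any nonzero entry marks a null row and that only the exact value JOIN_NULL_HINT marks a null produced by an outer join. The helper names `is_null` and `is_join_null` are hypothetical and are not Doris functions. The sketch also shows what the old `UInt8(-1)` marker evaluated to.

```cpp
#include <cstdint>

constexpr uint8_t JOIN_NULL_HINT = 2; // value taken from the patch above

// Hypothetical helper: assumes any nonzero null-map entry means "row is null".
constexpr bool is_null(uint8_t flag) { return flag != 0; }

// Hypothetical helper: true only for nulls tagged as produced by an outer join.
constexpr bool is_join_null(uint8_t flag) { return flag == JOIN_NULL_HINT; }

// The previous marker, -1 converted to an unsigned byte, is 255.
static_assert(static_cast<uint8_t>(-1) == 255, "UInt8(-1) wraps to 255");
static_assert(is_null(1) && !is_join_null(1), "ordinary null carries no hint");
static_assert(is_join_null(JOIN_NULL_HINT), "join null carries the hint");
```

Under these assumptions a join-produced null is still a null for every ordinary check, and only code that looks for the exact hint value can tell the two apart.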

[incubator-doris] 02/03: [improvement](memory) fix olap table scan and sink memory usage problem (#8451)

2022-03-13 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch dev-1.0.0
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit ca058465f861f1df95e25d1a3b009e71c5bdf2ea
Author: Kang 
AuthorDate: Sun Mar 13 22:12:15 2022 +0800

[improvement](memory) fix olap table scan and sink memory usage problem 
(#8451)

Due to unlimited queue in OlapScanNode and NodeChannel, memory usage can be
very large for reading and writing large table, e.g 'insert into tableB 
select * from tableA'.
---
 be/src/common/config.h  |  4 ++-
 be/src/exec/olap_scan_node.cpp  | 22 +---
 be/src/exec/olap_scan_node.h|  6 +
 be/src/exec/olap_scanner.cpp| 10 
 be/src/exec/tablet_sink.cpp |  9 +--
 be/src/exec/tablet_sink.h   |  3 +++
 be/src/vec/exec/volap_scan_node.cpp | 51 +++--
 be/src/vec/exec/volap_scanner.cpp   |  8 +-
 8 files changed, 92 insertions(+), 21 deletions(-)

diff --git a/be/src/common/config.h b/be/src/common/config.h
index 8f3b0b7..26bd081 100644
--- a/be/src/common/config.h
+++ b/be/src/common/config.h
@@ -167,8 +167,10 @@ CONF_mInt64(thrift_client_retry_interval_ms, "1000");
 CONF_mInt32(doris_scan_range_row_count, "524288");
 // size of scanner queue between scanner thread and compute thread
 CONF_mInt32(doris_scanner_queue_size, "1024");
-// single read execute fragment row size
+// single read execute fragment row number
 CONF_mInt32(doris_scanner_row_num, "16384");
+// single read execute fragment row bytes
+CONF_mInt32(doris_scanner_row_bytes, "10485760");
 // number of max scan keys
 CONF_mInt32(doris_max_scan_key_num, "1024");
 // the max number of push down values of a single column.
diff --git a/be/src/exec/olap_scan_node.cpp b/be/src/exec/olap_scan_node.cpp
index af26ae1..19ec140 100644
--- a/be/src/exec/olap_scan_node.cpp
+++ b/be/src/exec/olap_scan_node.cpp
@@ -77,6 +77,8 @@ Status OlapScanNode::init(const TPlanNode& tnode, 
RuntimeState* state) {
 _max_pushdown_conditions_per_column = 
config::max_pushdown_conditions_per_column;
 }
 
+_max_scanner_queue_size_bytes = query_options.mem_limit / 20; //TODO: 
session variable percent
+
 /// TODO: could one filter used in the different scan_node ?
 int filter_size = _runtime_filter_descs.size();
 _runtime_filter_ctxs.resize(filter_size);
@@ -306,6 +308,7 @@ Status OlapScanNode::get_next(RuntimeState* state, 
RowBatch* row_batch, bool* eo
 materialized_batch = _materialized_row_batches.front();
 DCHECK(materialized_batch != nullptr);
 _materialized_row_batches.pop_front();
+_materialized_row_batches_bytes -= 
materialized_batch->tuple_data_pool()->total_reserved_bytes();
 }
 }
 
@@ -394,12 +397,14 @@ Status OlapScanNode::close(RuntimeState* state) {
 }
 
 _materialized_row_batches.clear();
+_materialized_row_batches_bytes = 0;
 
 for (auto row_batch : _scan_row_batches) {
 delete row_batch;
 }
 
 _scan_row_batches.clear();
+_scan_row_batches_bytes = 0;
 
 // OlapScanNode terminate by exception
 // so that initiative close the Scanner
@@ -1371,6 +1376,7 @@ void OlapScanNode::transfer_thread(RuntimeState* state) {
 int max_thread = _max_materialized_row_batches;
 if (config::doris_scanner_row_num > state->batch_size()) {
 max_thread /= config::doris_scanner_row_num / state->batch_size();
+if (max_thread <= 0) max_thread = 1;
 }
 // read from scanner
 while (LIKELY(status.ok())) {
@@ -1393,7 +1399,7 @@ void OlapScanNode::transfer_thread(RuntimeState* state) {
 if (state->fragment_mem_tracker() != nullptr) {
 mem_consume = state->fragment_mem_tracker()->consumption();
 }
-if (mem_consume < (mem_limit * 6) / 10) {
+if (mem_consume < (mem_limit * 6) / 10 && _scan_row_batches_bytes 
< _max_scanner_queue_size_bytes / 2) {
 thread_slot_num = max_thread - assigned_thread_num;
 } else {
 // Memory already exceed
@@ -1473,6 +1479,7 @@ void OlapScanNode::transfer_thread(RuntimeState* state) {
 if (LIKELY(!_scan_row_batches.empty())) {
 scan_batch = _scan_row_batches.front();
 _scan_row_batches.pop_front();
+_scan_row_batches_bytes -= 
scan_batch->tuple_data_pool()->total_reserved_bytes();
 
 // delete scan_batch if transfer thread should be stopped
 // because scan_batch wouldn't be useful anymore
@@ -1573,10 +1580,12 @@ void OlapScanNode::scanner_thread(OlapScanner* scanner) 
{
 // need yield this thread when we do enough work. However, OlapStorage read
 // data in pre-aggregate mode, then we can't use storage returned data to
 // judge if we need to yield. So we record all raw data read in this round
-// sc
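
The patch above pairs each batch queue with a running byte counter and stops handing out scanner threads once the queue holds more than half of its byte budget. A minimal single-threaded sketch of that bookkeeping follows. `Batch` and `ByteBoundedQueue` are hypothetical names, not Doris classes; the `mem_limit / 20` budget and the half-budget check mirror the diff, and a real implementation would also need the locking that the scan node already uses.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for a row batch; only its reserved size matters here.
struct Batch {
    int64_t reserved_bytes;
};

// Illustrative queue that keeps a byte total next to the batches, the way the
// patch pairs _scan_row_batches with _scan_row_batches_bytes.
class ByteBoundedQueue {
public:
    explicit ByteBoundedQueue(int64_t mem_limit)
            : _max_bytes(mem_limit / 20) {} // same fraction as the patch

    void push(Batch b) {
        _bytes += b.reserved_bytes;
        _batches.push_back(b);
    }

    // Precondition: the queue is not empty.
    Batch pop() {
        assert(!_batches.empty());
        Batch b = _batches.front();
        _batches.pop_front();
        _bytes -= b.reserved_bytes;
        return b;
    }

    // Producers may start more work only while the queue is under half budget.
    bool can_schedule_more() const { return _bytes < _max_bytes / 2; }

    int64_t bytes() const { return _bytes; }

private:
    int64_t _max_bytes;
    int64_t _bytes = 0;
    std::deque<Batch> _batches;
};
```

Counting bytes rather than batches matters because a fixed number of queued batches can hold very different amounts of memory depending on row width.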

[incubator-doris] 03/03: [chore](dependency) fix build thirdparty errors (#8456)

2022-03-13 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch dev-1.0.0
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit 701fd4f7f5bbf422f817be6a917e3ca19f294ae0
Author: Mingyu Chen 
AuthorDate: Sun Mar 13 22:11:24 2022 +0800

[chore](dependency) fix build thirdparty errors (#8456)

1. the patch for aws-c-cal-0.4.5 does not need anymore
2. remove duplicate bit_length document
3. add some debug log for routine load
---
 docs/.vuepress/sidebar/en.js   |  3 +-
 docs/.vuepress/sidebar/zh-CN.js|  3 +-
 .../sql-functions/bitwise-functions/bit_length.md  | 55 --
 .../sql-functions/bitwise-functions/bit_length.md  | 55 --
 .../doris/load/routineload/ScheduleRule.java   | 13 +
 thirdparty/download-thirdparty.sh  | 15 +-
 6 files changed, 17 insertions(+), 127 deletions(-)

diff --git a/docs/.vuepress/sidebar/en.js b/docs/.vuepress/sidebar/en.js
index 941be11..b24039d 100644
--- a/docs/.vuepress/sidebar/en.js
+++ b/docs/.vuepress/sidebar/en.js
@@ -449,8 +449,7 @@ module.exports = [
   "bitand",
   "bitor",
   "bitxor",
-  "bitnot",
-  "bit_length"
+  "bitnot"
 ],
   },
   {
diff --git a/docs/.vuepress/sidebar/zh-CN.js b/docs/.vuepress/sidebar/zh-CN.js
index 582407b..f05b3b8 100644
--- a/docs/.vuepress/sidebar/zh-CN.js
+++ b/docs/.vuepress/sidebar/zh-CN.js
@@ -453,8 +453,7 @@ module.exports = [
   "bitand",
   "bitor",
   "bitxor",
-  "bitnot",
-  "bit_length"
+  "bitnot"
 ],
   },
   {
diff --git 
a/docs/en/sql-reference/sql-functions/bitwise-functions/bit_length.md 
b/docs/en/sql-reference/sql-functions/bitwise-functions/bit_length.md
deleted file mode 100644
index 9f56a1f..000
--- a/docs/en/sql-reference/sql-functions/bitwise-functions/bit_length.md
+++ /dev/null
@@ -1,55 +0,0 @@

-{
-"title": "bit_length",
-"language": "en"
-}

-
-
-
-# bit_length
-## description
-### Syntax
-
-`INT bit_length(VARCHAR str)`
-
-Return length of argument in bits.
-
-## example
-
-```
-MySQL> select bit_length("doris");
-+---------------------+
-| bit_length('doris') |
-+---------------------+
-|                  40 |
-+---------------------+
-
-MySQL [test]> select bit_length("hello world");
-+---------------------------+
-| bit_length('hello world') |
-+---------------------------+
-|                        88 |
-+---------------------------+
-```
-
-## keyword
-
-bit_length
diff --git 
a/docs/zh-CN/sql-reference/sql-functions/bitwise-functions/bit_length.md 
b/docs/zh-CN/sql-reference/sql-functions/bitwise-functions/bit_length.md
deleted file mode 100644
index c0005fa..000
--- a/docs/zh-CN/sql-reference/sql-functions/bitwise-functions/bit_length.md
+++ /dev/null
@@ -1,55 +0,0 @@

-{
-"title": "bit_length",
-"language": "zh-CN"
-}

-
-
-
-# bit_length
-## description
-### Syntax
-
-`INT bit_length(VARCHAR str)`
-
-返回字符串的bit位数
-
-## example
-
-```
-MySQL> select bit_length("doris");
-+---------------------+
-| bit_length('doris') |
-+---------------------+
-|                  40 |
-+---------------------+
-
-MySQL [test]> select bit_length("hello world");
-+---------------------------+
-| bit_length('hello world') |
-+---------------------------+
-|                        88 |
-+---------------------------+
-```
-
-## keyword
-
-bit_length
diff --git 
a/fe/fe-core/src/main/java/org/apache/doris/load/routineload/ScheduleRule.java 
b/fe/fe-core/src/main/java/org/apache/doris/load/routineload/ScheduleRule.java
index eaa52e2..c72ee6c 100644
--- 
a/fe/fe-core/src/main/java/org/apache/doris/load/routineload/ScheduleRule.java
+++ 
b/fe/fe-core/src/main/java/org/apache/doris/load/routineload/ScheduleRule.java
@@ -21,10 +21,14 @@ import org.apache.doris.common.Config;
 import org.apache.doris.common.InternalErrorCode;
 import org.apache.doris.system.SystemInfoService;
 
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
 /**
  * ScheduleRule: RoutineLoad PAUSED -> NEED_SCHEDULE
  */
 public class ScheduleRule {
+private static final Logger LOG = LogManager.getLogger(ScheduleRule.class);
 
 private static int deadBeCount(String clusterName) {
 SystemInfoService systemInfoService = Catalog.getCurrentSystemInfo();
@@ -43,17 +47,26 @@ public class ScheduleRule {
 return false;
 }
 if (jobRoutine.autoResumeLock) {//only manual resume for unlock
+LOG.debug("routine load job {}'s autoResumeLock is true, skip", 
jobRoutine.id);
 return false;
 }
 
 /*
  * Handle all backends are down.
  */
+LOG.debug("try to auto reschedule routine load {}, 
firstResumeTimestamp: {}, autoResumeCoun

[incubator-doris] annotated tag 1.0.0-preview updated (e6478e8 -> 014414a)

2022-03-13 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a change to annotated tag 1.0.0-preview
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git.


*** WARNING: tag 1.0.0-preview was modified! ***

from e6478e8  (commit)
  to 014414a  (tag)
 tagging e6478e8229430d3ce7bce5282fc233b9511c303a (commit)
  by morningman
  on Sun Mar 13 22:42:39 2022 +0800

- Log -
1.0.0-preview
---


No new revisions were added by this update.

Summary of changes:




[GitHub] [incubator-doris] HappenLee opened a new pull request #8461: [Bug][Vectorized] Agg/Unique not null column outer join coredump

2022-03-13 Thread GitBox


HappenLee opened a new pull request #8461:
URL: https://github.com/apache/incubator-doris/pull/8461


   # Proposed changes
   
   Issue Number: close #8459
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (No)
   2. Has unit tests been added: (No Need)
   3. Has document been added or modified: (No Need)
   4. Does it need to update dependencies: (No)
   5. Are there any changes that cannot be rolled back: (Yes)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
[d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you 
chose the solution you did and what alternatives you considered, etc...
   





[GitHub] [incubator-doris] xiaokang commented on pull request #8322: [refactor] Impl of MemTracker, and related use

2022-03-13 Thread GitBox


xiaokang commented on pull request #8322:
URL: https://github.com/apache/incubator-doris/pull/8322#issuecomment-1066228733


   @xinyiZzz While testing #8451, I ran into a memory limit problem.
   
   After the long query described in the test steps of #8451 finishes, a simple 
query, 'select count() from tableA', raises a memory limit error.
   
   I suspect it is related to this PR, since the problem did not occur before I 
merged the new MemTracker code.
   
   The following is the MySQL client error message.
   > ERROR 1105 (HY000): errCode = 2, detailMessage = Memory exceed limit. fragment=4f5f114582e7429d-a630eec6e0e45384, details=New partitioned aggregation, while getting next from child 0., on backend=[172.16.44.107](http://172.16.44.107/). Memory left in process limit=8589934592.00 GB. current tracker
   
   I0312 10:58:07.931836  2320 plan_fragment_executor.cpp:76] PlanFragmentExecutor::prepare|pthread_id=140354754955008|backend_num=1|instance_id=35895a325c6943dc-872eced6dfcb8c91|query_id=35895a325c6943dc-872eced6dfcb8c90
   I0312 10:58:07.936765  2203 fragment_mgr.cpp:459] PlanFragmentExecutor::_exec_actual|pthread_id=140355728508672|instance_id=35895a325c6943dc-872eced6dfcb8c91|query_id=35895a325c6943dc-872eced6dfcb8c90
   I0312 10:58:07.936780  2203 plan_fragment_executor.cpp:213] PlanFragmentExecutor::open, using query memory limit: 7.59 GB|mem_limit=8147483648|instance_id=35895a325c6943dc-872eced6dfcb8c91|query_id=35895a325c6943dc-872eced6dfcb8c90
   W0312 10:58:07.936826  2203 status.h:260] warning: Status msg truncated, OK: Memory exceed limit. fragment=35895a325c6943dc-872eced6dfcb8c91, details=New partitioned aggregation, while getting next from child 0., on backend=[172.16.44.107](http://172.16.44.107/). Memory left in process limit=8589934592.00 GB. current tracker . If query, can change the limit by session variable exec_mem_limit. precise_code:1
   W0312 10:58:07.943392  2203 mem_tracker.cpp:290] Memory exceed limit. fragment=35895a325c6943dc-872eced6dfcb8c91, details=New partitioned aggregation, while getting next from child 0., on backend=[172.16.44.107](http://172.16.44.107/). Memory left in process limit=8589934592.00 GB. current tracker . If query, can change the limit by session variable exec_mem_limit.
   MemTracker log_usage Label: queryId=35895a325c6943dc-872eced6dfcb8c90, Limit: 7.59 GB, Total: 19.00 KB, Peak: 19.00 KB, Exceeded: false
   MemTracker log_usage Label: RuntimeState:instance:35895a325c6943dc-872eced6dfcb8c92, Limit: 7.59 GB, Total: 1.00 KB, Peak: 1.00 KB, Exceeded: false
   MemTracker log_usage Label: RuntimeFilterMgr, Limit: -1.00 B, Total: 0, Peak: 0, Exceeded: false
   MemTracker log_usage Label: RuntimeState:instance:35895a325c6943dc-872eced6dfcb8c91, Limit: 7.59 GB, Total: 18.00 KB, Peak: 18.00 KB, Exceeded: false
   MemTracker log_usage Label: RuntimeFilterMgr, Limit: -1.00 B, Total: 0, Peak: 0, Exceeded: false
   MemTracker log_usage Label: ExecNode:AGGREGATION_NODE (id=1), Limit: -1.00 B, Total: 1.00 KB, Peak: 1.00 KB, Exceeded: false
   MemTracker log_usage Label: DataStreamSender:35895a325c6943dc-872eced6dfcb8c91, Limit: -1.00 B, Total: 16.00 KB, Peak: 16.00 KB, Exceeded: false
   W0312 10:58:07.944548  2203 fragment_mgr.cpp:231] Got error while opening fragment 35895a325c6943dc-872eced6dfcb8c91: Memory limit exceeded: Memory exceed limit. fragment=35895a325c6943dc-872eced6dfcb8c91, details=New partitioned aggregation, while getting next from child 0., on backend=[172.16.44.107](http://172.16.44.107/). Memory left in process limit=8589934592.00 GB. current tracker 

[GitHub] [incubator-doris-flink-connector] zhqu1148980644 opened a new pull request #19: Update README.md

2022-03-13 Thread GitBox


zhqu1148980644 opened a new pull request #19:
URL: https://github.com/apache/incubator-doris-flink-connector/pull/19


   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
[d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you 
chose the solution you did and what alternatives you considered, etc...
   





[GitHub] [incubator-doris] yangzhg commented on a change in pull request #8439: [refactor] use c++ 14 deprecated instead of comment, this detect usage of deprecated var or func at compile time

2022-03-13 Thread GitBox


yangzhg commented on a change in pull request #8439:
URL: https://github.com/apache/incubator-doris/pull/8439#discussion_r825546528



##
File path: be/src/agent/task_worker_pool.h
##
@@ -51,12 +51,12 @@ class TaskWorkerPool {
 REALTIME_PUSH,
 PUBLISH_VERSION,
 // Deprecated
-CLEAR_ALTER_TASK,
+CLEAR_ALTER_TASK [[deprecated]],

Review comment:
   Removing it directly may change the values of the enumerators that follow it.
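
   The concern can be illustrated with a small sketch. Enumerators without explicit values are numbered consecutively, so deleting one renumbers every enumerator after it, while marking it `[[deprecated]]` keeps the numbering intact. The enum names below and the `NEXT_TASK` enumerator are hypothetical, not the real `TaskWorkerPool` definitions, and attributes on enumerators assume a C++17 compiler.

```cpp
// Hypothetical enums for illustration only.
enum class Before { PUBLISH_VERSION, CLEAR_ALTER_TASK, NEXT_TASK };

// Deleting the enumerator shifts NEXT_TASK from 2 to 1.
enum class Removed { PUBLISH_VERSION, NEXT_TASK };

// Marking it deprecated keeps every value where it was; any remaining use of
// CLEAR_ALTER_TASK now produces a compile-time warning instead.
enum class Deprecated { PUBLISH_VERSION, CLEAR_ALTER_TASK [[deprecated]], NEXT_TASK };

static_assert(static_cast<int>(Before::NEXT_TASK) == 2, "original value");
static_assert(static_cast<int>(Removed::NEXT_TASK) == 1, "value shifted");
static_assert(static_cast<int>(Deprecated::NEXT_TASK) == 2, "value preserved");
```

   If the numeric values are ever serialized or sent between processes, the shifted numbering is the kind of change that is hard to roll back.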







[GitHub] [incubator-doris] anjia0532 commented on issue #7587: [Roadmap] Doris on K8S

2022-03-13 Thread GitBox


anjia0532 commented on issue #7587:
URL: 
https://github.com/apache/incubator-doris/issues/7587#issuecomment-1066263940


   @liangyongz 
   [Fixed Pod IPs for Kubernetes applications with Kruise](https://segmentfault.com/a/119040707667)





[GitHub] [incubator-doris] caiconghui merged pull request #8457: [fix][routine-load] fix bug that routine load cannot cancel task when append_data return error

2022-03-13 Thread GitBox


caiconghui merged pull request #8457:
URL: https://github.com/apache/incubator-doris/pull/8457


   





[incubator-doris] branch master updated: [fix][routine-load] fix bug that routine load cannot cancel task when append_data return error (#8457)

2022-03-13 Thread caiconghui
This is an automated email from the ASF dual-hosted git repository.

caiconghui pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 991dc7f  [fix][routine-load] fix bug that routine load cannot cancel 
task when append_data return error (#8457)
991dc7f is described below

commit 991dc7fc5cf53e359ea907d2c9d88f2916499a93
Author: caiconghui <55968745+caicong...@users.noreply.github.com>
AuthorDate: Mon Mar 14 10:18:14 2022 +0800

[fix][routine-load] fix bug that routine load cannot cancel task when 
append_data return error (#8457)
---
 be/src/runtime/routine_load/data_consumer_group.cpp | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/be/src/runtime/routine_load/data_consumer_group.cpp 
b/be/src/runtime/routine_load/data_consumer_group.cpp
index 5f6c789..7242fbe 100644
--- a/be/src/runtime/routine_load/data_consumer_group.cpp
+++ b/be/src/runtime/routine_load/data_consumer_group.cpp
@@ -116,7 +116,6 @@ Status KafkaDataConsumerGroup::start_all(StreamLoadContext* 
ctx) {
 
 MonotonicStopWatch watch;
 watch.start();
-Status st;
 bool eos = false;
 while (true) {
 if (eos || left_time <= 0 || left_rows <= 0 || left_bytes <= 0) {
@@ -140,12 +139,10 @@ Status 
KafkaDataConsumerGroup::start_all(StreamLoadContext* ctx) {
 // waiting all threads finished
 _thread_pool.shutdown();
 _thread_pool.join();
-
 if (!result_st.ok()) {
-// some of consumers encounter errors, cancel this task
+kafka_pipe->cancel(result_st.get_error_msg());
 return result_st;
 }
-
 kafka_pipe->finish();
 ctx->kafka_info->cmt_offset = std::move(cmt_offset);
 ctx->receive_bytes = ctx->max_batch_size - left_bytes;
@@ -159,9 +156,8 @@ Status KafkaDataConsumerGroup::start_all(StreamLoadContext* 
ctx) {
 << ", partition: " << msg->partition() << ", offset: " 
<< msg->offset()
 << ", len: " << msg->len();
 
-(kafka_pipe.get()->*append_data)(static_cast(msg->payload()),
+Status st = (kafka_pipe.get()->*append_data)(static_cast(msg->payload()),
  static_cast(msg->len()));
-
 if (st.ok()) {
 left_rows--;
 left_bytes -= msg->len();
@@ -172,6 +168,12 @@ Status 
KafkaDataConsumerGroup::start_all(StreamLoadContext* ctx) {
 // failed to append this msg, we must stop
 LOG(WARNING) << "failed to append msg to pipe. grp: " << 
_grp_id;
 eos = true;
+{
+std::unique_lock lock(_mutex);
+if (result_st.ok()) {
+result_st = st;
+}
+}
 }
 delete msg;
 } else {
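
The fix above records the first failing status under a mutex so that the caller can cancel the pipe with it instead of silently finishing. A minimal sketch of that "first error wins" pattern follows. `Status` here is a hypothetical two-field stand-in for the Doris class, and `FirstError` is an illustrative name that does not appear in the patch.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical minimal status type; Doris has its own Status class.
struct Status {
    bool is_ok;
    std::string msg;

    static Status OK() { return Status{true, ""}; }
    static Status Error(std::string m) { return Status{false, std::move(m)}; }
    bool ok() const { return is_ok; }
};

// Keeps only the first error reported by any caller; later errors are ignored
// so the root cause is not overwritten.
class FirstError {
public:
    void record(const Status& st) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_result.ok() && !st.ok()) {
            _result = st;
        }
    }

    Status get() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _result;
    }

private:
    mutable std::mutex _mutex;
    Status _result = Status::OK();
};
```

A caller would check `get()` after all workers have joined and, if it is not OK, cancel with the stored message before returning it.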




[GitHub] [incubator-doris-flink-connector] bridgeDream commented on pull request #18: [improvement] (before 1.13)Support set max bytes in each batch to avoid congestion

2022-03-13 Thread GitBox


bridgeDream commented on pull request #18:
URL: 
https://github.com/apache/incubator-doris-flink-connector/pull/18#issuecomment-1066275553


   > I will close #13 @bridgeDream 
   OK, can you approve this PR for the pre-1.13 branch, @hf200012?
   





[GitHub] [incubator-doris] hf200012 commented on issue #7587: [Roadmap] Doris on K8S

2022-03-13 Thread GitBox


hf200012 commented on issue #7587:
URL: 
https://github.com/apache/incubator-doris/issues/7587#issuecomment-1066280183


   > https://github.com/liangyongz/doris-on-k8s The temporary solution I am 
currently using is hostNetwork. This approach is limited.
   > 
   > Will you solve this problem by using Pod+SVC instead of hostNetwork?
   
   I am also doing research in this area and would be glad to compare notes. My 
WeChat: 35926237. Let's set up a group to discuss.





[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8461: [fix](vectorized) Agg/Unique not null column outer join coredump

2022-03-13 Thread GitBox


github-actions[bot] commented on pull request #8461:
URL: https://github.com/apache/incubator-doris/pull/8461#issuecomment-1066284455









[GitHub] [incubator-doris] morningman commented on pull request #8461: [fix](vectorized) Agg/Unique not null column outer join coredump

2022-03-13 Thread GitBox


morningman commented on pull request #8461:
URL: https://github.com/apache/incubator-doris/pull/8461#issuecomment-1066286993


   merge it for quick test





[GitHub] [incubator-doris] morningman merged pull request #8461: [fix](vectorized) Agg/Unique not null column outer join coredump

2022-03-13 Thread GitBox


morningman merged pull request #8461:
URL: https://github.com/apache/incubator-doris/pull/8461


   





[incubator-doris] branch master updated (991dc7f -> 41a15cc)

2022-03-13 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git.


from 991dc7f  [fix][routine-load] fix bug that routine load cannot cancel 
task when append_data return error (#8457)
 add 41a15cc  [fix](vectorized) Agg/Unique not null column outer join 
coredump (#8461)

No new revisions were added by this update.

Summary of changes:
 be/src/exec/olap_scanner.cpp  | 4 
 be/src/exec/olap_scanner.h| 1 +
 be/src/olap/reader.cpp| 2 ++
 be/src/olap/reader.h  | 4 
 be/src/olap/tablet_schema.cpp | 7 +--
 be/src/olap/tablet_schema.h   | 3 ++-
 be/src/vec/olap/vcollect_iterator.cpp | 2 +-
 7 files changed, 19 insertions(+), 4 deletions(-)




[GitHub] [incubator-doris] morningman closed issue #8459: [Bug] BE crash when doing left outer join with vec engine

2022-03-13 Thread GitBox


morningman closed issue #8459:
URL: https://github.com/apache/incubator-doris/issues/8459


   





[GitHub] [incubator-doris] BiteTheDDDDt commented on a change in pull request #8448: [Feature][Vectorized] support lateral view

2022-03-13 Thread GitBox


BiteTheDDDDt commented on a change in pull request #8448:
URL: https://github.com/apache/incubator-doris/pull/8448#discussion_r825562764



##
File path: be/src/vec/exec/vrepeat_node.cpp
##
@@ -181,13 +187,9 @@ Status VRepeatNode::get_next(RuntimeState* state, Block* 
block, bool* eos) {
 
 // current child block has finished its repeat, get child's next block
 if (_child_block->rows() == 0) {
-if (_child_eos) {

Review comment:
   This is unnecessary code.







[incubator-doris] branch array-type updated: [feature-wip](array-type) Add codes and UT for array_contains and array_position functions (#8401)

2022-03-13 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch array-type
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


The following commit(s) were added to refs/heads/array-type by this push:
 new 706f7ff  [feature-wip](array-type) Add codes and UT for array_contains 
and array_position functions (#8401)
706f7ff is described below

commit 706f7ff898b2e2599a29e36e63730d945bf53a5d
Author: camby <104178...@qq.com>
AuthorDate: Mon Mar 14 11:11:56 2022 +0800

[feature-wip](array-type) Add codes and UT for array_contains and 
array_position functions (#8401)

array_contains function Usage example:
1. create table with ARRAY column, and insert some data:
```
> select * from array_test;
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    1 |    2 | [1, 2] |
|    2 |    3 | NULL   |
|    4 | NULL | []     |
|    3 | NULL | NULL   |
+------+------+--------+
```
2. enable vectorized:
```
> set enable_vectorized_engine=true;
```
3. select with array_contains:
```
> select k1,array_contains(k3,1) from array_test;
+------+-------------------------+
| k1   | array_contains(`k3`, 1) |
+------+-------------------------+
|    3 |                    NULL |
|    1 |                       1 |
|    2 |                    NULL |
|    4 |                       0 |
+------+-------------------------+
```
4. also we can use array_contains in where condition
```
> select * from array_test where array_contains(k3,1);
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    1 |    2 | [1, 2] |
+------+------+--------+
```
5. array_position usage example
```
> select k1,k3,array_position(k3,2) from array_test;
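+------+--------+-------------------------+
| k1   | k3     | array_position(`k3`, 2) |
+------+--------+-------------------------+
|    3 | NULL   |                    NULL |
|    1 | [1, 2] |                       2 |
|    2 | NULL   |                    NULL |
|    4 | []     |                       0 |
+------+--------+-------------------------+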
```
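
The results above suggest the following semantics: a NULL array yields NULL, array_position returns a 1-based index or 0 when the value is absent, and array_contains is true exactly when that position is nonzero. A minimal C++17 sketch of those rules, assuming integer elements, first-match lookup, and no NULL elements inside the array (the examples do not cover those cases). It is an illustration only, not the vectorized implementation added by this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// A nullable array column value: std::nullopt stands for SQL NULL.
using IntArray = std::optional<std::vector<int32_t>>;

// Returns NULL for a NULL array, otherwise the 1-based index of the first
// match, or 0 if the target is not present.
std::optional<int64_t> array_position(const IntArray& arr, int32_t target) {
    if (!arr) return std::nullopt;
    for (std::size_t i = 0; i < arr->size(); ++i) {
        if ((*arr)[i] == target) return static_cast<int64_t>(i) + 1;
    }
    return int64_t{0};
}

// Returns NULL for a NULL array, otherwise whether the target is present.
std::optional<bool> array_contains(const IntArray& arr, int32_t target) {
    std::optional<int64_t> pos = array_position(arr, target);
    if (!pos) return std::nullopt;
    return *pos > 0;
}
```

With these rules, `array_position` of `[1, 2]` and 2 gives 2, an empty array gives 0, and a NULL array gives NULL, matching the rows shown in the tables.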
---
 be/src/vec/CMakeLists.txt  |   2 +
 .../vec/functions/array/function_array_index.cpp   |  31 ++
 be/src/vec/functions/array/function_array_index.h  | 196 +++
 .../functions/array/function_array_register.cpp|  31 ++
 be/src/vec/functions/simple_function_factory.h |   2 +
 be/src/vec/olap/vgeneric_iterators.cpp |   3 -
 be/test/vec/exec/vgeneric_iterators_test.cpp   |   3 -
 be/test/vec/function/CMakeLists.txt|   1 +
 be/test/vec/function/function_array_index_test.cpp | 127 +++
 be/test/vec/function/function_test_util.h  | 384 ++---
 .../java/org/apache/doris/catalog/ArrayType.java   |   4 +
 gensrc/script/doris_builtins_functions.py  |  37 ++
 12 files changed, 608 insertions(+), 213 deletions(-)

diff --git a/be/src/vec/CMakeLists.txt b/be/src/vec/CMakeLists.txt
index 0024fd0..fc81438 100644
--- a/be/src/vec/CMakeLists.txt
+++ b/be/src/vec/CMakeLists.txt
@@ -106,6 +106,8 @@ set(VEC_FILES
   exprs/vcast_expr.cpp
   exprs/vcase_expr.cpp
   exprs/vinfo_func.cpp
+  functions/array/function_array_index.cpp
+  functions/array/function_array_register.cpp
   functions/math.cpp
   functions/function_bitmap.cpp
   functions/function_bitmap_variadic.cpp
diff --git a/be/src/vec/functions/array/function_array_index.cpp b/be/src/vec/functions/array/function_array_index.cpp
new file mode 100644
index 000..474500e
--- /dev/null
+++ b/be/src/vec/functions/array/function_array_index.cpp
@@ -0,0 +1,31 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "vec/functions/array/function_array_index.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris::vectorized {
+
+struct NameArrayContains { static constexpr auto name = "array_contains"; };
+struct NameArrayPosition { static constexpr auto name = "array_position"; };
+
+void register_function_array_index(SimpleFunctionFactor

[GitHub] [incubator-doris] morningman merged pull request #8401: [feature][array-type]add array_contains and array_position functions

2022-03-13 Thread GitBox


morningman merged pull request #8401:
URL: https://github.com/apache/incubator-doris/pull/8401


   





[GitHub] [incubator-doris] EmmyMiao87 commented on a change in pull request #8408: [Benchmark] Add TPC-H benchmark tools

2022-03-13 Thread GitBox


EmmyMiao87 commented on a change in pull request #8408:
URL: https://github.com/apache/incubator-doris/pull/8408#discussion_r825566731



##
File path: tools/tpch-tools/README.md
##
@@ -0,0 +1,34 @@
+
+
+## Usage
+
+These scripts are used to make tpc-h test.
+follow the steps below:
+
+### 1. build tpc-h dbgen tool.
+./build-tpch-dbgen.sh
+### 2. generate tpc-h data. use -h for more information.
+./gen-tpch-data.sh -s 1
+### 3. create tpc-h tables. modify `doris-cluster.conf` to specify doris info, then run script below.
+./create-tpch-tables.sh
+### 4. load tpc-h data. use -h for help.
+./load-tpch-data.sh
+### 5. run tpc-h queries.
+./run-tpch-queries.sh

Review comment:
   In fact, the TPC-H query set differs slightly depending on the data volume,
   so the document should state which dataset size the provided queries target.







[GitHub] [incubator-doris] zenoyang commented on a change in pull request #8318: [improvement](storage) Low cardinality string optimization in storage layer

2022-03-13 Thread GitBox


zenoyang commented on a change in pull request #8318:
URL: https://github.com/apache/incubator-doris/pull/8318#discussion_r825582068



##
File path: be/src/vec/columns/column_dictionary.h
##
@@ -0,0 +1,381 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+#include 
+
+#include "gutil/hash/string_hash.h"
+#include "olap/decimal12.h"
+#include "olap/uint24.h"
+#include "runtime/string_value.h"
+#include "util/slice.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_decimal.h"
+#include "vec/columns/column_impl.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/predicate_column.h"
+#include "vec/core/types.h"
+
+namespace doris::vectorized {
+
+/**
+ * For low cardinality string columns, using ColumnDictionary can reducememory
+ * usage and improve query efficiency.
+ * For equal predicate comparisons, convert the predicate constant to encodings
+ * according to the dictionary, so that encoding comparisons are used instead
+ * of string comparisons to improve performance.
+ * For range comparison predicates, it is necessary to sort the dictionary
+ * contents, convert the encoding column, and then compare the encoding directly.
+ * If the read data page contains plain-encoded data pages, the dictionary
+ * columns are converted into PredicateColumn for processing.
+ * Currently ColumnDictionary is only used for storage layer.
+ */
+template 
+class ColumnDictionary final : public COWHelper> {
+private:
+friend class COWHelper;
+
+ColumnDictionary() {}
+ColumnDictionary(const size_t n) : codes(n) {}
+ColumnDictionary(const ColumnDictionary& src) : codes(src.codes.begin(), src.codes.end()) {}
+
+public:
+using Self = ColumnDictionary;
+using value_type = T;
+using Container = PaddedPODArray;
+using DictContainer = PaddedPODArray;
+
+bool is_numeric() const override { return false; }
+
+bool is_predicate_column() const override { return false; }
+
+bool is_column_dictionary() const override { return true; }
+
+size_t size() const override { return codes.size(); }
+
+[[noreturn]] StringRef get_data_at(size_t n) const override {
+LOG(FATAL) << "get_data_at not supported in ColumnDictionary";
+}
+
+void insert_from(const IColumn& src, size_t n) override {
+LOG(FATAL) << "insert_from not supported in ColumnDictionary";
+}
+
+void insert_range_from(const IColumn& src, size_t start, size_t length) override {
+LOG(FATAL) << "insert_range_from not supported in ColumnDictionary";
+}
+
+void insert_indices_from(const IColumn& src, const int* indices_begin,
+ const int* indices_end) override {
+LOG(FATAL) << "insert_indices_from not supported in ColumnDictionary";
+}
+
+void pop_back(size_t n) override { LOG(FATAL) << "pop_back not supported in ColumnDictionary"; }
+
+void update_hash_with_value(size_t n, SipHash& hash) const override {
+LOG(FATAL) << "update_hash_with_value not supported in ColumnDictionary";
+}
+
+void insert_data(const char* pos, size_t /*length*/) override {
+codes.push_back(unaligned_load(pos));
+}
+
+void insert_data(const T value) { codes.push_back(value); }
+
+void insert_default() override { codes.push_back(T()); }
+
+void clear() override { codes.clear(); }
+
+// TODO: Make dict memory usage more precise
+size_t byte_size() const override { return codes.size() * sizeof(codes[0]); }
+
+size_t allocated_bytes() const override { return byte_size(); }
+
+void protect() override {}
+
+void get_permutation(bool reverse, size_t limit, int nan_direction_hint,
+ IColumn::Permutation& res) const override {
+LOG(FATAL) << "get_permutation not supported in ColumnDictionary";
+}
+
+void reserve(size_t n) override { codes.reserve(n); }
+
+[[noreturn]] const char* get_family_name() const override {
+LOG(FATAL) << "get_family_name not supported in ColumnDictionary";
+}
+
+[[noreturn]] MutableColumnPtr clone_resized(size_t size) const override {
+LOG(FATAL) << "clo
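
The class comment in this hunk describes replacing string comparisons with code comparisons for equality predicates. A minimal sketch of that idea, assuming a simplified column layout: `DictColumn` and `filter_equal` are hypothetical names, and this `find_code` is a linear-scan stand-in for the method of the same name referenced elsewhere in the PR, whose real implementation is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified dictionary-encoded string column.
struct DictColumn {
    std::vector<std::string> dict;  // distinct values; a code is an index into dict
    std::vector<int32_t> codes;     // one code per row

    // Returns the code for `value`, or -1 if the value is not in the dictionary.
    int32_t find_code(const std::string& value) const {
        for (std::size_t i = 0; i < dict.size(); ++i) {
            if (dict[i] == value) return static_cast<int32_t>(i);
        }
        return -1;
    }
};

// Equality predicate: translate the constant to a code once,
// then compare integers per row instead of strings.
std::vector<std::size_t> filter_equal(const DictColumn& col, const std::string& value) {
    std::vector<std::size_t> rows;
    const int32_t code = col.find_code(value);
    if (code < 0) return rows;  // value absent from the dictionary: no row can match
    for (std::size_t i = 0; i < col.codes.size(); ++i) {
        if (col.codes[i] == code) rows.push_back(i);
    }
    return rows;
}
```

The per-row work is a single integer comparison, which is where the expected gain for low-cardinality columns comes from.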

[GitHub] [incubator-doris] zenoyang commented on a change in pull request #8318: [improvement](storage) Low cardinality string optimization in storage layer

2022-03-13 Thread GitBox


zenoyang commented on a change in pull request #8318:
URL: https://github.com/apache/incubator-doris/pull/8318#discussion_r825586828



##
File path: be/src/olap/comparison_predicate.cpp
##
@@ -145,28 +146,68 @@ COMPARISON_PRED_COLUMN_BLOCK_EVALUATE(LessEqualPredicate, <=)
 COMPARISON_PRED_COLUMN_BLOCK_EVALUATE(GreaterPredicate, >)
 COMPARISON_PRED_COLUMN_BLOCK_EVALUATE(GreaterEqualPredicate, >=)
 
-#define COMPARISON_PRED_COLUMN_EVALUATE(CLASS, OP) \
+#define COMPARISON_PRED_COLUMN_EVALUATE(CLASS, OP, IS_RANGE) \
 template \
 void CLASS::evaluate(vectorized::IColumn& column, uint16_t* sel, uint16_t* size) const { \
     uint16_t new_size = 0; \
     if (column.is_nullable()) { \
-        auto* nullable_column = \
+        auto* nullable_col = \
                 vectorized::check_and_get_column(column); \
-        auto& null_bitmap = reinterpret_cast&>( \
-                *(nullable_column->get_null_map_column_ptr())) \
+        auto& null_bitmap = reinterpret_cast( \
+                nullable_col->get_null_map_column()) \
                 .get_data(); \
-        auto* nest_column_vector = \
-                vectorized::check_and_get_column>( \
-                        nullable_column->get_nested_column()); \
-        auto& data_array = nest_column_vector->get_data(); \
-        for (uint16_t i = 0; i < *size; i++) { \
-            uint16_t idx = sel[i]; \
-            sel[new_size] = idx; \
-            const type& cell_value = reinterpret_cast(data_array[idx]); \
-            bool ret = !null_bitmap[idx] && (cell_value OP _value); \
-            new_size += _opposite ? !ret : ret; \
+        auto& nested_col = nullable_col->get_nested_column(); \
+        if (nested_col.is_column_dictionary()) { \
+            if constexpr (std::is_same_v) { \
+                auto* nested_col_ptr = vectorized::check_and_get_column< \
+                        vectorized::ColumnDictionary>(nested_col); \
+                auto code = nested_col_ptr->find_code(_value); \
+                if (code < 0 && IS_RANGE) { \
+                    code = nested_col_ptr->find_bound_code(_value, 0 OP 1, 1 OP 1); \
+                } \
+                auto& data_array = nested_col_ptr->get_data(); \
+                for (uint16_t i = 0; i < *size; i++) { \
+                    uint16_t idx = sel[i]; \
+                    sel[new_size] = idx; \
+                    const auto& cell_value = \
+                            reinterpret_cast(data_array[idx]); \
+                    bool ret = !null_bitmap[idx] && (cell_value OP code); \
+                    new_size += _opposite ? !ret : ret; \
+                } \
+            } \
+        } else { \
+            auto* nested_col_ptr = \
+                    vectorized::check_and_get_column>( \
+                            nested_col); \
+   
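
For range predicates the hunk above falls back to `find_bound_code` when the constant is not in the dictionary. Its exact semantics are not visible in this excerpt, so the following is only a sketch of the general technique under one assumption: the dictionary is sorted, so code order equals string order. `filter_less` is a hypothetical helper that uses `std::lower_bound` to turn `col < value` into an integer comparison on codes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: rows whose decoded string is strictly less than `value`.
// Assumes `sorted_dict` is sorted ascending, so code order equals string order.
std::vector<std::size_t> filter_less(const std::vector<std::string>& sorted_dict,
                                     const std::vector<int32_t>& codes,
                                     const std::string& value) {
    // Index of the first dictionary entry that is >= value. This works whether
    // or not `value` itself is present in the dictionary.
    const auto bound = static_cast<int32_t>(
            std::lower_bound(sorted_dict.begin(), sorted_dict.end(), value) -
            sorted_dict.begin());
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < codes.size(); ++i) {
        if (codes[i] < bound) rows.push_back(i);  // integer compare only
    }
    return rows;
}
```

A `<=` predicate would use `std::upper_bound` in the same way; the two boolean arguments passed to `find_bound_code` in the diff presumably select between such variants, but that is an inference from the call site.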

[GitHub] [incubator-doris] zenoyang commented on a change in pull request #8318: [improvement](storage) Low cardinality string optimization in storage layer

2022-03-13 Thread GitBox


zenoyang commented on a change in pull request #8318:
URL: https://github.com/apache/incubator-doris/pull/8318#discussion_r825587115



##
File path: be/src/vec/columns/column_dictionary.h
##
@@ -0,0 +1,381 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+#include 
+
+#include "gutil/hash/string_hash.h"
+#include "olap/decimal12.h"
+#include "olap/uint24.h"
+#include "runtime/string_value.h"
+#include "util/slice.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_decimal.h"
+#include "vec/columns/column_impl.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/predicate_column.h"
+#include "vec/core/types.h"
+
+namespace doris::vectorized {
+
+/**
+ * For low cardinality string columns, using ColumnDictionary can reducememory

Review comment:
   done







[GitHub] [incubator-doris] wangbo commented on a change in pull request #8318: [improvement](storage) Low cardinality string optimization in storage layer

2022-03-13 Thread GitBox


wangbo commented on a change in pull request #8318:
URL: https://github.com/apache/incubator-doris/pull/8318#discussion_r825596165



##
File path: be/src/vec/columns/column_dictionary.h
##
@@ -0,0 +1,381 @@

[GitHub] [incubator-doris] wangbo commented on a change in pull request #8318: [improvement](storage) Low cardinality string optimization in storage layer

2022-03-13 Thread GitBox


wangbo commented on a change in pull request #8318:
URL: https://github.com/apache/incubator-doris/pull/8318#discussion_r825596739



##
File path: be/src/vec/columns/column_dictionary.h
##
@@ -0,0 +1,381 @@

[GitHub] [incubator-doris] wangbo commented on a change in pull request #8318: [improvement](storage) Low cardinality string optimization in storage layer

2022-03-13 Thread GitBox


wangbo commented on a change in pull request #8318:
URL: https://github.com/apache/incubator-doris/pull/8318#discussion_r825598921



##
File path: be/src/vec/columns/column_dictionary.h
##
@@ -0,0 +1,381 @@

[GitHub] [incubator-doris] wangbo commented on a change in pull request #8318: [improvement](storage) Low cardinality string optimization in storage layer

2022-03-13 Thread GitBox


wangbo commented on a change in pull request #8318:
URL: https://github.com/apache/incubator-doris/pull/8318#discussion_r825601591



##
File path: be/src/runtime/string_value.h
##
@@ -22,9 +22,53 @@
 
 #include "udf/udf.h"
 #include "util/hash_util.hpp"
+#include "util/cpu_info.h"
+#include "vec/common/string_ref.h"
+#ifdef __SSE4_2__
+#include "util/sse_util.hpp"
+#endif
 
 namespace doris {
 
+// Compare two strings using sse4.2 intrinsics if they are available. This code assumes
+// that the trivial cases are already handled (i.e. one string is empty).
+// Returns:
+//   < 0 if s1 < s2
+//   0 if s1 == s2
+//   > 0 if s1 > s2
+// The SSE code path is just under 2x faster than the non-sse code path.
+//   - s1/n1: ptr/len for the first string
+//   - s2/n2: ptr/len for the second string
+//   - len: min(n1, n2) - this can be more cheaply passed in by the caller
+static inline int string_compare(const char* s1, int64_t n1, const char* s2, int64_t n2,

Review comment:
   Why move it here?
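
   For reference, the contract described in the comment above (negative, zero, or positive result, with `len` equal to min(n1, n2)) can be sketched without SSE as follows. `string_compare_scalar` is a hypothetical name for illustration; it is not the function in the diff and makes no claim about how the SSE4.2 path is written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar reference for the documented contract:
//   < 0 if s1 < s2, 0 if s1 == s2, > 0 if s1 > s2.
// `len` must equal min(n1, n2); the caller passes it in, as in the original signature.
inline int string_compare_scalar(const char* s1, int64_t n1, const char* s2, int64_t n2,
                                 int64_t len) {
    // Compare the common prefix byte by byte (as unsigned chars).
    const int result = std::memcmp(s1, s2, static_cast<std::size_t>(len));
    if (result != 0) return result;
    // Equal prefix: the shorter string sorts first.
    return (n1 > n2) - (n1 < n2);
}
```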







[GitHub] [incubator-doris] wangbo commented on a change in pull request #8318: [improvement](storage) Low cardinality string optimization in storage layer

2022-03-13 Thread GitBox


wangbo commented on a change in pull request #8318:
URL: https://github.com/apache/incubator-doris/pull/8318#discussion_r825602217



##
File path: be/src/olap/rowset/segment_v2/segment_iterator.cpp
##
@@ -856,6 +857,18 @@ void SegmentIterator::_evaluate_short_circuit_predicate(uint16_t* vec_sel_rowid_
 for (auto column_predicate : _short_cir_eval_predicate) {
 auto column_id = column_predicate->column_id();
 auto& short_cir_column = _current_return_columns[column_id];
+auto* col_ptr = short_cir_column.get();

Review comment:
   Please add a TODO here for a later refactor.







[GitHub] [incubator-doris] wangbo commented on a change in pull request #8318: [improvement](storage) Low cardinality string optimization in storage layer

2022-03-13 Thread GitBox


wangbo commented on a change in pull request #8318:
URL: https://github.com/apache/incubator-doris/pull/8318#discussion_r825604376



##
File path: be/src/olap/in_list_predicate.cpp
##
@@ -122,21 +123,56 @@ IN_LIST_PRED_COLUMN_BLOCK_EVALUATE(NotInListPredicate, ==)
 void CLASS::evaluate(vectorized::IColumn& column, uint16_t* sel, uint16_t* size) const { \
     uint16_t new_size = 0; \
     if (column.is_nullable()) { \
-        auto* nullable_column = \
-                vectorized::check_and_get_column(column); \
-        auto& null_bitmap = reinterpret_cast&>(*( \
-                nullable_column->get_null_map_column_ptr())).get_data(); \
-        auto* nest_column_vector = vectorized::check_and_get_column \
-                >(nullable_column->get_nested_column()); \
-        auto& data_array = nest_column_vector->get_data(); \
-        for (uint16_t i = 0; i < *size; i++) { \
-            uint16_t idx = sel[i]; \
-            sel[new_size] = idx; \
-            const type& cell_value = reinterpret_cast(data_array[idx]); \
-            bool ret = !null_bitmap[idx] && (_values.find(cell_value) OP _values.end()); \
-            new_size += _opposite ? !ret : ret; \
+        auto* nullable_col = \
+                vectorized::check_and_get_column(column); \
+        auto& null_bitmap = reinterpret_cast( \
+                nullable_col->get_null_map_column()).get_data(); \
+        auto& nested_col = nullable_col->get_nested_column(); \
+        if (nested_col.is_column_dictionary()) { \

Review comment:
   Too many branches; please add a TODO for code refactoring here.
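
One possible direction for the refactor requested above is to pull the selection-vector loop, which is repeated in each branch of the macro, into a single helper. The following is only a sketch under stated assumptions: `filter_selection` and its parameters are hypothetical names, and plain `std::vector` stands in for the Doris column types shown in the hunk.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the patch: compacts the selection vector
// `sel` in place and returns the number of rows that remain selected.
// `null_map[row] != 0` marks a NULL row; `pred` is the per-value test.
template <typename T, typename Pred>
uint16_t filter_selection(const std::vector<T>& data,
                          const std::vector<uint8_t>& null_map,
                          std::vector<uint16_t>& sel, uint16_t size,
                          bool opposite, Pred pred) {
    assert(size <= sel.size());
    uint16_t new_size = 0;
    for (uint16_t i = 0; i < size; ++i) {
        const uint16_t idx = sel[i];
        // Write unconditionally; advance the write cursor only when the row is kept.
        sel[new_size] = idx;
        const bool ret = !null_map[idx] && pred(data[idx]);
        new_size += (opposite ? !ret : ret) ? 1 : 0;
    }
    return new_size;
}
```

With a helper of this shape, each branch would only need to supply the data array and the per-value test, for example a set lookup for the in-list case.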
   







[GitHub] [incubator-doris] syb853553110 commented on issue #8435: [Enhancement] The bitmap_hash function can be implemented using murmur_hash3_128

2022-03-13 Thread GitBox


syb853553110 commented on issue #8435:
URL: https://github.com/apache/incubator-doris/issues/8435#issuecomment-1066405091


   > I think this modification will be incompatible with old data @syb853553110 
https://doris.apache.org/zh-CN/sql-reference/sql-functions/bitmap-functions/bitmap_hash.html#description
   
   Would it be possible to add a new function instead, for example bitmap_hash128?





[GitHub] [incubator-doris] wangbo commented on a change in pull request #8318: [improvement](storage) Low cardinality string optimization in storage layer

2022-03-13 Thread GitBox


wangbo commented on a change in pull request #8318:
URL: https://github.com/apache/incubator-doris/pull/8318#discussion_r825609607



##
File path: be/src/olap/comparison_predicate.cpp
##
@@ -145,28 +146,68 @@ COMPARISON_PRED_COLUMN_BLOCK_EVALUATE(LessEqualPredicate, <=)
 COMPARISON_PRED_COLUMN_BLOCK_EVALUATE(GreaterPredicate, >)
 COMPARISON_PRED_COLUMN_BLOCK_EVALUATE(GreaterEqualPredicate, >=)
 
-#define COMPARISON_PRED_COLUMN_EVALUATE(CLASS, OP) \
+#define COMPARISON_PRED_COLUMN_EVALUATE(CLASS, OP, IS_RANGE) \
 template \
 void CLASS::evaluate(vectorized::IColumn& column, uint16_t* sel, uint16_t* size) const { \
 uint16_t new_size = 0; \
 if (column.is_nullable()) { \
-auto* nullable_column = \
+auto* nullable_col = \
 vectorized::check_and_get_column(column); \
-auto& null_bitmap = reinterpret_cast&>( \
-*(nullable_column->get_null_map_column_ptr())) \
+auto& null_bitmap = reinterpret_cast( \
+nullable_col->get_null_map_column()) \
 .get_data(); \
-auto* nest_column_vector = \
-vectorized::check_and_get_column>( \
-nullable_column->get_nested_column()); \
-auto& data_array = nest_column_vector->get_data(); \
-for (uint16_t i = 0; i < *size; i++) { \
-uint16_t idx = sel[i]; \
-sel[new_size] = idx; \
-const type& cell_value = reinterpret_cast(data_array[idx]); \
-bool ret = !null_bitmap[idx] && (cell_value OP _value); \
-new_size += _opposite ? !ret : ret; \
+auto& nested_col = nullable_col->get_nested_column(); \
+if (nested_col.is_column_dictionary()) { \
+if constexpr (std::is_same_v) { \
+auto* nested_col_ptr = vectorized::check_and_get_column< \
+vectorized::ColumnDictionary>(nested_col); \
+auto code = nested_col_ptr->find_code(_value); \
+if (code < 0 && IS_RANGE) { \
+code = nested_col_ptr->find_bound_code(_value, 0 OP 1, 1 OP 1 ); \
+} \
+auto& data_array = nested_col_ptr->get_data(); \
+for (uint16_t i = 0; i < *size; i++) { \
+uint16_t idx = sel[i]; \
+sel[new_size] = idx; \
+const auto& cell_value = \
+reinterpret_cast(data_array[idx]); \
+bool ret = !null_bitmap[idx] && (cell_value OP code); \
+new_size += _opposite ? !ret : ret; \
+} \
+} \
+} else { \
+auto* nested_col_ptr = \
+vectorized::check_and_get_column>( \
+nested_col); \
+ 
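
The dictionary branch in this hunk compares integer codes instead of strings. A minimal sketch of that idea for the `<` operator follows, assuming a dictionary sorted in ascending order with distinct entries (the hunk itself does not confirm this). `lower_bound_code` and `count_less_than` are hypothetical names for illustration; they are not the `find_code` / `find_bound_code` members used in the patch, whose exact semantics are not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: index of the first dictionary entry that is >= value.
// Assumes `sorted_dict` is sorted ascending and holds distinct strings.
int32_t lower_bound_code(const std::vector<std::string>& sorted_dict,
                         const std::string& value) {
    auto it = std::lower_bound(sorted_dict.begin(), sorted_dict.end(), value);
    return static_cast<int32_t>(it - sorted_dict.begin());
}

// Counts rows whose decoded string is < value by comparing codes only.
// Because the dictionary is sorted, `code < bound` holds exactly when
// `sorted_dict[code] < value`, whether or not `value` is in the dictionary.
std::size_t count_less_than(const std::vector<std::string>& sorted_dict,
                            const std::vector<int32_t>& codes,
                            const std::string& value) {
    const int32_t bound = lower_bound_code(sorted_dict, value);
    return static_cast<std::size_t>(
            std::count_if(codes.begin(), codes.end(),
                          [bound](int32_t c) { return c < bound; }));
}
```

The string comparison happens once per predicate (inside the binary search) rather than once per row, which is the motivation for evaluating range predicates on codes.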

[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8202: [improvment] show export support label like

2022-03-13 Thread GitBox


github-actions[bot] commented on pull request #8202:
URL: https://github.com/apache/incubator-doris/pull/8202#issuecomment-1066426017









[incubator-doris] branch dev-1.0.1 created (now 9f6dabf)

2022-03-13 Thread morningman
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a change to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git.


  at 9f6dabf  [fix][cherry-pick] fix compilation bug with cherry-pick

This branch includes the following new commits:

 new 6d9cd70  [chore](dependency) upgrade-grpc-version (#8218)
 new 343b851  [refactor](fe) Remove version hash on FE side (#8099)
 new 3f1dbd7  [improvement](olap) using placement-new to avoid dynamic mallocing for ParsedPage (#8172)
 new 6faac21  [Improvement] Add minimum fe meta version check   (#8203)
 new 0245d71  [refactor] change mysql server version to avoid some cve issues (#8223)
 new accff9d  [Enhancement](routine_load) Support show routine load statement with like predicate (#8188)
 new c04c9e1  [Feature](create_table) Support create table with random distribution to avoid data skew (#8041)
 new 3ac2b90  [feature](iceberg) Step3: Support query iceberg external table (#8179)
 new fa8e124  [typo](doc)fix some confusing doc content (#8239)
 new cf6582c  [chore] Support aarch64 target with ldb_toolchain (#8249)
 new 3378b40  [fix](be-ut) fix unit test bug for tablet_info_test (#8253)
 new 8081a9d  [refactor](fe) Remove old fe meta version (#8246)
 new 66c9bc2  [community] add more collaborators in .asf.yaml (#8029) (#8252)
 new 6268818  [doc] Modify document of compilation on ARM64 (#8254)
 new f39a58d  [typo] fix listdb description error (#8257)
 new 1aba49e  [docs] fix document date-time-functions typo (#8053)
 new fb7edf1  [feature][show-transaction] Support view transactions info for specified status by `SHOW TRANSACTION` stmt (#8156)
 new beaad22  [improvement] Upgrade MySQL version to 5.7.37 to reduce unnecessary CVE issues (#8247)
 new 0bcdba3  Revert "[chore](dependency) upgrade-grpc-version (#8218)" (#8250)
 new c0de629  [chore] make options of build.sh and run-be-ut.sh work (#8271)
 new bee31bd  [feature-wip][array-type] Refactor type info for nested array. (#8279)
 new b375e4f  [fix](ut) query stmt test error  (#8303)
 new 6cfa843  [fix] (rpc-udf) Fixed the problem that the query could not be interrupted (#8248)
 new 836d8ce  [fix](fe-ut) Fix FE unit test (#8293)
 new 925d3f6  [refactor] remove pusher.cpp and related mock test code (#8288)
 new f01312a  [refactor] remove types_test (#8289)
 new a22f286  [improvement][fix](grouping-set)(tablet-repair) optimize compaction too slow replica process, (#8123)
 new ae45eed  [improvement](restore) allow query on part of partitions when others are in RESTORE (#8245)
 new f6fee5b  [improvement](routine-load) Support routine load task succeed with empty data consumed (#8256)
 new 4d7fb6c  [Feature] Support Changing the bucketing mode of the table from Hash Distribution to Random Distribution (#8259)
 new b1b52fe  [Enhancement] Support Skipping compaction lower replica where select queryable replica for better scan performance (#8146)
 new f256b88  [typo]update spark build doc (#8333)
 new 62da121  [typo](comment) Translate the code comments of gensrc (#8308)
 new 377f2b3  [docs] fix invalid links in docker-dev document (#8313)
 new 564e4a0  [improvement][website] The expansion of sidebar is off by default (#8314)
 new 55f5f57  [typo] translate the comments of schema_change.h (#8321)
 new 3865303  [fix](ut) fix be ut fragment_mgr_test compile failed (#8344)
 new 02aab7b  [fix](planner) Convert format in RewriteFromUnixTimeRule (#8235)
 new 4a17e0c  [doc] Add sync job fe configuration item description (#8349)
 new cd423c3  format fe config title , add link for tablet_rebalancer_type (#8346)
 new bb41500  [docs]update http port doc to be more intuitive (#8343)
 new eb198c3  support doriswriter build in macos (#8330)
 new 83da0cf  [community] Modify doris connector release doc (#8275)
 new 08b0d3b  [typo]fix some typo in fe_config (#8325)
 new 9adfd37  [license] Organize third-party dependent licenses for bianry releases (#8350)
 new 2714d0f  [optimize] optimze tablet read, avoid to create too much scanner for small tablet (#8096)
 new 6f8a026  [doc] Translate Chinese comment to English (#8340)
 new 3c6fe9d  [improvement](regression-test) add aggregation tests from trino to doris (#8375)
 new 440c95a  [fix](replica) handle replica version missing info to avoid -214 error (#8209)
 new 722236d  [refactor] remove agent status (#8273)
 new 85c33a0  [docs] add document conditional-functions (#8339)
 new 1c71a59  [refactor] remove old schema change code on BE (#8342)
 new b63fc67  [doc] Update BROKER LOAD.md (#8361)
 new d796075  [doc] update substring.md (#8398)
 new dfd50da  [typo] translate the comments of delete_handler.cpp (#8402)
 new ecefc73  [improvement](vectorized) Merge block in scanner to speed u