[GitHub] [doris] Mryange commented on pull request #22047: [feature](executor) using fe version to set instance_num
Mryange commented on PR #22047: URL: https://github.com/apache/doris/pull/22047#issuecomment-1646763701 run buildall -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [doris] sohardforaname opened a new pull request, #22122: [Enhancement](planner)support fold constant for date_trunc()
sohardforaname opened a new pull request, #22122: URL: https://github.com/apache/doris/pull/22122

## Proposed changes

Issue Number: close #xxx

support fold constant for date_trunc()

## Further comments

If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
[GitHub] [doris] sohardforaname commented on pull request #22122: [Enhancement](planner)support fold constant for date_trunc()
sohardforaname commented on PR #22122: URL: https://github.com/apache/doris/pull/22122#issuecomment-1646763906 run buildall
[GitHub] [doris] sohardforaname commented on pull request #21891: [Enhancement](Nereids)fix push down global limit to avoid gather.
sohardforaname commented on PR #21891: URL: https://github.com/apache/doris/pull/21891#issuecomment-1646764315 run fe ut
[GitHub] [doris] github-actions[bot] commented on pull request #21495: [improvement](Jsonb) optimization Jsonb path parse
github-actions[bot] commented on PR #21495: URL: https://github.com/apache/doris/pull/21495#issuecomment-1646765048 PR approved by anyone and no changes requested.
[GitHub] [doris] sohardforaname commented on pull request #21929: [Fix](regression-test)fix nereids_p0/javaudf and nereids_p0/outfile cases.
sohardforaname commented on PR #21929: URL: https://github.com/apache/doris/pull/21929#issuecomment-1646765149 run p0
[GitHub] [doris] sohardforaname commented on pull request #22088: [Fix](Nereids)fix loading core when enable nereids DML default
sohardforaname commented on PR #22088: URL: https://github.com/apache/doris/pull/22088#issuecomment-1646765294 run p0
[GitHub] [doris] hello-stephen commented on pull request #22056: [refactor](be) use std::move to improve performance of push_back
hello-stephen commented on PR #22056: URL: https://github.com/apache/doris/pull/22056#issuecomment-1646765426

(From new machine)TeamCity pipeline, clickbench performance test result:
 the sum of best hot time: 45.66 seconds
 stream load tsv: 503 seconds loaded 74807831229 Bytes, about 141 MB/s
 stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
 stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
 stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
 insert into select: 29.3 seconds inserted 1000 Rows, about 341K ops/s
 storage size: 17168232848 Bytes
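The "about N MB/s" figures in the stream load rows of this report appear to be bytes divided by seconds, expressed in whole MiB (1024 * 1024 bytes) per second. The sketch below checks that reading against the four stream load rows. The helper name `mib_per_second` is made up for illustration, and the rounding rule (truncation) is an assumption inferred from the numbers, not something the report states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: whole MiB/s for `bytes` loaded in `seconds`.
// Integer division truncates, which is assumed to match the report.
inline std::uint64_t mib_per_second(std::uint64_t bytes, std::uint64_t seconds) {
    assert(seconds > 0);  // elapsed time must be positive
    return bytes / seconds / (1024ULL * 1024ULL);
}
```

With the values from the report, `mib_per_second(74807831229ULL, 503)` gives 141 for the tsv row, and the json, orc and parquet rows give 118, 16 and 26.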
[GitHub] [doris] hello-stephen commented on pull request #22047: [feature](executor) using fe version to set instance_num
hello-stephen commented on PR #22047: URL: https://github.com/apache/doris/pull/22047#issuecomment-1646770090

(From new machine)TeamCity pipeline, clickbench performance test result:
 the sum of best hot time: 45.53 seconds
 stream load tsv: 506 seconds loaded 74807831229 Bytes, about 140 MB/s
 stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
 stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
 stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
 insert into select: 29.2 seconds inserted 1000 Rows, about 342K ops/s
 storage size: 17161956801 Bytes
[GitHub] [doris] hello-stephen commented on pull request #22122: [Enhancement](planner)support fold constant for date_trunc()
hello-stephen commented on PR #22122: URL: https://github.com/apache/doris/pull/22122#issuecomment-1646772701

(From new machine)TeamCity pipeline, clickbench performance test result:
 the sum of best hot time: 47.37 seconds
 stream load tsv: 507 seconds loaded 74807831229 Bytes, about 140 MB/s
 stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
 stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
 stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
 insert into select: 29.9 seconds inserted 1000 Rows, about 334K ops/s
 storage size: 17163485694 Bytes
[GitHub] [doris] englefly commented on pull request #22064: [stats](nereids)keep min/max expr in colstats
englefly commented on PR #22064: URL: https://github.com/apache/doris/pull/22064#issuecomment-1646778185 run buildall
[GitHub] [doris] github-actions[bot] commented on pull request #21109: [Improve](serde)update serialize and deserialize text for data type
github-actions[bot] commented on PR #21109: URL: https://github.com/apache/doris/pull/21109#issuecomment-1646796311 clang-tidy review says "All clean, LGTM! :+1:"
[GitHub] [doris] github-actions[bot] commented on pull request #21109: [Improve](serde)update serialize and deserialize text for data type
github-actions[bot] commented on PR #21109: URL: https://github.com/apache/doris/pull/21109#issuecomment-1646796454 clang-tidy review says "All clean, LGTM! :+1:"
[GitHub] [doris] pegasas commented on issue #19664: [Bug] insert into values does not work as insert into select
pegasas commented on issue #19664: URL: https://github.com/apache/doris/issues/19664#issuecomment-1646808845 Synced with @dataroaring offline; this issue should be fixed on current master. Skipped.
[GitHub] [doris] zy-kkk merged pull request #21495: [improvement](Jsonb) optimization Jsonb path parse
zy-kkk merged PR #21495: URL: https://github.com/apache/doris/pull/21495
[GitHub] [doris] github-actions[bot] commented on pull request #21495: [improvement](Jsonb) optimization Jsonb path parse
github-actions[bot] commented on PR #21495: URL: https://github.com/apache/doris/pull/21495#issuecomment-1646809418 PR approved by at least one committer and no changes requested.
[doris] branch master updated: [improvement](Jsonb) optimization Jsonb path parse (#21495)
This is an automated email from the ASF dual-hosted git repository.

zykkk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git

The following commit(s) were added to refs/heads/master by this push:
     new ddd7e9871d [improvement](Jsonb) optimization Jsonb path parse (#21495)
ddd7e9871d is described below

commit ddd7e9871ddcf177c47c7e21e2ec5d9232133739
Author: Liqf <109049295+lemonlit...@users.noreply.github.com>
AuthorDate: Sun Jul 23 18:59:12 2023 +0800

    [improvement](Jsonb) optimization Jsonb path parse (#21495)

    The previous logic was to read jsonbvalue while parsing the json path. For complex json paths, there will be a lot of repeated parsing work. The optimization idea is to separate the analysis and value of jsonpath
---
 be/src/util/jsonb_document.h            | 273
 be/src/vec/functions/function_jsonb.cpp | 101 +---
 2 files changed, 213 insertions(+), 161 deletions(-)

diff --git a/be/src/util/jsonb_document.h b/be/src/util/jsonb_document.h
index d54e7291dd..c21917e066 100644
--- a/be/src/util/jsonb_document.h
+++ b/be/src/util/jsonb_document.h
@@ -309,14 +309,6 @@ public:

     bool get_has_escapes() const { return has_escapes; }

-    void set_is_invalid_json_path(bool has) { is_invalid_json_path = has; }
-
-    bool get_is_invalid_json_path() const { return is_invalid_json_path; }
-
-    void set_type(unsigned int code) { type = code; }
-
-    bool get_type() const { return type; }
-
 private:
     /// The current position in the stream.
     const char* m_position;
@@ -332,9 +324,17 @@ private:

     ///Whether to contain escape characters
     bool has_escapes = false;
+};
+
+struct leg_info {
+    ///path leg ptr
+    char* leg_ptr;
+
+    ///path leg len
+    unsigned int leg_len;

-    ///Is the json path valid
-    bool is_invalid_json_path = false;
+    ///array_index
+    int array_index;

     ///type: 0 is member 1 is array
     unsigned int type;
@@ -343,10 +343,24 @@ private:
 class JsonbPath {
 public:
     // parse json path
-    static bool parsePath(Stream* stream);
+    static bool parsePath(Stream* stream, JsonbPath* path);
+
+    static bool parse_array(Stream* stream, JsonbPath* path);
+    static bool parse_member(Stream* stream, JsonbPath* path);
+
+    //return true if json path valid else return false
+    bool seek(const char* string, size_t length);
+
+    void add_leg_to_leg_vector(std::unique_ptr<leg_info> leg) {
+        leg_vector.emplace_back(leg.release());
+    }

-    static bool parse_array(Stream* stream);
-    static bool parse_member(Stream* stream);
+    size_t get_leg_vector_size() { return leg_vector.size(); }
+
+    leg_info* get_leg_from_leg_vector(size_t i) { return leg_vector[i].get(); }
+
+private:
+    std::vector<std::unique_ptr<leg_info>> leg_vector;
 };

 /*
@@ -529,15 +543,8 @@ public:
     // get the raw byte array of the value
     const char* getValuePtr() const;

-    // find the JSONB value by a key path string (null terminated)
-    JsonbValue* findPath(const char* key_path, bool& is_invalid_json_path,
-                         hDictFind handler = nullptr) {
-        return findPath(key_path, (unsigned int)strlen(key_path), is_invalid_json_path, handler);
-    }
-
-    // find the JSONB value by a key path string (with length)
-    JsonbValue* findPath(const char* key_path, unsigned int len, bool& is_invalid_json_path,
-                         hDictFind handler);
+    // find the JSONB value by JsonbPath
+    JsonbValue* findValue(JsonbPath& path, hDictFind handler);
     friend class JsonbDocument;

 protected:
@@ -1207,154 +1214,100 @@ inline const char* JsonbValue::getValuePtr() const {
     }
 }

-inline JsonbValue* JsonbValue::findPath(const char* key_path, unsigned int kp_len,
-                                        bool& is_invalid_json_path, hDictFind handler = nullptr) {
-    if (!key_path) return nullptr;
-    if (kp_len == 0) {
-        is_invalid_json_path = true;
-        return nullptr;
-    }
+inline bool JsonbPath::seek(const char* key_path, size_t kp_len) {
+    //path invalid
+    if (!key_path || kp_len == 0) return false;
     Stream stream(key_path, kp_len);
     stream.skip_whitespace();
     if (stream.exhausted() || stream.read() != SCOPE) {
-        is_invalid_json_path = true;
-        return nullptr;
+        //path invalid
+        return false;
     }
-    JsonbValue* pval = this;
-
-    while (pval && !stream.exhausted()) {
+    while (!stream.exhausted()) {
         stream.skip_whitespace();
         stream.clear_leg_ptr();
         stream.clear_leg_len();
-        if (!JsonbPath::parsePath(&stream)) {
-            is_invalid_json_path = stream.get_is_invalid_json_path();
-            return nullptr;
-        }
-
-        if (stream.get_leg_len() == 0) {
-            return nullptr;
+        if (!JsonbPath::parsePath(&stream, this)) {
+            //path invalid
+            return false;
         }
+    }
+    return true;
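The idea in the commit message, parsing the path once into a list of legs and then looking values up against that list, can be sketched in isolation as follows. This is a minimal illustration under stated assumptions and not the Doris code: `Leg`, `Node`, `parse_path` and `find_value` are hypothetical names, the path grammar is reduced to `$`, `.member` and `[index]`, and the toy `Node` tree stands in for a real JSONB document.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; not the Doris types.
struct Leg {
    bool is_array = false;  // false: member leg, true: array leg
    std::string key;        // member name (member legs only)
    std::size_t index = 0;  // element index (array legs only)
};

struct Node {
    std::string scalar;             // payload for leaf values
    std::vector<std::string> keys;  // keys[i] names children[i]; empty for arrays
    std::vector<Node> children;
};

// Parse a path such as "$.a.b[1]" into legs, once.
// Returns false if the path is invalid (legs may then be partially filled).
bool parse_path(const std::string& path, std::vector<Leg>* legs) {
    assert(legs != nullptr);
    std::size_t i = 0;
    if (path.empty() || path[i++] != '$') return false;
    while (i < path.size()) {
        Leg leg;
        if (path[i] == '.') {
            const std::size_t start = ++i;
            while (i < path.size() && path[i] != '.' && path[i] != '[') ++i;
            if (i == start) return false;  // empty member name
            leg.key = path.substr(start, i - start);
        } else if (path[i] == '[') {
            const std::size_t start = ++i;
            while (i < path.size() && std::isdigit(static_cast<unsigned char>(path[i]))) {
                leg.index = leg.index * 10 + static_cast<std::size_t>(path[i] - '0');
                ++i;
            }
            if (i == start || i >= path.size() || path[i] != ']') return false;
            ++i;  // consume ']'
            leg.is_array = true;
        } else {
            return false;
        }
        legs->push_back(leg);
    }
    return true;
}

// Walk an already-parsed path; no string parsing happens here.
const Node* find_value(const Node& root, const std::vector<Leg>& legs) {
    const Node* cur = &root;
    for (const Leg& leg : legs) {
        if (leg.is_array) {
            if (!cur->keys.empty() || leg.index >= cur->children.size()) return nullptr;
            cur = &cur->children[leg.index];
        } else {
            const Node* next = nullptr;
            for (std::size_t k = 0; k < cur->keys.size(); ++k) {
                if (cur->keys[k] == leg.key) {
                    next = &cur->children[k];
                    break;
                }
            }
            if (next == nullptr) return nullptr;
            cur = next;
        }
    }
    return cur;
}
```

A caller would run `parse_path` once per path string and then call `find_value` for each document, so the cost of tokenizing the path is not repeated per row.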
[GitHub] [doris] morningman commented on a diff in pull request #21975: [opt](filecache) use weak_ptr to cache the file handle of file segment
morningman commented on code in PR #21975: URL: https://github.com/apache/doris/pull/21975#discussion_r1270151394

## be/src/common/config.cpp: ##
@@ -1012,6 +1012,7 @@ DEFINE_mInt32(s3_write_buffer_size, "5242880");
 // can at most buffer 50MB data. And the num of multi part upload task is
 // s3_write_buffer_whole_size / s3_write_buffer_size
 DEFINE_mInt32(s3_write_buffer_whole_size, "524288000");
+DEFINE_mInt64(file_cache_max_file_reader_cache_size, "100");

Review Comment:
Add comment
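The context lines in this hunk describe the number of multipart upload tasks as `s3_write_buffer_whole_size / s3_write_buffer_size`. As a small worked check of that ratio with the default values quoted in the diff, the sketch below uses plain constants; the constant names mirror the config names, while `kMiB` and `max_upload_tasks` are hypothetical and only for illustration.

```cpp
#include <cstdint>

// Default values copied from the DEFINE_mInt32 lines quoted in the diff context.
constexpr std::int64_t s3_write_buffer_size = 5242880;          // bytes per buffer
constexpr std::int64_t s3_write_buffer_whole_size = 524288000;  // total buffered bytes

constexpr std::int64_t kMiB = 1024 * 1024;

// Ratio named in the quoted comment: whole size / single buffer size.
constexpr std::int64_t max_upload_tasks() {
    return s3_write_buffer_whole_size / s3_write_buffer_size;
}

static_assert(s3_write_buffer_size == 5 * kMiB, "one buffer is 5 MiB");
static_assert(s3_write_buffer_whole_size == 500 * kMiB, "total is 500 MiB");
static_assert(max_upload_tasks() == 100, "100 buffers fit in the total");
```

With these defaults the ratio works out to 100 tasks, and the total of 524288000 bytes is 500 MiB.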
[GitHub] [doris] morningman commented on pull request #22049: [improvement](s3) increase the connection num of s3 client
morningman commented on PR #22049: URL: https://github.com/apache/doris/pull/22049#issuecomment-1646813650 run buildall
[GitHub] [doris] morningman merged pull request #22106: [minor](log) print error msg to fe.out before log is initialized
morningman merged PR #22106: URL: https://github.com/apache/doris/pull/22106
[doris] branch master updated: [minor](log) print error msg to fe.out before log is initialized (#22106)
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git

The following commit(s) were added to refs/heads/master by this push:
     new a5099a2d3b [minor](log) print error msg to fe.out before log is initialized (#22106)
a5099a2d3b is described below

commit a5099a2d3b3bdfd4bb21361ffc1ed61b4195bbe8
Author: Mingyu Chen
AuthorDate: Sun Jul 23 19:20:10 2023 +0800

    [minor](log) print error msg to fe.out before log is initialized (#22106)

    The exception may be thrown before LOG is initialized. Such as wrong config value.
    So we need to print it to fe.out, otherwise we can't know what's wrong.

    After this PR, the error can be found in fe.out, such as:

    ```
    java.lang.NumberFormatException: For input string: "3g"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at org.apache.doris.common.ConfigBase.setConfigField(ConfigBase.java:253)
        at org.apache.doris.common.ConfigBase.setFields(ConfigBase.java:232)
        at org.apache.doris.common.ConfigBase.initConf(ConfigBase.java:146)
        at org.apache.doris.common.ConfigBase.init(ConfigBase.java:112)
        at org.apache.doris.DorisFE.start(DorisFE.java:101)
        at org.apache.doris.DorisFE.main(DorisFE.java:73)
    ```
---
 fe/fe-core/src/main/java/org/apache/doris/DorisFE.java | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fe/fe-core/src/main/java/org/apache/doris/DorisFE.java b/fe/fe-core/src/main/java/org/apache/doris/DorisFE.java
index 7d87091120..07394d9cd4 100755
--- a/fe/fe-core/src/main/java/org/apache/doris/DorisFE.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/DorisFE.java
@@ -187,6 +187,9 @@ public class DorisFE {
                 Thread.sleep(2000);
             }
         } catch (Throwable e) {
+            // Some exception may thrown before LOG is inited.
+            // So need to print to stdout
+            e.printStackTrace();
             LOG.warn("", e);
         }
     }
[GitHub] [doris] github-actions[bot] commented on pull request #22062: [deps](hadoop) update hadoop libs to 3.3.4.5
github-actions[bot] commented on PR #22062: URL: https://github.com/apache/doris/pull/22062#issuecomment-1646813721 PR approved by at least one committer and no changes requested.
[GitHub] [doris] github-actions[bot] commented on pull request #22062: [deps](hadoop) update hadoop libs to 3.3.4.5
github-actions[bot] commented on PR #22062: URL: https://github.com/apache/doris/pull/22062#issuecomment-1646813727 PR approved by anyone and no changes requested.
[GitHub] [doris] github-actions[bot] commented on pull request #22105: [enhance](S3) add s3 bvar metrics for all s3 operation
github-actions[bot] commented on PR #22105: URL: https://github.com/apache/doris/pull/22105#issuecomment-1646818534 PR approved by at least one committer and no changes requested.
[GitHub] [doris] github-actions[bot] commented on pull request #22105: [enhance](S3) add s3 bvar metrics for all s3 operation
github-actions[bot] commented on PR #22105: URL: https://github.com/apache/doris/pull/22105#issuecomment-1646818543 PR approved by anyone and no changes requested.
[GitHub] [doris] hello-stephen commented on pull request #22049: [improvement](s3) increase the connection num of s3 client
hello-stephen commented on PR #22049: URL: https://github.com/apache/doris/pull/22049#issuecomment-1646821913

(From new machine)TeamCity pipeline, clickbench performance test result:
 the sum of best hot time: 45.05 seconds
 stream load tsv: 509 seconds loaded 74807831229 Bytes, about 140 MB/s
 stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
 stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
 stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
 insert into select: 29.3 seconds inserted 1000 Rows, about 341K ops/s
 storage size: 17162136982 Bytes
[GitHub] [doris] xiaokang merged pull request #22062: [deps](hadoop) update hadoop libs to 3.3.4.5
xiaokang merged PR #22062: URL: https://github.com/apache/doris/pull/22062
[doris] branch master updated: [deps](hadoop) update hadoop libs to 3.3.4.5 (#22062)
This is an automated email from the ASF dual-hosted git repository.

kxiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git

The following commit(s) were added to refs/heads/master by this push:
     new 0c811edb78 [deps](hadoop) update hadoop libs to 3.3.4.5 (#22062)
0c811edb78 is described below

commit 0c811edb78d54cab23386ca35c67f3a96aa48b90
Author: Mingyu Chen
AuthorDate: Sun Jul 23 20:17:16 2023 +0800

    [deps](hadoop) update hadoop libs to 3.3.4.5 (#22062)
---
 thirdparty/CHANGELOG.md | 4
 thirdparty/vars.sh      | 8
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/thirdparty/CHANGELOG.md b/thirdparty/CHANGELOG.md
index abc7c83e76..c90363fcb7 100644
--- a/thirdparty/CHANGELOG.md
+++ b/thirdparty/CHANGELOG.md
@@ -2,6 +2,10 @@

 This file contains version of the third-party dependency libraries in the build-env image. The docker build-env image is apache/doris, and the tag is `build-env-${version}`

+## v20230721
+
+- Modified hadoop libhdfs 3.3.4.4 -> 3.3.4.5
+
 ## v20230625

 - Modified benchmark 1.5.6 -> 1.8.0
diff --git a/thirdparty/vars.sh b/thirdparty/vars.sh
index f10c56fa9e..435434927f 100644
--- a/thirdparty/vars.sh
+++ b/thirdparty/vars.sh
@@ -459,10 +459,10 @@ FAST_FLOAT_SOURCE=fast_float-3.9.0
 FAST_FLOAT_MD5SUM="5656b0d8b150a3b157cfb092d214f6ea"

 # libhdfs
-HADOOP_LIBS_DOWNLOAD="https://github.com/apache/doris-thirdparty/archive/refs/tags/hadoop-3.3.4.4-for-doris.tar.gz"
-HADOOP_LIBS_NAME="hadoop-3.3.4.4-for-doris.tar.gz"
-HADOOP_LIBS_SOURCE="doris-thirdparty-hadoop-3.3.4.4-for-doris"
-HADOOP_LIBS_MD5SUM="00f0042dd3900ba016f079ee9c550efb"
+HADOOP_LIBS_DOWNLOAD="https://github.com/apache/doris-thirdparty/archive/refs/tags/hadoop-3.3.4.5-for-doris.tar.gz"
+HADOOP_LIBS_NAME="hadoop-3.3.4.5-for-doris.tar.gz"
+HADOOP_LIBS_SOURCE="doris-thirdparty-hadoop-3.3.4.5-for-doris"
+HADOOP_LIBS_MD5SUM="15b7be1747b27c37923b0cb9db6cff8c"

 # all thirdparties which need to be downloaded is set in array TP_ARCHIVES
 export TP_ARCHIVES=(
[GitHub] [doris] AshinGau commented on pull request #21975: [opt](filecache) use weak_ptr to cache the file handle of file segment
AshinGau commented on PR #21975: URL: https://github.com/apache/doris/pull/21975#issuecomment-1646831195 run buildall
[GitHub] [doris] AshinGau commented on a diff in pull request #21975: [opt](filecache) use weak_ptr to cache the file handle of file segment
AshinGau commented on code in PR #21975: URL: https://github.com/apache/doris/pull/21975#discussion_r1271441778

## be/src/common/config.cpp: ##
@@ -1012,6 +1012,7 @@ DEFINE_mInt32(s3_write_buffer_size, "5242880");
 // can at most buffer 50MB data. And the num of multi part upload task is
 // s3_write_buffer_whole_size / s3_write_buffer_size
 DEFINE_mInt32(s3_write_buffer_whole_size, "524288000");
+DEFINE_mInt64(file_cache_max_file_reader_cache_size, "100");

Review Comment:
done
[GitHub] [doris] github-actions[bot] commented on pull request #21975: [opt](filecache) use weak_ptr to cache the file handle of file segment
github-actions[bot] commented on PR #21975: URL: https://github.com/apache/doris/pull/21975#issuecomment-1646832581 clang-tidy review says "All clean, LGTM! :+1:"
[GitHub] [doris] taptao commented on issue #21268: [Enhancement] Improve Json function for doris
taptao commented on issue #21268: URL: https://github.com/apache/doris/issues/21268#issuecomment-1646835568 I pick JSON_MERGE_PRESERVE
[GitHub] [doris] HappenLee commented on a diff in pull request #22086: [Improvement](pipeline) support send eos on local exchange and remove some unused code
HappenLee commented on code in PR #22086: URL: https://github.com/apache/doris/pull/22086#discussion_r1271446540 ## be/src/vec/sink/vdata_stream_sender.cpp: ## @@ -266,7 +266,13 @@ Status Channel::close_internal() { status = send_current_block(true); } else { SCOPED_CONSUME_MEM_TRACKER(_parent->_mem_tracker.get()); -status = send_block((PBlock*)nullptr, true); +if (is_local()) { +if (_recvr_is_valid()) { Review Comment: The receiver just judges EOF by `sender_number == 0`.
[GitHub] [doris] jackwener opened a new pull request, #22123: [fix](Nereids): mergeGroup should merge target Group into existed Group
jackwener opened a new pull request, #22123: URL: https://github.com/apache/doris/pull/22123 ## Proposed changes Issue Number: close #xxx ## Further comments If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
[GitHub] [doris] github-actions[bot] commented on pull request #22028: [pipeline](refactor) refactor pipeline task schedule logics
github-actions[bot] commented on PR #22028: URL: https://github.com/apache/doris/pull/22028#issuecomment-1646845604 clang-tidy review says "All clean, LGTM! :+1:"
[GitHub] [doris] hello-stephen commented on pull request #21975: [opt](filecache) use weak_ptr to cache the file handle of file segment
hello-stephen commented on PR #21975: URL: https://github.com/apache/doris/pull/21975#issuecomment-1646847274
(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.07 seconds
stream load tsv: 499 seconds loaded 74807831229 Bytes, about 142 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 1000 Rows, about 346K ops/s
storage size: 17165511333 Bytes
[GitHub] [doris] morningman commented on a diff in pull request #22115: [opt](hive)opt select count(*) stmt push down agg on parquet in hive .
morningman commented on code in PR #22115: URL: https://github.com/apache/doris/pull/22115#discussion_r1271375451 ## be/src/vec/exec/format/parquet/vparquet_reader.cpp: ## @@ -511,6 +511,34 @@ Status ParquetReader::get_columns(std::unordered_mapnum_rows , because for the same file, +// the optimizer may generate multiple VFileScanner with different _scan_range +while (_read_row_groups.size() > 0) { +_next_row_group_reader(); +rows += _current_group_reader->get__remaining_rows(); +} + +//fill one column is enough +auto cols = block->mutate_columns(); +for (auto& col : cols) { +col->resize(rows); Review Comment: The `rows` maybe too large for the resize? Normally, a block only return 4096 rows. But here you may return unlimited rows. I think it should be splitted in batch? ## be/src/vec/exec/format/generic_reader.h: ## @@ -31,6 +31,12 @@ class Block; class GenericReader { public: virtual Status get_next_block(Block* block, size_t* read_rows, bool* eof) = 0; + +virtual Status get_next_block(Block* block, size_t* read_rows, bool* eof, Review Comment: How about merge these 2 methods? 
## be/src/vec/exec/scan/vscan_node.h: ## @@ -351,6 +351,9 @@ class VScanNode : public ExecNode, public RuntimeFilterConsumer { std::unordered_map _colname_to_slot_id; std::vector _col_distribute_ids; +public: +TPushAggOp::type push_down_agg_type_opt; Review Comment: Better not using public to define a field ## fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java: ## @@ -1363,8 +1360,8 @@ protected void toThrift(TPlanNode msg) { msg.olap_scan_node.setTableName(olapTable.getName()); msg.olap_scan_node.setEnableUniqueKeyMergeOnWrite(olapTable.getEnableUniqueKeyMergeOnWrite()); -if (pushDownAggNoGroupingOp != null) { -msg.olap_scan_node.setPushDownAggTypeOpt(pushDownAggNoGroupingOp); +if (pushDownAggNoGroupingOp != TPushAggOp.NONE) { +msg.setPushDownAggTypeOpt(pushDownAggNoGroupingOp); Review Comment: I think we can ALWAYS set this field ## fe/fe-core/src/main/java/org/apache/doris/planner/external/HiveScanNode.java: ## @@ -310,4 +299,28 @@ private void genSlotToSchemaIdMap() { } params.setSlotNameToSchemaPos(columnNameToPosition); } + +@Override +public boolean pushDownAggNoGrouping(FunctionCallExpr aggExpr) { +TFileFormatType fileFormatType; +try { +fileFormatType = getFileFormatType(); +} catch (UserException e) { +throw new RuntimeException(e); +} + +String aggFunctionName = aggExpr.getFnName().getFunction(); +if (aggFunctionName.equalsIgnoreCase("COUNT") && fileFormatType == TFileFormatType.FORMAT_PARQUET) { Review Comment: Need to implement orc too ## gensrc/thrift/PlanNodes.thrift: ## @@ -638,12 +638,11 @@ struct TOlapScanNode { // It's limit for scanner instead of scanNode so we add a new limit. 
10: optional i64 sort_limit 11: optional bool enable_unique_key_merge_on_write - 12: optional TPushAggOp push_down_agg_type_opt - 13: optional bool use_topn_opt - 14: optional list indexes_desc - 15: optional set output_column_unique_ids - 16: optional list distribute_column_ids - 17: optional i32 schema_version + 12: optional bool use_topn_opt Review Comment: You can't modify the origin structure of thrift, or it will cause problem when upgrading. You can mark the old `push_down_agg_type_opt` as `Deprecated`, and make some compatibility when visiting this field ## fe/fe-core/src/main/java/org/apache/doris/planner/external/HiveScanNode.java: ## @@ -310,4 +299,28 @@ private void genSlotToSchemaIdMap() { } params.setSlotNameToSchemaPos(columnNameToPosition); } + +@Override +public boolean pushDownAggNoGrouping(FunctionCallExpr aggExpr) { +TFileFormatType fileFormatType; +try { +fileFormatType = getFileFormatType(); +} catch (UserException e) { +throw new RuntimeException(e); +} + +String aggFunctionName = aggExpr.getFnName().getFunction(); +if (aggFunctionName.equalsIgnoreCase("COUNT") && fileFormatType == TFileFormatType.FORMAT_PARQUET) { +return true; +} +return false; +} + +@Override +public boolean pushDownAggNoGroupingCheckCol(FunctionCallExpr aggExpr, Column col) { Review Comment: For external table, always return false. ## be/src/vec/exec/scan/vfile_scanner.cpp: ## @@ -245,7 +245,19 @@ Status VFileScanner::_get_block_impl(RuntimeState* state, Block* block, bool* eo RETURN_IF_ERROR(_init_src_block(block)); { SCOPED_TIMER(_get_block_timer); + // Read next block. + +if (_par
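The review of `vparquet_reader.cpp` in this message asks that the pushed-down row count be split into batches instead of resizing a column to an unbounded size. A minimal sketch of that idea follows. The batch size of 4096 is taken from the reviewer's remark about normal block sizes; the helper `next_count_batch` and the constant `kBatchSize` are hypothetical names for illustration and are not part of Doris.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Assumed batch limit; the review only notes that a block normally returns 4096 rows.
constexpr std::size_t kBatchSize = 4096;

// Hypothetical helper, not a Doris API: hands out at most kBatchSize rows per
// call and sets *eof once the remaining count reaches zero.
std::size_t next_count_batch(std::size_t* remaining_rows, bool* eof) {
    assert(remaining_rows != nullptr && eof != nullptr);
    const std::size_t rows = std::min(*remaining_rows, kBatchSize);
    *remaining_rows -= rows;
    *eof = (*remaining_rows == 0);
    return rows;
}
```

A caller would resize the output column to the returned value on each call, so no single block grows beyond the batch limit.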
[GitHub] [doris] morningman opened a new pull request, #22124: [feature](create-table) support setting replication num for creating table operation globally
morningman opened a new pull request, #22124: URL: https://github.com/apache/doris/pull/22124 cherry-pick #21848
[GitHub] [doris] englefly commented on a diff in pull request #22000: [refactor](Nereids): avoid useless groupByColStats Map
englefly commented on code in PR #22000: URL: https://github.com/apache/doris/pull/22000#discussion_r1271456350 ## fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsCalculator.java: ## @@ -687,37 +687,32 @@ private Statistics computeLimit(Limit limit) { private double estimateGroupByRowCount(List groupByExpressions, Statistics childStats) { double rowCount = 1; -Map groupByColStats = new HashMap<>(); -for (Expression groupByExpr : groupByExpressions) { -ColumnStatistic colStats = childStats.findColumnStatistics(groupByExpr); -if (colStats == null) { -colStats = ExpressionEstimation.estimate(groupByExpr, childStats); -} -groupByColStats.put(groupByExpr, colStats); +List groupByNdvs = groupByExpressions.stream() +.map(groupByExpr -> { +ColumnStatistic colStats = childStats.findColumnStatistics(groupByExpr); +if (colStats == null) { +colStats = ExpressionEstimation.estimate(groupByExpr, childStats); +} +return colStats.isUnKnown() ? -1 : colStats.ndv; +}) +.sorted(Comparator.reverseOrder()) +.collect(Collectors.toList()); +if (groupByExpressions.isEmpty()) { +return 1; } -int groupByCount = groupByExpressions.size(); -if (groupByColStats.values().stream().anyMatch(ColumnStatistic::isUnKnown)) { +if (groupByNdvs.stream().anyMatch(ndv -> ndv == -1)) { Review Comment: isUnknown is not equal to "ndv==-1". 
## fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsCalculator.java: ## @@ -687,37 +687,32 @@ private Statistics computeLimit(Limit limit) { private double estimateGroupByRowCount(List groupByExpressions, Statistics childStats) { double rowCount = 1; -Map groupByColStats = new HashMap<>(); -for (Expression groupByExpr : groupByExpressions) { -ColumnStatistic colStats = childStats.findColumnStatistics(groupByExpr); -if (colStats == null) { -colStats = ExpressionEstimation.estimate(groupByExpr, childStats); -} -groupByColStats.put(groupByExpr, colStats); +List groupByNdvs = groupByExpressions.stream() +.map(groupByExpr -> { +ColumnStatistic colStats = childStats.findColumnStatistics(groupByExpr); +if (colStats == null) { +colStats = ExpressionEstimation.estimate(groupByExpr, childStats); +} +return colStats.isUnKnown() ? -1 : colStats.ndv; Review Comment: never set ndv to -1
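The two review comments above object to encoding "unknown" as `ndv == -1`. One possible way to keep the two notions apart is sketched here in C++ with `std::optional` (C++17). The PR itself is Java, and `ColumnStat` and `known_ndvs_desc` are hypothetical names used only for illustration; this is not the fix the reviewer or author proposed.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-in for a column statistic; not the Doris ColumnStatistic class.
struct ColumnStat {
    bool is_unknown;
    double ndv;
};

// Returns the NDVs sorted in descending order, or std::nullopt if any column
// is unknown. "Unknown" is carried by the optional, never by a sentinel ndv.
std::optional<std::vector<double>> known_ndvs_desc(const std::vector<ColumnStat>& stats) {
    std::vector<double> ndvs;
    ndvs.reserve(stats.size());
    for (const ColumnStat& s : stats) {
        if (s.is_unknown) {
            return std::nullopt;
        }
        assert(s.ndv >= 0);  // assumed invariant: a known ndv is never negative
        ndvs.push_back(s.ndv);
    }
    std::sort(ndvs.begin(), ndvs.end(), std::greater<double>());
    return ndvs;
}
```

The caller then checks `has_value()` instead of scanning the list for `-1`, so a legitimate statistic can never be mistaken for the unknown marker.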
[GitHub] [doris] englefly commented on pull request #22000: [refactor](Nereids): avoid useless groupByColStats Map
englefly commented on PR #22000: URL: https://github.com/apache/doris/pull/22000#issuecomment-1646852744 What is the benefit of this refactor?
[doris] 02/08: [bugfix](scanner) when scanner init failed during get tablet, not need call update counters (#22117)
This is an automated email from the ASF dual-hosted git repository. kxiao pushed a commit to branch branch-2.0 in repository https://gitbox.apache.org/repos/asf/doris.git commit 298a58ccc7a0db536c98cca3527460caba09b28e Author: yiguolei <676222...@qq.com> AuthorDate: Sun Jul 23 10:19:20 2023 +0800 [bugfix](scanner) when scanner init failed during get tablet, not need call update counters (#22117) Co-authored-by: yiguolei If the scanner is failed during init or open, then not need update counters because the query is fail and the counter is useless. And it may core during update counters. For example, update counters depend on scanner's tablet, but the tablet == null when init failed. --- be/src/vec/exec/scan/vscanner.h | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/be/src/vec/exec/scan/vscanner.h b/be/src/vec/exec/scan/vscanner.h index 7103449940..321ee2f0d8 100644 --- a/be/src/vec/exec/scan/vscanner.h +++ b/be/src/vec/exec/scan/vscanner.h @@ -124,7 +124,13 @@ public: bool need_to_close() { return _need_to_close; } void mark_to_need_to_close() { -_update_counters_before_close(); +// If the scanner is failed during init or open, then not need update counters +// because the query is fail and the counter is useless. And it may core during +// update counters. For example, update counters depend on scanner's tablet, but +// the tablet == null when init failed. +if (_is_open) { +_update_counters_before_close(); +} _need_to_close = true; } - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
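The guard added in the commit above can be illustrated in isolation. The sketch below is a simplified, hypothetical `Scanner` class, not the Doris `VScanner`: only the `_is_open` check around `_update_counters_before_close()` mirrors the diff, while `open()`, `counter_updates()` and the counter field are assumptions added so the behaviour can be observed.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a scanner; only the guard pattern
// from the commit is reproduced here.
class Scanner {
public:
    void open() { _is_open = true; }

    void mark_to_need_to_close() {
        // Counters may depend on state that only exists after a successful
        // open, so skip the update when the scanner was never opened.
        if (_is_open) {
            _update_counters_before_close();
        }
        _need_to_close = true;
    }

    bool need_to_close() const { return _need_to_close; }
    int counter_updates() const { return _counter_updates; }

private:
    void _update_counters_before_close() {
        assert(_is_open);  // running this on an unopened scanner would be unsafe
        ++_counter_updates;
    }

    bool _is_open = false;
    bool _need_to_close = false;
    int _counter_updates = 0;
};
```

A scanner that failed before `open()` is still marked for closing, but its counters are left untouched.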
[doris] 05/08: [bugfix](runtimefilter) runtime filter is shared between multi instances with same node id, should not cache exprs (#22114)
This is an automated email from the ASF dual-hosted git repository. kxiao pushed a commit to branch branch-2.0 in repository https://gitbox.apache.org/repos/asf/doris.git commit e2b79e92d1414f94b81778ad10a0d698a0b04da3 Author: yiguolei <676222...@qq.com> AuthorDate: Sun Jul 23 13:04:33 2023 +0800 [bugfix](runtimefilter) runtime filter is shared between multi instances with same node id, should not cache exprs (#22114) runtime filter is shared among multi instances. in the past, we cached pushdown expr(runtime filter generated) every scannode[runtime filter consumer] will try to call prepare expr but the expr may generated with different fn_context_id - Co-authored-by: yiguolei --- be/src/exprs/runtime_filter.cpp | 28 +--- be/src/exprs/runtime_filter.h | 7 +-- be/src/vec/exec/runtime_filter_consumer.cpp | 6 +++--- 3 files changed, 9 insertions(+), 32 deletions(-) diff --git a/be/src/exprs/runtime_filter.cpp b/be/src/exprs/runtime_filter.cpp index 6932ad9c6b..02f5e7b514 100644 --- a/be/src/exprs/runtime_filter.cpp +++ b/be/src/exprs/runtime_filter.cpp @@ -1152,35 +1152,17 @@ Status IRuntimeFilter::publish() { } } -Status IRuntimeFilter::get_push_expr_ctxs(std::vector* push_exprs) { +Status IRuntimeFilter::get_push_expr_ctxs(std::vector* push_exprs, + bool is_late_arrival) { DCHECK(is_consumer()); -if (!_is_ignored) { -_set_push_down(); -_profile->add_info_string("Info", _format_status()); -return _wrapper->get_push_exprs(push_exprs, _vprobe_ctx); -} else { -_profile->add_info_string("Info", _format_status()); -return Status::OK(); -} -} - -Status IRuntimeFilter::get_prepared_exprs(std::vector* vexprs, - const RowDescriptor& desc, RuntimeState* state) { _profile->add_info_string("Info", _format_status()); if (_is_ignored) { return Status::OK(); } -DCHECK((!_enable_pipeline_exec && _rf_state == RuntimeFilterState::READY) || - (_enable_pipeline_exec && -_rf_state_atomic.load(std::memory_order_acquire) == RuntimeFilterState::READY)); -DCHECK(is_consumer()); 
-std::lock_guard guard(_inner_mutex); - -if (_push_down_vexprs.empty()) { -RETURN_IF_ERROR(_wrapper->get_push_exprs(&_push_down_vexprs, _vprobe_ctx)); +if (!is_late_arrival) { +_set_push_down(); } -vexprs->insert(vexprs->end(), _push_down_vexprs.begin(), _push_down_vexprs.end()); -return Status::OK(); +return _wrapper->get_push_exprs(push_exprs, _vprobe_ctx); } bool IRuntimeFilter::await() { diff --git a/be/src/exprs/runtime_filter.h b/be/src/exprs/runtime_filter.h index a4fd241a28..fb5e43d177 100644 --- a/be/src/exprs/runtime_filter.h +++ b/be/src/exprs/runtime_filter.h @@ -221,10 +221,7 @@ public: RuntimeFilterType type() const { return _runtime_filter_type; } -Status get_push_expr_ctxs(std::vector* push_exprs); - -Status get_prepared_exprs(std::vector* push_exprs, - const RowDescriptor& desc, RuntimeState* state); +Status get_push_expr_ctxs(std::vector* push_exprs, bool is_late_arrival); bool is_broadcast_join() const { return _is_broadcast_join; } @@ -385,8 +382,6 @@ protected: bool _is_ignored; std::string _ignored_msg; -std::vector _push_down_vexprs; - struct RPCContext; std::shared_ptr _rpc_context; diff --git a/be/src/vec/exec/runtime_filter_consumer.cpp b/be/src/vec/exec/runtime_filter_consumer.cpp index b05ebf0476..2af841749b 100644 --- a/be/src/vec/exec/runtime_filter_consumer.cpp +++ b/be/src/vec/exec/runtime_filter_consumer.cpp @@ -95,7 +95,7 @@ Status RuntimeFilterConsumer::_acquire_runtime_filter() { ready = runtime_filter->await(); } if (ready && !_runtime_filter_ctxs[i].apply_mark) { -RETURN_IF_ERROR(runtime_filter->get_push_expr_ctxs(&vexprs)); +RETURN_IF_ERROR(runtime_filter->get_push_expr_ctxs(&vexprs, false)); _runtime_filter_ctxs[i].apply_mark = true; } else if (runtime_filter->current_state() == RuntimeFilterState::NOT_READY && !_runtime_filter_ctxs[i].apply_mark) { @@ -151,8 +151,8 @@ Status RuntimeFilterConsumer::try_append_late_arrival_runtime_filter(int* arrive ++current_arrived_rf_num; continue; } else if 
(_runtime_filter_ctxs[i].runtime_filter->is_ready()) { - RETURN_IF_ERROR(_runtime_filter_ctxs[i].runtime_filter->get_prepared_exprs( -&exprs, _row_descriptor_ref, _state)); +RETURN_IF_ERROR( + _runtime_filter_ctxs[i].runtime_filter->get_push_expr_ctxs(&exprs, true)); ++current_arrived_rf_num; _runtime_filter_ctxs[i].apply_mark = true; } --
[doris] 01/08: [fix](metric) fix prometheus metric format error (#22045)
This is an automated email from the ASF dual-hosted git repository. kxiao pushed a commit to branch branch-2.0 in repository https://gitbox.apache.org/repos/asf/doris.git commit caf7c6bfd71363c7b0ab69dd38d0fe677a4bf1c4 Author: caiconghui <55968745+caicong...@users.noreply.github.com> AuthorDate: Sat Jul 22 22:38:29 2023 +0800 [fix](metric) fix prometheus metric format error (#22045) we should define metric name only once like following: # HELP doris_fe_query_latency_ms # TYPE doris_fe_query_latency_ms summary doris_fe_query_latency_ms{quantile="0.75"} 1.0 doris_fe_query_latency_ms{quantile="0.95"} 2.0 doris_fe_query_latency_ms{quantile="0.98"} 100.0 doris_fe_query_latency_ms{quantile="0.99"} 100.0 doris_fe_query_latency_ms{quantile="0.999"} 100.0 doris_fe_query_latency_ms{quantile="0.75",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.95",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.98",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.99",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.999",user="default_cluster:test1"} 1.0 --- .../src/main/java/org/apache/doris/metric/MetricRepo.java| 6 +++--- .../org/apache/doris/metric/PrometheusMetricVisitor.java | 12 +--- .../src/main/java/org/apache/doris/qe/ConnectProcessor.java | 3 ++- .../src/test/java/org/apache/doris/metric/MetricsTest.java | 6 +++--- 4 files changed, 13 insertions(+), 14 deletions(-) diff --git a/fe/fe-core/src/main/java/org/apache/doris/metric/MetricRepo.java b/fe/fe-core/src/main/java/org/apache/doris/metric/MetricRepo.java index 51f6902786..1d6b7d1810 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/metric/MetricRepo.java +++ b/fe/fe-core/src/main/java/org/apache/doris/metric/MetricRepo.java @@ -72,7 +72,7 @@ public final class MetricRepo { public static LongCounterMetric COUNTER_QUERY_TABLE; public static LongCounterMetric COUNTER_QUERY_OLAP_TABLE; public static Histogram HISTO_QUERY_LATENCY; 
-public static AutoMappedMetric DB_HISTO_QUERY_LATENCY; +public static AutoMappedMetric USER_HISTO_QUERY_LATENCY; public static AutoMappedMetric> USER_GAUGE_QUERY_INSTANCE_NUM; public static AutoMappedMetric USER_COUNTER_QUERY_INSTANCE_BEGIN; public static AutoMappedMetric BE_COUNTER_QUERY_RPC_ALL; @@ -287,8 +287,8 @@ public final class MetricRepo { DORIS_METRIC_REGISTER.addMetrics(COUNTER_QUERY_OLAP_TABLE); HISTO_QUERY_LATENCY = METRIC_REGISTER.histogram( MetricRegistry.name("query", "latency", "ms")); -DB_HISTO_QUERY_LATENCY = new AutoMappedMetric<>(name -> { -String metricName = MetricRegistry.name("query", "latency", "ms", "db=" + name); +USER_HISTO_QUERY_LATENCY = new AutoMappedMetric<>(name -> { +String metricName = MetricRegistry.name("query", "latency", "ms", "user=" + name); return METRIC_REGISTER.histogram(metricName); }); USER_COUNTER_QUERY_INSTANCE_BEGIN = addLabeledMetrics("user", () -> diff --git a/fe/fe-core/src/main/java/org/apache/doris/metric/PrometheusMetricVisitor.java b/fe/fe-core/src/main/java/org/apache/doris/metric/PrometheusMetricVisitor.java index 20983a4920..fccf3317ae 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/metric/PrometheusMetricVisitor.java +++ b/fe/fe-core/src/main/java/org/apache/doris/metric/PrometheusMetricVisitor.java @@ -191,8 +191,11 @@ public class PrometheusMetricVisitor extends MetricVisitor { } final String fullName = prefix + String.join("_", names); final String fullTag = String.join(",", tags); -sb.append(HELP).append(fullName).append(" ").append("\n"); -sb.append(TYPE).append(fullName).append(" ").append("summary\n"); +// we should define metric name only once +if (tags.isEmpty()) { +sb.append(HELP).append(fullName).append(" ").append("\n"); +sb.append(TYPE).append(fullName).append(" ").append("summary\n"); +} String delimiter = tags.isEmpty() ? 
"" : ","; Snapshot snapshot = histogram.getSnapshot(); sb.append(fullName).append("{quantile=\"0.75\"").append(delimiter).append(fullTag).append("} ") @@ -205,11 +208,6 @@ public class PrometheusMetricVisitor extends MetricVisitor { .append(snapshot.get99thPercentile()).append("\n"); sb.append(fullName).append("{quantile=\"0.999\"").append(delimiter).append(fullTag).append("} ") .append(snapshot.get999thPercentile()).append("\n"); -sb.append(fullName).append("_sum {").append(fullTag).append("} ") -.append(histogram.getCount() * snapshot.getMean()).append("\n"); -sb.append(fullName).append("_count {").append(fullTag).append("} ") -.append(histogram.getCount()).append("\n"); -return; } @Overr
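The change to `PrometheusMetricVisitor` above emits the `# HELP` and `# TYPE` lines only when the histogram has no tags, so the metric name is declared once. A minimal C++ sketch of that rule follows. `format_quantile` is a hypothetical function, not Doris code, and it assumes (as the commit appears to) that the untagged series is written before the tagged ones.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical formatter, not the Doris PrometheusMetricVisitor: writes the
// HELP/TYPE header only for the untagged series, then one quantile sample.
std::string format_quantile(const std::string& name, const std::vector<std::string>& tags,
                            const std::string& quantile, double value) {
    assert(!name.empty());
    std::ostringstream out;
    if (tags.empty()) {
        // Declare the metric family once, on the series without extra labels.
        out << "# HELP " << name << " \n";
        out << "# TYPE " << name << " summary\n";
    }
    out << name << "{quantile=\"" << quantile << "\"";
    for (const std::string& tag : tags) {
        out << "," << tag;
    }
    out << "} " << value << "\n";
    return out.str();
}
```

Calling it first with an empty tag list and then with a tag such as `user="test1"` yields one header followed by samples that share the same metric name, matching the layout shown in the commit message.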
[doris] 08/08: [deps](hadoop) update hadoop libs to 3.3.4.5 (#22062)
This is an automated email from the ASF dual-hosted git repository. kxiao pushed a commit to branch branch-2.0 in repository https://gitbox.apache.org/repos/asf/doris.git commit b0b755855f974f33e92edb5144c87ea9d7c63275 Author: Mingyu Chen AuthorDate: Sun Jul 23 20:17:16 2023 +0800 [deps](hadoop) update hadoop libs to 3.3.4.5 (#22062) --- thirdparty/CHANGELOG.md | 4 thirdparty/vars.sh | 8 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/thirdparty/CHANGELOG.md b/thirdparty/CHANGELOG.md index abc7c83e76..c90363fcb7 100644 --- a/thirdparty/CHANGELOG.md +++ b/thirdparty/CHANGELOG.md @@ -2,6 +2,10 @@ This file contains version of the third-party dependency libraries in the build-env image. The docker build-env image is apache/doris, and the tag is `build-env-${version}` +## v20230721 + +- Modified hadoop libhdfs 3.3.4.4 -> 3.3.4.5 + ## v20230625 - Modified benchmark 1.5.6 -> 1.8.0 diff --git a/thirdparty/vars.sh b/thirdparty/vars.sh index f10c56fa9e..435434927f 100644 --- a/thirdparty/vars.sh +++ b/thirdparty/vars.sh @@ -459,10 +459,10 @@ FAST_FLOAT_SOURCE=fast_float-3.9.0 FAST_FLOAT_MD5SUM="5656b0d8b150a3b157cfb092d214f6ea" # libhdfs -HADOOP_LIBS_DOWNLOAD="https://github.com/apache/doris-thirdparty/archive/refs/tags/hadoop-3.3.4.4-for-doris.tar.gz"; -HADOOP_LIBS_NAME="hadoop-3.3.4.4-for-doris.tar.gz" -HADOOP_LIBS_SOURCE="doris-thirdparty-hadoop-3.3.4.4-for-doris" -HADOOP_LIBS_MD5SUM="00f0042dd3900ba016f079ee9c550efb" +HADOOP_LIBS_DOWNLOAD="https://github.com/apache/doris-thirdparty/archive/refs/tags/hadoop-3.3.4.5-for-doris.tar.gz"; +HADOOP_LIBS_NAME="hadoop-3.3.4.5-for-doris.tar.gz" +HADOOP_LIBS_SOURCE="doris-thirdparty-hadoop-3.3.4.5-for-doris" +HADOOP_LIBS_MD5SUM="15b7be1747b27c37923b0cb9db6cff8c" # all thirdparties which need to be downloaded is set in array TP_ARCHIVES export TP_ARCHIVES=( - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[doris] 03/08: [fix](catalog) do not call makeSureInitialized when create/drop table/db from hms meta event (#21941)
This is an automated email from the ASF dual-hosted git repository. kxiao pushed a commit to branch branch-2.0 in repository https://gitbox.apache.org/repos/asf/doris.git commit 04d9ddbe264507da2a77f7560e0498595317431b Author: zhangdong <493738...@qq.com> AuthorDate: Sun Jul 23 11:24:20 2023 +0800 [fix](catalog) do not call makeSureInitialized when create/drop table/db from hms meta event (#21941) Supplement to #21104 --- .../org/apache/doris/catalog/external/ExternalDatabase.java | 6 +- .../apache/doris/catalog/external/HMSExternalDatabase.java | 13 - .../doris/catalog/external/IcebergExternalDatabase.java | 5 ++--- .../doris/catalog/external/PaimonExternalDatabase.java | 5 ++--- .../main/java/org/apache/doris/datasource/CatalogMgr.java | 8 .../java/org/apache/doris/datasource/ExternalCatalog.java | 4 ++-- .../org/apache/doris/datasource/HMSExternalCatalog.java | 6 ++ 7 files changed, 29 insertions(+), 18 deletions(-) diff --git a/fe/fe-core/src/main/java/org/apache/doris/catalog/external/ExternalDatabase.java b/fe/fe-core/src/main/java/org/apache/doris/catalog/external/ExternalDatabase.java index 0a82d37ff3..fa2ecd4011 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/catalog/external/ExternalDatabase.java +++ b/fe/fe-core/src/main/java/org/apache/doris/catalog/external/ExternalDatabase.java @@ -347,13 +347,17 @@ public abstract class ExternalDatabase throw new NotImplementedException("dropTable() is not implemented"); } +public void dropTableForReplay(String tableName) { +throw new NotImplementedException("replayDropTableFromEvent() is not implemented"); +} + @Override public CatalogIf getCatalog() { return extCatalog; } // Only used for sync hive metastore event -public void replayCreateTableFromEvent(String tableName, long tableId) { +public void createTableForReplay(String tableName, long tableId) { throw new NotImplementedException("createTable() is not implemented"); } } diff --git 
a/fe/fe-core/src/main/java/org/apache/doris/catalog/external/HMSExternalDatabase.java b/fe/fe-core/src/main/java/org/apache/doris/catalog/external/HMSExternalDatabase.java index 093ebe8b40..d75f86bd08 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/catalog/external/HMSExternalDatabase.java +++ b/fe/fe-core/src/main/java/org/apache/doris/catalog/external/HMSExternalDatabase.java @@ -74,7 +74,18 @@ public class HMSExternalDatabase extends ExternalDatabase { } @Override -public void replayCreateTableFromEvent(String tableName, long tableId) { +public void dropTableForReplay(String tableName) { +LOG.debug("replayDropTableFromEvent [{}]", tableName); +Long tableId = tableNameToId.remove(tableName); +if (tableId == null) { +LOG.warn("replayDropTableFromEvent [{}] failed", tableName); +return; +} +idToTbl.remove(tableId); +} + +@Override +public void createTableForReplay(String tableName, long tableId) { LOG.debug("create table [{}]", tableName); tableNameToId.put(tableName, tableId); HMSExternalTable table = getExternalTable(tableName, tableId, extCatalog); diff --git a/fe/fe-core/src/main/java/org/apache/doris/catalog/external/IcebergExternalDatabase.java b/fe/fe-core/src/main/java/org/apache/doris/catalog/external/IcebergExternalDatabase.java index 8653c3e2dd..a915b3b241 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/catalog/external/IcebergExternalDatabase.java +++ b/fe/fe-core/src/main/java/org/apache/doris/catalog/external/IcebergExternalDatabase.java @@ -49,9 +49,8 @@ public class IcebergExternalDatabase extends ExternalDatabase db = getDbForInit(dbName, dbId, logType); - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[doris] 04/08: [doc](catalog)paimon doc (#21966)
This is an automated email from the ASF dual-hosted git repository.

kxiao pushed a commit to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/doris.git

commit 398ad188ac529355e25bbc4b1f4f15e53e86eea8
Author: zhangdong <493738...@qq.com>
AuthorDate: Sun Jul 23 11:24:40 2023 +0800

    [doc](catalog)paimon doc (#21966)

    code pr: #21910
---
 docs/en/docs/lakehouse/multi-catalog/paimon.md    | 75 ++-
 docs/zh-CN/docs/lakehouse/multi-catalog/paimon.md | 75 ++-
 2 files changed, 120 insertions(+), 30 deletions(-)

diff --git a/docs/en/docs/lakehouse/multi-catalog/paimon.md b/docs/en/docs/lakehouse/multi-catalog/paimon.md
index cd9253288f..79e5b76681 100644
--- a/docs/en/docs/lakehouse/multi-catalog/paimon.md
+++ b/docs/en/docs/lakehouse/multi-catalog/paimon.md
@@ -30,31 +30,76 @@ under the License.

-## Usage
+## Instructions for use

-1. Currently, Doris only supports simple field types.
-2. Doris only supports Hive Metastore Catalogs currently. The usage is basically the same as that of Hive Catalogs. More types of Catalogs will be supported in future versions.
+1. When data is stored in HDFS, core-site.xml, hdfs-site.xml and hive-site.xml need to be placed in the conf directory of FE and BE. The Hadoop configuration files in the conf directory are read first, and then the configuration files referenced by the environment variable `HADOOP_CONF_DIR`.
+2. The currently adapted Paimon version is 0.4.0.

 ## Create Catalog

-### Create Catalog Based on Paimon API
+Paimon Catalog currently supports two types of Metastore for creating catalogs:
+* filesystem (default): stores both metadata and data in the file system.
+* hive metastore: also stores metadata in Hive Metastore, so users can access these tables directly from Hive.

-Use the Paimon API to access metadata.Currently, only support Hive service as Paimon's Catalog.
+### Creating a Catalog Based on FileSystem

-- Hive Metastore + HDFS
+```sql
+CREATE CATALOG `paimon_hdfs` PROPERTIES (
+"type" = "paimon",
+"warehouse" = "hdfs://HDFS8000871/user/paimon",
+"dfs.nameservices"="HDFS8000871",
+"dfs.ha.namenodes.HDFS8000871"="nn1,nn2",
+"dfs.namenode.rpc-address.HDFS8000871.nn1"="172.21.0.1:4007",
+"dfs.namenode.rpc-address.HDFS8000871.nn2"="172.21.0.2:4007",
+"dfs.client.failover.proxy.provider.HDFS8000871"="org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
+"hadoop.username"="hadoop"
+);
+
+```
+
+ S3

 ```sql
-CREATE CATALOG `paimon` PROPERTIES (
+CREATE CATALOG `paimon_s3` PROPERTIES (
 "type" = "paimon",
-"hive.metastore.uris" = "thrift://172.16.65.15:7004",
-"dfs.ha.namenodes.HDFS1006531" = "nn2,nn1",
-"dfs.namenode.rpc-address.HDFS1006531.nn2" = "172.16.65.115:4007",
-"dfs.namenode.rpc-address.HDFS1006531.nn1" = "172.16.65.15:4007",
-"dfs.nameservices" = "HDFS1006531",
-"hadoop.username" = "hadoop",
-"warehouse" = "hdfs://HDFS1006531/data/paimon",
-"dfs.client.failover.proxy.provider.HDFS1006531" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
+"warehouse" = "s3://paimon-1308700295.cos.ap-beijing.myqcloud.com/paimoncos",
+"s3.endpoint"="cos.ap-beijing.myqcloud.com",
+"s3.access_key"="ak",
+"s3.secret_key"="sk"
 );
+
+```
+
+ OSS
+
+```sql
+CREATE CATALOG `paimon_oss` PROPERTIES (
+"type" = "paimon",
+"warehouse" = "oss://paimon-zd/paimonoss",
+"oss.endpoint"="oss-cn-beijing.aliyuncs.com",
+"oss.access_key"="ak",
+"oss.secret_key"="sk"
+);
+
+```
+
+### Creating a Catalog Based on Hive Metastore
+
+```sql
+CREATE CATALOG `paimon_hms` PROPERTIES (
+"type" = "paimon",
+"paimon.catalog.type"="hms",
+"warehouse" = "hdfs://HDFS8000871/user/zhangdong/paimon2",
+"hive.metastore.uris" = "thrift://172.21.0.44:7004",
+"dfs.nameservices"="HDFS8000871",
+"dfs.ha.namenodes.HDFS8000871"="nn1,nn2",
+"dfs.namenode.rpc-address.HDFS8000871.nn1"="172.21.0.1:4007",
+"dfs.namenode.rpc-address.HDFS8000871.nn2"="172.21.0.2:4007",
+"dfs.client.failover.proxy.provider.HDFS8000871"="org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
+"hadoop.username"="hadoop"
+);
+
 ```

 ## Column Type Mapping
diff --git a/docs/zh-CN/docs/lakehouse/multi-catalog/paimon.md b/docs/zh-CN/docs/lakehouse/multi-catalog/paimon.md
index 0ed5a12caa..7a14c879ae 100644
--- a/docs/zh-CN/docs/lakehouse/multi-catalog/paimon.md
+++ b/docs/zh-CN/docs/lakehouse/multi-catalog/paimon.md
@@ -30,31 +30,76 @@ under the License.

-## Usage Restrictions
+## Usage Notes

-1. Currently, only simple field types are supported.
-2. Currently, only Hive Metastore type Catalogs are supported, so usage is basically the same as for Hive Catalogs. Other Catalog types will be supported in later versions.
+1. When data is stored in HDFS, core-site.xml, hdfs-site.xml and hive-site.xml need to be placed in the conf directory of FE and BE. The Hadoop configuration files in the conf directory are read first, then those referenced by the environment variable `HADOOP_CONF_DIR`.
+2. The currently adapted Paimon version is 0.4.0

 ## Create Catalog

-### Create Catalog Based on Paimon API
+Paimon Catalog currently supports two types of Metastore for creating a Catalog:
+* fi
[doris] branch branch-2.0 updated (a852ff36f4 -> b0b755855f)
kxiao pushed a change to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/doris.git

    from a852ff36f4 remove create hot partition failed check (#22093)
     new caf7c6bfd7 [fix](metric) fix prometheus metric format error (#22045)
     new 298a58ccc7 [bugfix](scanner) when scanner init failed during get tablet, not need call update counters (#22117)
     new 04d9ddbe26 [fix](catalog) do not call makeSureInitialized when create/drop table/db from hms meta event (#21941)
     new 398ad188ac [doc](catalog)paimon doc (#21966)
     new e2b79e92d1 [bugfix](runtimefilter) runtime filter is shared between multi instances with same node id, should not cache exprs (#22114)
     new 7315d01dc1 [improvement](Jsonb) optimization Jsonb path parse (#21495)
     new 554dfe5b00 [minor](log) print error msg to fe.out before log is initialized (#22106)
     new b0b755855f [deps](hadoop) update hadoop libs to 3.3.4.5 (#22062)

The 8 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
 be/src/exprs/runtime_filter.cpp                    |  28 +--
 be/src/exprs/runtime_filter.h                      |   7 +-
 be/src/util/jsonb_document.h                       | 273 ++---
 be/src/vec/exec/runtime_filter_consumer.cpp        |   6 +-
 be/src/vec/exec/scan/vscanner.h                    |   8 +-
 be/src/vec/functions/function_jsonb.cpp            | 101 ++--
 docs/en/docs/lakehouse/multi-catalog/paimon.md     |  75 --
 docs/zh-CN/docs/lakehouse/multi-catalog/paimon.md  |  75 --
 .../src/main/java/org/apache/doris/DorisFE.java    |   3 +
 .../doris/catalog/external/ExternalDatabase.java   |   6 +-
 .../catalog/external/HMSExternalDatabase.java      |  13 +-
 .../catalog/external/IcebergExternalDatabase.java  |   5 +-
 .../catalog/external/PaimonExternalDatabase.java   |   5 +-
 .../org/apache/doris/datasource/CatalogMgr.java    |   8 +-
 .../apache/doris/datasource/ExternalCatalog.java   |   4 +-
 .../doris/datasource/HMSExternalCatalog.java       |   6 +-
 .../java/org/apache/doris/metric/MetricRepo.java   |   6 +-
 .../doris/metric/PrometheusMetricVisitor.java      |  12 +-
 .../java/org/apache/doris/qe/ConnectProcessor.java |   3 +-
 .../java/org/apache/doris/metric/MetricsTest.java  |   6 +-
 thirdparty/CHANGELOG.md                            |   4 +
 thirdparty/vars.sh                                 |   8 +-
 22 files changed, 402 insertions(+), 260 deletions(-)
[doris] 06/08: [improvement](Jsonb) optimization Jsonb path parse (#21495)
kxiao pushed a commit to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/doris.git

commit 7315d01dc1115ba69f09b3c974e8746fdc237e7a
Author: Liqf <109049295+lemonlit...@users.noreply.github.com>
AuthorDate: Sun Jul 23 18:59:12 2023 +0800

    [improvement](Jsonb) optimization Jsonb path parse (#21495)

    The previous logic read the jsonb value while parsing the json path.
    For complex json paths, this caused a lot of repeated parsing work.
    The optimization separates parsing the json path from reading the value.
---
 be/src/util/jsonb_document.h            | 273 
 be/src/vec/functions/function_jsonb.cpp | 101 +---
 2 files changed, 213 insertions(+), 161 deletions(-)

diff --git a/be/src/util/jsonb_document.h b/be/src/util/jsonb_document.h
index d54e7291dd..c21917e066 100644
--- a/be/src/util/jsonb_document.h
+++ b/be/src/util/jsonb_document.h
@@ -309,14 +309,6 @@ public:

     bool get_has_escapes() const { return has_escapes; }

-    void set_is_invalid_json_path(bool has) { is_invalid_json_path = has; }
-
-    bool get_is_invalid_json_path() const { return is_invalid_json_path; }
-
-    void set_type(unsigned int code) { type = code; }
-
-    bool get_type() const { return type; }
-
 private:
     /// The current position in the stream.
     const char* m_position;
@@ -332,9 +324,17 @@ private:

     ///Whether to contain escape characters
     bool has_escapes = false;
+};
+
+struct leg_info {
+    ///path leg ptr
+    char* leg_ptr;
+
+    ///path leg len
+    unsigned int leg_len;

-    ///Is the json path valid
-    bool is_invalid_json_path = false;
+    ///array_index
+    int array_index;

     ///type: 0 is member 1 is array
     unsigned int type;
@@ -343,10 +343,24 @@ private:
 class JsonbPath {
 public:
     // parse json path
-    static bool parsePath(Stream* stream);
+    static bool parsePath(Stream* stream, JsonbPath* path);
+
+    static bool parse_array(Stream* stream, JsonbPath* path);
+    static bool parse_member(Stream* stream, JsonbPath* path);
+
+    //return true if json path valid else return false
+    bool seek(const char* string, size_t length);
+
+    void add_leg_to_leg_vector(std::unique_ptr<leg_info> leg) {
+        leg_vector.emplace_back(leg.release());
+    }

-    static bool parse_array(Stream* stream);
-    static bool parse_member(Stream* stream);
+    size_t get_leg_vector_size() { return leg_vector.size(); }
+
+    leg_info* get_leg_from_leg_vector(size_t i) { return leg_vector[i].get(); }
+
+private:
+    std::vector<std::unique_ptr<leg_info>> leg_vector;
 };

 /*
@@ -529,15 +543,8 @@ public:
     // get the raw byte array of the value
     const char* getValuePtr() const;

-    // find the JSONB value by a key path string (null terminated)
-    JsonbValue* findPath(const char* key_path, bool& is_invalid_json_path,
-                         hDictFind handler = nullptr) {
-        return findPath(key_path, (unsigned int)strlen(key_path), is_invalid_json_path, handler);
-    }
-
-    // find the JSONB value by a key path string (with length)
-    JsonbValue* findPath(const char* key_path, unsigned int len, bool& is_invalid_json_path,
-                         hDictFind handler);
+    // find the JSONB value by JsonbPath
+    JsonbValue* findValue(JsonbPath& path, hDictFind handler);
     friend class JsonbDocument;

 protected:
@@ -1207,154 +1214,100 @@ inline const char* JsonbValue::getValuePtr() const {
     }
 }

-inline JsonbValue* JsonbValue::findPath(const char* key_path, unsigned int kp_len,
-                                        bool& is_invalid_json_path,
-                                        hDictFind handler = nullptr) {
-    if (!key_path) return nullptr;
-    if (kp_len == 0) {
-        is_invalid_json_path = true;
-        return nullptr;
-    }
+inline bool JsonbPath::seek(const char* key_path, size_t kp_len) {
+    //path invalid
+    if (!key_path || kp_len == 0) return false;
     Stream stream(key_path, kp_len);
     stream.skip_whitespace();
     if (stream.exhausted() || stream.read() != SCOPE) {
-        is_invalid_json_path = true;
-        return nullptr;
+        //path invalid
+        return false;
     }

-    JsonbValue* pval = this;
-
-    while (pval && !stream.exhausted()) {
+    while (!stream.exhausted()) {
         stream.skip_whitespace();
         stream.clear_leg_ptr();
         stream.clear_leg_len();

-        if (!JsonbPath::parsePath(&stream)) {
-            is_invalid_json_path = stream.get_is_invalid_json_path();
-            return nullptr;
-        }
-
-        if (stream.get_leg_len() == 0) {
-            return nullptr;
+        if (!JsonbPath::parsePath(&stream, this)) {
+            //path invalid
+            return false;
         }
+    }
+    return true;
+}

-        if (stream.get_type() == MEMBER_CODE) {
+inline JsonbValue* JsonbValue::findValue(JsonbPath& path, hDictFind handler) {
+    JsonbValue* pval = this;
+    for (size
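The commit message above describes splitting path parsing from value lookup. A minimal Java sketch of that idea follows; it is not the Doris C++ code. The names `PathLegsSketch`, `Leg`, `parse` and `find` are hypothetical, plain `Map`/`List` objects stand in for the JSONB binary format, and whitespace skipping and escape handling are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PathLegsSketch {
    /** One parsed step of a path: either a member name or an array index. */
    static final class Leg {
        final String member; // non-null for member legs
        final int index;     // meaningful only when member == null

        Leg(String member, int index) {
            this.member = member;
            this.index = index;
        }
    }

    /** Parses a path such as "$.a[1].b" once; returns null if the path is invalid. */
    static List<Leg> parse(String path) {
        if (path == null || path.isEmpty() || path.charAt(0) != '$') {
            return null;
        }
        List<Leg> legs = new ArrayList<>();
        int i = 1;
        while (i < path.length()) {
            char c = path.charAt(i);
            if (c == '.') {
                int start = ++i;
                while (i < path.length() && path.charAt(i) != '.' && path.charAt(i) != '[') {
                    i++;
                }
                if (i == start) {
                    return null; // empty member name
                }
                legs.add(new Leg(path.substring(start, i), -1));
            } else if (c == '[') {
                int end = path.indexOf(']', i);
                if (end < 0) {
                    return null; // unterminated index
                }
                try {
                    int idx = Integer.parseInt(path.substring(i + 1, end));
                    if (idx < 0) {
                        return null;
                    }
                    legs.add(new Leg(null, idx));
                } catch (NumberFormatException e) {
                    return null; // non-numeric index
                }
                i = end + 1;
            } else {
                return null; // unexpected character
            }
        }
        return legs;
    }

    /** Walks already-parsed legs over one document; no string parsing happens here. */
    static Object find(Object root, List<Leg> legs) {
        Object cur = root;
        for (Leg leg : legs) {
            if (leg.member != null) {
                if (!(cur instanceof Map)) {
                    return null;
                }
                cur = ((Map<?, ?>) cur).get(leg.member);
            } else {
                if (!(cur instanceof List)) {
                    return null;
                }
                List<?> list = (List<?>) cur;
                if (leg.index >= list.size()) {
                    return null;
                }
                cur = list.get(leg.index);
            }
            if (cur == null) {
                return null;
            }
        }
        return cur;
    }
}
```

With this split, a caller can run `parse` once per query and `find` once per row, which is the repeated work the commit says it removes.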
[doris] 07/08: [minor](log) print error msg to fe.out before log is initialized (#22106)
kxiao pushed a commit to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/doris.git

commit 554dfe5b0086a6ceddc5bbdfa35d7530af0a14a5
Author: Mingyu Chen
AuthorDate: Sun Jul 23 19:20:10 2023 +0800

    [minor](log) print error msg to fe.out before log is initialized (#22106)

    The exception may be thrown before LOG is initialized, for example because of a wrong config value.
    So we need to print it to fe.out, otherwise we can't know what's wrong.

    After this PR, the error can be found in fe.out, such as:

    ```
    java.lang.NumberFormatException: For input string: "3g"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at org.apache.doris.common.ConfigBase.setConfigField(ConfigBase.java:253)
        at org.apache.doris.common.ConfigBase.setFields(ConfigBase.java:232)
        at org.apache.doris.common.ConfigBase.initConf(ConfigBase.java:146)
        at org.apache.doris.common.ConfigBase.init(ConfigBase.java:112)
        at org.apache.doris.DorisFE.start(DorisFE.java:101)
        at org.apache.doris.DorisFE.main(DorisFE.java:73)
    ```
---
 fe/fe-core/src/main/java/org/apache/doris/DorisFE.java | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fe/fe-core/src/main/java/org/apache/doris/DorisFE.java b/fe/fe-core/src/main/java/org/apache/doris/DorisFE.java
index 7d87091120..07394d9cd4 100755
--- a/fe/fe-core/src/main/java/org/apache/doris/DorisFE.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/DorisFE.java
@@ -187,6 +187,9 @@ public class DorisFE {
                 Thread.sleep(2000);
             }
         } catch (Throwable e) {
+            // Some exception may thrown before LOG is inited.
+            // So need to print to stdout
+            e.printStackTrace();
             LOG.warn("", e);
         }
     }
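The pattern in this commit can be tried in isolation. The sketch below is illustrative only and is not the DorisFE code: `EarlyFailureDemo`, `parseSize` and `runAndCapture` are hypothetical names, and the stack trace is written to an in-memory stream so the result can be inspected. Note that the no-argument `Throwable.printStackTrace()` used in the commit writes to `System.err`.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

final class EarlyFailureDemo {
    // Hypothetical stand-in for config parsing: Long.parseLong rejects "3g".
    static long parseSize(String raw) {
        return Long.parseLong(raw);
    }

    // Runs the startup step and returns whatever was printed, so a failure
    // stays visible even when no logger has been configured yet.
    static String runAndCapture(String raw) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buf, true);
        try {
            parseSize(raw);
            out.println("ok");
        } catch (Throwable e) {
            // Print first; a logger call placed after this may produce nothing
            // if logging has not been initialized.
            e.printStackTrace(out);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.print(runAndCapture("3g"));
    }
}
```

Printing before logging is a cheap safeguard: at worst the error appears twice, once in the process output and once in the log.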
[GitHub] [doris] Gabriel39 commented on pull request #22028: [pipeline](refactor) refactor pipeline task schedule logics
Gabriel39 commented on PR #22028: URL: https://github.com/apache/doris/pull/22028#issuecomment-1646853189 run buildall
[GitHub] [doris] englefly commented on pull request #22064: [stats](nereids)keep min/max expr in colstats
englefly commented on PR #22064: URL: https://github.com/apache/doris/pull/22064#issuecomment-1646853272 run buildall
[GitHub] [doris] xiaokang merged pull request #22124: [feature](create-table) support setting replication num for creating table operation globally
xiaokang merged PR #22124: URL: https://github.com/apache/doris/pull/22124
[doris] branch branch-2.0 updated: [feature](create-table) support setting replication num for creating table opertaion globally (#21848) (#22124)
kxiao pushed a commit to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/doris.git

The following commit(s) were added to refs/heads/branch-2.0 by this push:
     new 4c303f2e1f [feature](create-table) support setting replication num for creating table opertaion globally (#21848) (#22124)
4c303f2e1f is described below

commit 4c303f2e1f9f54ade8c6e7a690e48bf9540b5670
Author: Mingyu Chen
AuthorDate: Sun Jul 23 22:32:07 2023 +0800

    [feature](create-table) support setting replication num for creating table opertaion globally (#21848) (#22124)
---
 .../main/java/org/apache/doris/common/Config.java  | 12 +
 .../org/apache/doris/analysis/CreateTableStmt.java | 31 ++
 .../org/apache/doris/catalog/CreateTableTest.java  | 30 +
 3 files changed, 73 insertions(+)

diff --git a/fe/fe-common/src/main/java/org/apache/doris/common/Config.java b/fe/fe-common/src/main/java/org/apache/doris/common/Config.java
index dbf73be1ba..e49f6c2dd0 100644
--- a/fe/fe-common/src/main/java/org/apache/doris/common/Config.java
+++ b/fe/fe-common/src/main/java/org/apache/doris/common/Config.java
@@ -2027,4 +2027,16 @@ public class Config extends ConfigBase {
             "Hive行数估算分区采样数",
             "Sample size for hive row count estimation."})
     public static int hive_stats_partition_sample_size = 3000;
+
+    @ConfField(mutable = true, masterOnly = true, description = {
+            "用于强制设定内表的副本数,如果改参数大于零,则用户在建表时指定的副本数将被忽略,而使用本参数设置的值。"
+                    + "同时,建表语句中指定的副本标签等参数会被忽略。该参数不影响包括创建分区、修改表属性的操作。该参数建议仅用于测试环境",
+            "Used to force the number of replicas of the internal table. If the config is greater than zero, "
+                    + "the number of replicas specified by the user when creating the table will be ignored, "
+                    + "and the value set by this parameter will be used. At the same time, the replica tags "
+                    + "and other parameters specified in the create table statement will be ignored. "
+                    + "This config does not effect the operations including creating partitions "
+                    + "and modifying table properties. "
+                    + "This config is recommended to be used only in the test environment"})
+    public static int force_olap_table_replication_num = 0;
 }
diff --git a/fe/fe-core/src/main/java/org/apache/doris/analysis/CreateTableStmt.java b/fe/fe-core/src/main/java/org/apache/doris/analysis/CreateTableStmt.java
index c332d6e34a..64409d3b4a 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/analysis/CreateTableStmt.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/analysis/CreateTableStmt.java
@@ -25,6 +25,7 @@ import org.apache.doris.catalog.Env;
 import org.apache.doris.catalog.Index;
 import org.apache.doris.catalog.KeysType;
 import org.apache.doris.catalog.PrimitiveType;
+import org.apache.doris.catalog.ReplicaAllocation;
 import org.apache.doris.catalog.Type;
 import org.apache.doris.common.AnalysisException;
 import org.apache.doris.common.Config;
@@ -529,6 +530,8 @@ public class CreateTableStmt extends DdlStmt {
         }

         if (engineName.equals("olap")) {
+            // before analyzing partition, handle the replication allocation info
+            properties = rewriteReplicaAllocationProperties(properties);
             // analyze partition
             if (partitionDesc != null) {
                 if (partitionDesc instanceof ListPartitionDesc || partitionDesc instanceof RangePartitionDesc
@@ -619,6 +622,34 @@ public class CreateTableStmt extends DdlStmt {
         }
     }

+    private Map<String, String> rewriteReplicaAllocationProperties(Map<String, String> properties) {
+        if (Config.force_olap_table_replication_num <= 0) {
+            return properties;
+        }
+        // if force_olap_table_replication_num is set, use this value to rewrite the replication_num or
+        // replication_allocation properties
+        Map<String, String> newProperties = properties;
+        if (newProperties == null) {
+            newProperties = Maps.newHashMap();
+        }
+        boolean rewrite = false;
+        if (newProperties.containsKey(PropertyAnalyzer.PROPERTIES_REPLICATION_NUM)) {
+            newProperties.put(PropertyAnalyzer.PROPERTIES_REPLICATION_NUM,
+                    String.valueOf(Config.force_olap_table_replication_num));
+            rewrite = true;
+        }
+        if (newProperties.containsKey(PropertyAnalyzer.PROPERTIES_REPLICATION_ALLOCATION)) {
+            newProperties.put(PropertyAnalyzer.PROPERTIES_REPLICATION_ALLOCATION,
+                    new ReplicaAllocation((short) Config.force_olap_table_replication_num).toCreateStmt());
+            rewrite = true;
+        }
+        if (!rewrite) {
+            newProperties.put(PropertyAnalyzer.PROPERTIES_REPLICATION_NUM,
+                    String.valueOf(Config.force_olap_table_replication_num));
+        }
+
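The rewrite rule in the (truncated) diff above can be sketched without any Doris classes. This is a standalone illustration under assumptions, not the committed method: `ReplicationRewriteSketch` and `rewrite` are hypothetical names, the property keys `replication_num` and `replication_allocation` are assumed to match the `PropertyAnalyzer` constants, the `tag.location.default: N` string is an assumed stand-in for what `ReplicaAllocation.toCreateStmt()` produces, and the final `return` is a guess at the part of the method that the archive cut off.

```java
import java.util.HashMap;
import java.util.Map;

final class ReplicationRewriteSketch {
    // Assumed to correspond to the PropertyAnalyzer constants in the diff.
    static final String REPLICATION_NUM = "replication_num";
    static final String REPLICATION_ALLOCATION = "replication_allocation";

    /**
     * Overrides user-supplied replica settings when forced > 0.
     * Mutates and returns the given map, as the diff does with newProperties.
     */
    static Map<String, String> rewrite(Map<String, String> properties, int forced) {
        if (forced <= 0) {
            return properties; // feature disabled: leave input untouched
        }
        Map<String, String> result = properties == null ? new HashMap<>() : properties;
        boolean rewritten = false;
        if (result.containsKey(REPLICATION_NUM)) {
            result.put(REPLICATION_NUM, String.valueOf(forced));
            rewritten = true;
        }
        if (result.containsKey(REPLICATION_ALLOCATION)) {
            // Assumed string form of a default-tag allocation.
            result.put(REPLICATION_ALLOCATION, "tag.location.default: " + forced);
            rewritten = true;
        }
        if (!rewritten) {
            // Neither key was given: pin the replica count explicitly.
            result.put(REPLICATION_NUM, String.valueOf(forced));
        }
        return result;
    }
}
```

Only keys the user already supplied are overwritten, and `replication_num` is added when neither was given, so a later property analysis step sees a single consistent replica count.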
[GitHub] [doris] github-actions[bot] commented on pull request #22028: [pipeline](refactor) refactor pipeline task schedule logics
github-actions[bot] commented on PR #22028: URL: https://github.com/apache/doris/pull/22028#issuecomment-1646857191 PR approved by at least one committer and no changes requested.
[GitHub] [doris] hello-stephen commented on pull request #22123: [fix](Nereids): mergeGroup should merge target Group into existed Group
hello-stephen commented on PR #22123: URL: https://github.com/apache/doris/pull/22123#issuecomment-1646860725

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.06 seconds
stream load tsv: 506 seconds loaded 74807831229 Bytes, about 140 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.0 seconds inserted 1000 Rows, about 344K ops/s
storage size: 17160679975 Bytes
[GitHub] [doris] caiconghui opened a new pull request, #22125: [feature](metric) Support collect query counter and error query counter metric in user level
caiconghui opened a new pull request, #22125: URL: https://github.com/apache/doris/pull/22125

…er metric in user level

## Proposed changes

Issue Number: close #xxx

1. Support collecting the query counter and error query counter metrics at the user level.
2. Fix a bug where metric names are not in sorted order when collected.

## Further comments

If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
[GitHub] [doris] caiconghui commented on pull request #22125: [feature](metric) Support collect query counter and error query counter metric in user level
caiconghui commented on PR #22125: URL: https://github.com/apache/doris/pull/22125#issuecomment-1646863478 run buildall
[GitHub] [doris] hello-stephen commented on pull request #22064: [stats](nereids)keep min/max expr in colstats
hello-stephen commented on PR #22064: URL: https://github.com/apache/doris/pull/22064#issuecomment-1646865667

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.46 seconds
stream load tsv: 512 seconds loaded 74807831229 Bytes, about 139 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.3 seconds inserted 1000 Rows, about 341K ops/s
storage size: 17162755093 Bytes
[GitHub] [doris] Kikyou1997 commented on pull request #22070: [enhancement](nereids) Update stats table config
Kikyou1997 commented on PR #22070: URL: https://github.com/apache/doris/pull/22070#issuecomment-1646866285 run buildall
[GitHub] [doris] zclllyybb commented on pull request #21898: [feature](datetime) Support timezone when insert datetime value
zclllyybb commented on PR #21898: URL: https://github.com/apache/doris/pull/21898#issuecomment-1646867178 run buildall
[GitHub] [doris] github-actions[bot] commented on pull request #21898: [feature](datetime) Support timezone when insert datetime value
github-actions[bot] commented on PR #21898: URL: https://github.com/apache/doris/pull/21898#issuecomment-1646868466 clang-tidy review says "All clean, LGTM! :+1:"
[GitHub] [doris] zddr commented on a diff in pull request #21968: [improvement](catalog)optimize ldap
zddr commented on code in PR #21968: URL: https://github.com/apache/doris/pull/21968#discussion_r1271468935

## fe/fe-core/src/main/java/org/apache/doris/ldap/LdapManager.java: ##
@@ -200,24 +207,22 @@ private LdapUserInfo getUserInfoFromCache(String fullName) {
      * Step2: get roles by ldap groups;
      * Step3: merge the roles;
      */
-    private Role getLdapGroupsPrivs(String userName, String clusterName) throws DdlException {
+    private Set<Role> getLdapGroupsRoles(String userName, String clusterName) throws DdlException {
         //get user ldap group. the ldap group name should be the same as the doris role name
         List<String> ldapGroups = ldapClient.getGroups(userName);
-        List<String> rolesNames = Lists.newArrayList();
+        Set<Role> roles = Sets.newHashSet();
         for (String group : ldapGroups) {
             String qualifiedRole = ClusterNamespace.getFullName(clusterName, group);
             if (Env.getCurrentEnv().getAuth().doesRoleExist(qualifiedRole)) {
-                rolesNames.add(qualifiedRole);
+                roles.add(Env.getCurrentEnv().getAuth().getRoleByName(qualifiedRole));
             }
         }
-        LOG.debug("get user:{} ldap groups:{} and doris roles:{}", userName, ldapGroups, rolesNames);
+        LOG.debug("get user:{} ldap groups:{} and doris roles:{}", userName, ldapGroups, roles);

         Role ldapGroupsPrivs = new Role(LDAP_GROUPS_PRIVS_NAME);

Review Comment: If a default role is created in RoleManager, users will be able to grant privileges to this role, which would change the authorization of all LDAP users. So I only changed the name of LDAP_GROUPS_PRIVS_NAME to ldapDefaultRole.
[GitHub] [doris] morningman commented on pull request #22078: [improvement](iceberg) Optimize the split to the user-specified size
morningman commented on PR #22078: URL: https://github.com/apache/doris/pull/22078#issuecomment-1646869337 run buildall
[GitHub] [doris] github-actions[bot] commented on pull request #22052: [Fix](multi-catalog) Fix not single slot filter conjuncts with dict filter issue.
github-actions[bot] commented on PR #22052: URL: https://github.com/apache/doris/pull/22052#issuecomment-1646870493 PR approved by at least one committer and no changes requested.
[GitHub] [doris] github-actions[bot] commented on pull request #22052: [Fix](multi-catalog) Fix not single slot filter conjuncts with dict filter issue.
github-actions[bot] commented on PR #22052: URL: https://github.com/apache/doris/pull/22052#issuecomment-1646870503 PR approved by anyone and no changes requested.
[GitHub] [doris] hello-stephen commented on pull request #22028: [pipeline](refactor) refactor pipeline task schedule logics
hello-stephen commented on PR #22028: URL: https://github.com/apache/doris/pull/22028#issuecomment-1646872187

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.17 seconds
stream load tsv: 510 seconds loaded 74807831229 Bytes, about 139 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.9 seconds inserted 1000 Rows, about 334K ops/s
storage size: 17162497475 Bytes
[GitHub] [doris] jackwener commented on pull request #22000: [refactor](Nereids): avoid useless groupByColStats Map
jackwener commented on PR #22000: URL: https://github.com/apache/doris/pull/22000#issuecomment-1646872304

> what is the benefit of this refactor?

It avoids generating a useless HashMap, which costs some time.
[GitHub] [doris] hello-stephen commented on pull request #22070: [enhancement](nereids) Update stats table config
hello-stephen commented on PR #22070: URL: https://github.com/apache/doris/pull/22070#issuecomment-1646873242

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.49 seconds
stream load tsv: 504 seconds loaded 74807831229 Bytes, about 141 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 67 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.2 seconds inserted 1000 Rows, about 342K ops/s
storage size: 17162646680 Bytes
[GitHub] [doris] zddr commented on pull request #21968: [improvement](catalog)optimize ldap
zddr commented on PR #21968: URL: https://github.com/apache/doris/pull/21968#issuecomment-1646875011 run buildall
[GitHub] [doris] HappenLee commented on a diff in pull request #22059: [vectorized](udf) java udf support map type
HappenLee commented on code in PR #22059: URL: https://github.com/apache/doris/pull/22059#discussion_r1271473105 ## fe/be-java-extensions/java-udf/src/main/java/org/apache/doris/udf/UdfExecutor.java: ## @@ -503,4 +573,78 @@ protected void init(TJavaUdfExecutorCtorParams request, String jarPath, Type fun throw new UdfRuntimeException("Unable to call create UDF instance.", e); } } + +public static class HashMapBuilder { +public Object[] get(Object[] keyCol, Object[] valueCol, PrimitiveType valueType) { +switch (valueType) { +case BOOLEAN: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case TINYINT: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case SMALLINT: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case INT: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case BIGINT: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case LARGEINT: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case FLOAT: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case DOUBLE: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case CHAR: +case VARCHAR: +case STRING: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case DATEV2: +case DATE: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case DATETIMEV2: +case DATETIME: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case DECIMAL32: +case DECIMAL64: +case DECIMALV2: +case DECIMAL128: { +return new BuildMapFromType().get(keyCol, valueCol); +} +default: { +LOG.info("Not support: " + valueType); +Preconditions.checkState(false, "Not support type " + valueType.toString()); +break; +} +} +return null; +} +} + +public static class BuildMapFromType { +public Object[] get(Object[] keyCol, Object[] valueCol) { +Object[] retHashMap = new HashMap[keyCol.length]; +for (int colIdx = 0; colIdx < keyCol.length; colIdx++) { +HashMap hashMap = new HashMap<>(); +ArrayList keys = (ArrayList) (keyCol[colIdx]); +ArrayList values = 
(ArrayList) (valueCol[colIdx]); +for (int i = 0; i < keys.size(); i++) { +T1 key = keys.get(i); +T2 value = values.get(i); +hashMap.put(key, value); Review Comment: Keep the same behavior as the Doris map: if the map already contains the key, skip the put (`continue`, or call put_if_not_exist).
[GitHub] [doris] HappenLee commented on a diff in pull request #22059: [vectorized](udf) java udf support map type
HappenLee commented on code in PR #22059: URL: https://github.com/apache/doris/pull/22059#discussion_r1271473105 ## fe/be-java-extensions/java-udf/src/main/java/org/apache/doris/udf/UdfExecutor.java: ## @@ -503,4 +573,78 @@ protected void init(TJavaUdfExecutorCtorParams request, String jarPath, Type fun throw new UdfRuntimeException("Unable to call create UDF instance.", e); } } + +public static class HashMapBuilder { +public Object[] get(Object[] keyCol, Object[] valueCol, PrimitiveType valueType) { +switch (valueType) { +case BOOLEAN: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case TINYINT: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case SMALLINT: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case INT: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case BIGINT: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case LARGEINT: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case FLOAT: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case DOUBLE: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case CHAR: +case VARCHAR: +case STRING: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case DATEV2: +case DATE: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case DATETIMEV2: +case DATETIME: { +return new BuildMapFromType().get(keyCol, valueCol); +} +case DECIMAL32: +case DECIMAL64: +case DECIMALV2: +case DECIMAL128: { +return new BuildMapFromType().get(keyCol, valueCol); +} +default: { +LOG.info("Not support: " + valueType); +Preconditions.checkState(false, "Not support type " + valueType.toString()); +break; +} +} +return null; +} +} + +public static class BuildMapFromType { +public Object[] get(Object[] keyCol, Object[] valueCol) { +Object[] retHashMap = new HashMap[keyCol.length]; +for (int colIdx = 0; colIdx < keyCol.length; colIdx++) { +HashMap hashMap = new HashMap<>(); +ArrayList keys = (ArrayList) (keyCol[colIdx]); +ArrayList values = 
(ArrayList) (valueCol[colIdx]); +for (int i = 0; i < keys.size(); i++) { +T1 key = keys.get(i); +T2 value = values.get(i); +hashMap.put(key, value); Review Comment: Keep the same behavior as the Doris map: if the map already contains the key, skip the put by calling `putIfAbsent`.
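The review comment above asks that a duplicate key not overwrite an earlier entry. A minimal sketch of that "first key wins" behavior follows; the class and method names (`FirstKeyWinsMapBuilder`, `build`) are hypothetical and not part of the PR, and the claim about how the Doris map treats duplicates comes only from the reviewer's remark.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; not code from PR #22059.
final class FirstKeyWinsMapBuilder {
    private FirstKeyWinsMapBuilder() {}

    // Builds a map from parallel key/value lists, keeping the first value seen for each key.
    static <K, V> Map<K, V> build(List<K> keys, List<V> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        Map<K, V> map = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            // putIfAbsent leaves an existing non-null mapping untouched,
            // whereas put would replace it with the later value.
            map.putIfAbsent(keys.get(i), values.get(i));
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = build(List.of("a", "b", "a"), List.of(1, 2, 3));
        System.out.println(m.get("a") + " " + m.size());
    }
}
```

One caveat: `HashMap.putIfAbsent` treats a key mapped to `null` as absent, so if the first value for a key is `null`, a later duplicate still replaces it. A `containsKey` check followed by `continue` avoids that if null values must also be preserved.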
[GitHub] [doris] caiconghui commented on pull request #22125: [feature](metric) Support collect query counter and error query counter metric in user level
caiconghui commented on PR #22125: URL: https://github.com/apache/doris/pull/22125#issuecomment-1646875835 run buildall
[GitHub] [doris] HappenLee commented on a diff in pull request #22059: [vectorized](udf) java udf support map type
HappenLee commented on code in PR #22059: URL: https://github.com/apache/doris/pull/22059#discussion_r1271473459 ## fe/be-java-extensions/java-udf/src/main/java/org/apache/doris/udf/BaseExecutor.java: ## @@ -1202,4 +1202,238 @@ public Object[] convertArrayArg(int argIdx, boolean isNullable, int rowStart, in } return argument; } + +public Object[] convertMapKeyArg(int argIdx, boolean isNullable, int rowStart, int rowEnd, long nullMapAddr, Review Comment: `convertMapKeyArg` looks the same as `convertMapValueArg`; maybe we only need to keep one?
[GitHub] [doris] hello-stephen commented on pull request #22078: [improvement](iceberg) Optimize the split to the user-specified size
hello-stephen commented on PR #22078: URL: https://github.com/apache/doris/pull/22078#issuecomment-1646878148 (From new machine)TeamCity pipeline, clickbench performance test result: the sum of best hot time: 45.12 seconds stream load tsv: 505 seconds loaded 74807831229 Bytes, about 141 MB/s stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s insert into select: 29.9 seconds inserted 1000 Rows, about 334K ops/s storage size: 17161863049 Bytes
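The stream-load rates in the report above appear to be consistent with bytes divided by seconds divided by 2^20, truncated to an integer, so "MB/s" behaves like MiB/s here. That is an inference from the reported numbers, not something taken from the benchmark script. A small sketch of the arithmetic, with hypothetical names (`LoadThroughput`, `mibPerSecond`):

```java
// Hypothetical helper that reproduces the reported rates; the formula is
// inferred from the figures in the report, not from the TeamCity pipeline.
final class LoadThroughput {
    private static final long BYTES_PER_MIB = 1024L * 1024L;

    private LoadThroughput() {}

    // Integer throughput in MiB/s, truncated toward zero.
    static long mibPerSecond(long bytesLoaded, long elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        return bytesLoaded / elapsedSeconds / BYTES_PER_MIB;
    }

    public static void main(String[] args) {
        System.out.println("tsv:     " + mibPerSecond(74807831229L, 505));
        System.out.println("json:    " + mibPerSecond(2358488459L, 19));
        System.out.println("orc:     " + mibPerSecond(1101869774L, 65));
        System.out.println("parquet: " + mibPerSecond(861443392L, 31));
    }
}
```

The parquet case is why truncation rather than rounding seems the better fit: 861443392 bytes over 31 seconds is about 26.5 MiB/s, and the report prints 26.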
[GitHub] [doris] github-actions[bot] commented on pull request #21959: [nereids](function) add boolean type for sum agg
github-actions[bot] commented on PR #21959: URL: https://github.com/apache/doris/pull/21959#issuecomment-1646878198 PR approved by at least one committer and no changes requested.
[GitHub] [doris] github-actions[bot] commented on pull request #21959: [nereids](function) add boolean type for sum agg
github-actions[bot] commented on PR #21959: URL: https://github.com/apache/doris/pull/21959#issuecomment-1646878207 PR approved by anyone and no changes requested.
[GitHub] [doris] jackwener commented on pull request #22000: [refactor](Nereids): avoid useless groupByColStats Map
jackwener commented on PR #22000: URL: https://github.com/apache/doris/pull/22000#issuecomment-1646879720 run buildall
[GitHub] [doris] HappenLee commented on a diff in pull request #21940: [Bug](pipeline) fix pipeline shared scan + topn optimization
HappenLee commented on code in PR #21940: URL: https://github.com/apache/doris/pull/21940#discussion_r1271476471 ## be/src/olap/rowset/beta_rowset_reader.cpp: ## @@ -75,9 +75,9 @@ bool BetaRowsetReader::update_profile(RuntimeProfile* profile) { } Status BetaRowsetReader::get_segment_iterators(RowsetReaderContext* read_context, + size_t scanner_idx, Review Comment: `scanner_idx` seems to be unused in this function. Why pass the argument?
[GitHub] [doris] hello-stephen commented on pull request #21968: [improvement](catalog)optimize ldap
hello-stephen commented on PR #21968: URL: https://github.com/apache/doris/pull/21968#issuecomment-1646883960 (From new machine)TeamCity pipeline, clickbench performance test result: the sum of best hot time: 45.24 seconds stream load tsv: 509 seconds loaded 74807831229 Bytes, about 140 MB/s stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s insert into select: 37.3 seconds inserted 1000 Rows, about 268K ops/s storage size: 17167770310 Bytes
[GitHub] [doris] hello-stephen commented on pull request #21898: [feature](datetime) Support timezone when insert datetime value
hello-stephen commented on PR #21898: URL: https://github.com/apache/doris/pull/21898#issuecomment-1646884151 (From new machine)TeamCity pipeline, clickbench performance test result: the sum of best hot time: 45.84 seconds stream load tsv: 525 seconds loaded 74807831229 Bytes, about 135 MB/s stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s insert into select: 29.1 seconds inserted 1000 Rows, about 343K ops/s storage size: 17169679572 Bytes
[GitHub] [doris] hello-stephen commented on pull request #22125: [feature](metric) Support collect query counter and error query counter metric in user level
hello-stephen commented on PR #22125: URL: https://github.com/apache/doris/pull/22125#issuecomment-1646884499 (From new machine)TeamCity pipeline, clickbench performance test result: the sum of best hot time: 45.53 seconds stream load tsv: 507 seconds loaded 74807831229 Bytes, about 140 MB/s stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s insert into select: 29.2 seconds inserted 1000 Rows, about 342K ops/s storage size: 17166959829 Bytes
[GitHub] [doris] hello-stephen commented on pull request #22000: [refactor](Nereids): avoid useless groupByColStats Map
hello-stephen commented on PR #22000: URL: https://github.com/apache/doris/pull/22000#issuecomment-1646889764 (From new machine)TeamCity pipeline, clickbench performance test result: the sum of best hot time: 45.23 seconds stream load tsv: 504 seconds loaded 74807831229 Bytes, about 141 MB/s stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s insert into select: 29.1 seconds inserted 1000 Rows, about 343K ops/s storage size: 17167067837 Bytes
[GitHub] [doris] zclllyybb commented on pull request #21898: [feature](datetime) Support timezone when insert datetime value
zclllyybb commented on PR #21898: URL: https://github.com/apache/doris/pull/21898#issuecomment-1646911289 run buildall
[GitHub] [doris] github-actions[bot] commented on pull request #21898: [feature](datetime) Support timezone when insert datetime value
github-actions[bot] commented on PR #21898: URL: https://github.com/apache/doris/pull/21898#issuecomment-1646915614 clang-tidy review says "All clean, LGTM! :+1:"
[GitHub] [doris] zclllyybb commented on pull request #21898: [feature](datetime) Support timezone when insert datetime value
zclllyybb commented on PR #21898: URL: https://github.com/apache/doris/pull/21898#issuecomment-1646920819 run buildall
[GitHub] [doris] github-actions[bot] commented on pull request #21898: [feature](datetime) Support timezone when insert datetime value
github-actions[bot] commented on PR #21898: URL: https://github.com/apache/doris/pull/21898#issuecomment-1646922756 clang-tidy review says "All clean, LGTM! :+1:"
[GitHub] [doris] hello-stephen commented on pull request #21898: [feature](datetime) Support timezone when insert datetime value
hello-stephen commented on PR #21898: URL: https://github.com/apache/doris/pull/21898#issuecomment-1646931249 (From new machine)TeamCity pipeline, clickbench performance test result: the sum of best hot time: 45.2 seconds stream load tsv: 524 seconds loaded 74807831229 Bytes, about 136 MB/s stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s insert into select: 29.4 seconds inserted 1000 Rows, about 340K ops/s storage size: 17167516479 Bytes
[GitHub] [doris] github-actions[bot] commented on pull request #22125: [feature](metric) Support collect query counter and error query counter metric in user level
github-actions[bot] commented on PR #22125: URL: https://github.com/apache/doris/pull/22125#issuecomment-1647028942 PR approved by at least one committer and no changes requested.
[GitHub] [doris] github-actions[bot] commented on pull request #22125: [feature](metric) Support collect query counter and error query counter metric in user level
github-actions[bot] commented on PR #22125: URL: https://github.com/apache/doris/pull/22125#issuecomment-1647028962 PR approved by anyone and no changes requested.
[GitHub] [doris] github-actions[bot] commented on pull request #22070: [enhancement](nereids) Update stats table config
github-actions[bot] commented on PR #22070: URL: https://github.com/apache/doris/pull/22070#issuecomment-1647029435 PR approved by at least one committer and no changes requested.
[GitHub] [doris] github-actions[bot] commented on pull request #22070: [enhancement](nereids) Update stats table config
github-actions[bot] commented on PR #22070: URL: https://github.com/apache/doris/pull/22070#issuecomment-1647029462 PR approved by anyone and no changes requested.
[GitHub] [doris] yiguolei merged pull request #22082: fix(compaction) release the block and segment iterator after reading to the end of the segment file
yiguolei merged PR #22082: URL: https://github.com/apache/doris/pull/22082
[doris] branch master updated: fix(compaction) release the block and segment iterator after reading to the end of the segment file (#22082)
This is an automated email from the ASF dual-hosted git repository. yiguolei pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris.git The following commit(s) were added to refs/heads/master by this push: new 0396ac9d38 fix(compaction) release the block and segment iterator after reading to the end of the segment file (#22082) 0396ac9d38 is described below commit 0396ac9d38ca2c2501f44aea7ecb1b5f2073b440 Author: Chenyang Sun AuthorDate: Mon Jul 24 08:47:19 2023 +0800 fix(compaction) release the block and segment iterator after reading to the end of the segment file (#22082) When reading to the end of the segment file, clearing the block did not release the memory, leading to high memory usage during compaction. When reading through segment file for columns that are dictionary encoded, the column iterator in the segment iterator will hold the dictionary. Release the segment iterator to free up the dictionary. --- be/src/vec/olap/vertical_merge_iterator.cpp | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/be/src/vec/olap/vertical_merge_iterator.cpp b/be/src/vec/olap/vertical_merge_iterator.cpp index 7285531d71..7ace9b457c 100644 --- a/be/src/vec/olap/vertical_merge_iterator.cpp +++ b/be/src/vec/olap/vertical_merge_iterator.cpp @@ -356,6 +356,13 @@ Status VerticalMergeIteratorContext::_load_next_block() { if (!st.ok()) { _valid = false; if (st.is()) { +// When reading to the end of the segment file, clearing the block did not release the memory. +// Directly releasing the block to free up memory. +_block.reset(); +// When reading through segment file for columns that are dictionary encoded, +// the column iterator in the segment iterator will hold the dictionary. +// Release the segment iterator to free up the dictionary. 
+_iter.reset(); return Status::OK(); } else { return st; @@ -601,7 +608,9 @@ Status VerticalFifoMergeIterator::init(const StorageReadOptions& opts) { Status VerticalMaskMergeIterator::check_all_iter_finished() { for (auto iter : _origin_iter_ctx) { if (iter->inited()) { -RETURN_IF_ERROR(iter->advance()); +if (iter->valid()) { +RETURN_IF_ERROR(iter->advance()); +} DCHECK(!iter->valid()); } }
[GitHub] [doris] yiguolei merged pull request #22078: [improvement](iceberg) Optimize the split to the user-specified size
yiguolei merged PR #22078: URL: https://github.com/apache/doris/pull/22078
[doris] branch master updated: [improvement](iceberg) Optimize the split to the user-specified size #22078
This is an automated email from the ASF dual-hosted git repository. yiguolei pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris.git The following commit(s) were added to refs/heads/master by this push: new 64348055a1 [improvement](iceberg) Optimize the split to the user-specified size #22078 64348055a1 is described below commit 64348055a1b449b018018123b4828c5974591838 Author: wuwenchi AuthorDate: Mon Jul 24 08:48:10 2023 +0800 [improvement](iceberg) Optimize the split to the user-specified size #22078 According to the specified split size, the split tasks are merged to keep a single task near the expected size. --- .../planner/external/iceberg/IcebergScanNode.java | 24 -- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/fe/fe-core/src/main/java/org/apache/doris/planner/external/iceberg/IcebergScanNode.java b/fe/fe-core/src/main/java/org/apache/doris/planner/external/iceberg/IcebergScanNode.java index 2de2f8291c..3d3634fb66 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/planner/external/iceberg/IcebergScanNode.java +++ b/fe/fe-core/src/main/java/org/apache/doris/planner/external/iceberg/IcebergScanNode.java @@ -47,6 +47,7 @@ import org.apache.doris.thrift.TTableFormatFileDesc; import avro.shaded.com.google.common.base.Preconditions; import org.apache.hadoop.fs.Path; import org.apache.iceberg.BaseTable; +import org.apache.iceberg.CombinedScanTask; import org.apache.iceberg.DeleteFile; import org.apache.iceberg.FileContent; import org.apache.iceberg.FileScanTask; @@ -57,8 +58,11 @@ import org.apache.iceberg.Table; import org.apache.iceberg.TableScan; import org.apache.iceberg.exceptions.NotFoundException; import org.apache.iceberg.expressions.Expression; +import org.apache.iceberg.io.CloseableIterable; import org.apache.iceberg.types.Conversions; +import org.apache.iceberg.util.TableScanUtil; +import java.io.IOException; import java.nio.ByteBuffer; import java.time.Instant; import java.util.ArrayList; @@ 
-179,21 +183,29 @@ public class IcebergScanNode extends FileQueryScanNode { int formatVersion = ((BaseTable) table).operations().current().formatVersion(); // Min split size is DEFAULT_SPLIT_SIZE(128MB). long splitSize = Math.max(ConnectContext.get().getSessionVariable().getFileSplitSize(), DEFAULT_SPLIT_SIZE); -for (FileScanTask task : scan.planFiles()) { -long fileSize = task.file().fileSizeInBytes(); -for (FileScanTask splitTask : task.split(splitSize)) { +CloseableIterable<FileScanTask> fileScanTasks = TableScanUtil.splitFiles(scan.planFiles(), splitSize); +try (CloseableIterable<CombinedScanTask> combinedScanTasks = + TableScanUtil.planTasks(fileScanTasks, splitSize, 1, 0)) { +combinedScanTasks.forEach(taskGrp -> taskGrp.files().forEach(splitTask -> { String dataFilePath = splitTask.file().path().toString(); Path finalDataFilePath = S3Util.toScanRangeLocation(dataFilePath); -IcebergSplit split = new IcebergSplit(finalDataFilePath, splitTask.start(), -splitTask.length(), fileSize, new String[0]); +IcebergSplit split = new IcebergSplit( +finalDataFilePath, +splitTask.start(), +splitTask.length(), +splitTask.file().fileSizeInBytes(), +new String[0]); split.setFormatVersion(formatVersion); if (formatVersion >= MIN_DELETE_FILE_SUPPORT_VERSION) { split.setDeleteFileFilters(getDeleteFileFilters(splitTask)); } split.setTableFormatType(TableFormatType.ICEBERG); splits.add(split); -} +})); +} catch (IOException e) { +throw new UserException(e.getMessage(), e.getCause()); } + return splits; }
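The commit message above describes merging split tasks so that each task stays near the requested size. The sketch below illustrates that idea in two steps: cut each file into chunks of at most the split size, then greedily pack consecutive chunks into groups. It is a simplified stand-in with hypothetical names (`SplitPackingSketch`, `splitFile`, `pack`), not Iceberg's `TableScanUtil` implementation, and it assumes a single open group at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only; not the Iceberg bin-packing code used by the commit.
final class SplitPackingSketch {
    private SplitPackingSketch() {}

    // Cuts a file of fileSize bytes into chunk lengths of at most splitSize bytes.
    static List<Long> splitFile(long fileSize, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Long> chunks = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += splitSize) {
            chunks.add(Math.min(splitSize, fileSize - offset));
        }
        return chunks;
    }

    // Packs chunks in order; a group is closed when the next chunk would exceed targetSize.
    static List<List<Long>> pack(List<Long> chunks, long targetSize) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long chunk : chunks) {
            if (!current.isEmpty() && currentBytes + chunk > targetSize) {
                groups.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(chunk);
            currentBytes += chunk;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(splitFile(300, 128));
        System.out.println(pack(List.of(128L, 44L, 30L, 60L, 100L), 128));
    }
}
```

In the diff itself, each group's files are still iterated one by one (`taskGrp.files().forEach(...)`), so every split task continues to produce its own `IcebergSplit`.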
[GitHub] [doris] yiguolei merged pull request #22056: [refactor](be) use std::move to improve performance of push_back
yiguolei merged PR #22056: URL: https://github.com/apache/doris/pull/22056
[doris] branch master updated: [refactor](be) use std::move to improve performance of push_back #22056
This is an automated email from the ASF dual-hosted git repository. yiguolei pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris.git The following commit(s) were added to refs/heads/master by this push: new d0219062ef [refactor](be) use std::move to improve performance of push_back #22056 d0219062ef is described below commit d0219062ef047c55ca405da25d8f319fcbdf077f Author: 赵立伟 AuthorDate: Mon Jul 24 08:51:28 2023 +0800 [refactor](be) use std::move to improve performance of push_back #22056 --- be/src/exec/olap_common.h | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/be/src/exec/olap_common.h b/be/src/exec/olap_common.h index 2260dac25c..7a58645b74 100644 --- a/be/src/exec/olap_common.h +++ b/be/src/exec/olap_common.h @@ -207,7 +207,7 @@ public: } if (null_pred.condition_values.size() != 0) { -filters.push_back(null_pred); +filters.push_back(std::move(null_pred)); return; } @@ -221,7 +221,7 @@ public: } if (low.condition_values.size() != 0) { -filters.push_back(low); +filters.push_back(std::move(low)); } TCondition high; @@ -234,7 +234,7 @@ public: } if (high.condition_values.size() != 0) { -filters.push_back(high); +filters.push_back(std::move(high)); } } else { // 3. 
convert to is null and is not null filter condition @@ -247,7 +247,7 @@ public: } if (null_pred.condition_values.size() != 0) { -filters.push_back(null_pred); +filters.push_back(std::move(null_pred)); } } } @@ -264,7 +264,7 @@ public: } if (condition.condition_values.size() != 0) { -filters.push_back(condition); +filters.push_back(std::move(condition)); } } @@ -288,7 +288,7 @@ public: condition.condition_values.push_back( cast_to_string(value.second, _scale)); if (condition.condition_values.size() != 0) { -filters.push_back(condition); +filters.push_back(std::move(condition)); } } } @@ -318,7 +318,7 @@ public: condition.condition_values.push_back( cast_to_string(value.second, _scale)); if (condition.condition_values.size() != 0) { -filters.push_back(condition); +filters.push_back(std::move(condition)); } } }
[GitHub] [doris] yiguolei commented on pull request #22027: [fix](memory) fix invalid large memory check && fix memory info thread safety
yiguolei commented on PR #22027:
URL: https://github.com/apache/doris/pull/22027#issuecomment-1647038557

run buildall
[GitHub] [doris] github-actions[bot] commented on pull request #22000: [refactor](Nereids): avoid useless groupByColStats Map
github-actions[bot] commented on PR #22000:
URL: https://github.com/apache/doris/pull/22000#issuecomment-1647040596

PR approved by anyone and no changes requested.
[GitHub] [doris] github-actions[bot] commented on pull request #22000: [refactor](Nereids): avoid useless groupByColStats Map
github-actions[bot] commented on PR #22000:
URL: https://github.com/apache/doris/pull/22000#issuecomment-1647040575

PR approved by at least one committer and no changes requested.