github-actions[bot] commented on code in PR #61242:
URL: https://github.com/apache/doris/pull/61242#discussion_r2922826937
##########
be/src/service/http/action/stream_load.cpp:
##########
@@ -366,6 +370,9 @@ Status StreamLoadAction::_on_header(HttpRequest* http_req,
std::shared_ptr<Strea
int64_t begin_txn_start_time = MonotonicNanos();
RETURN_IF_ERROR(_exec_env->stream_load_executor()->begin_txn(ctx.get()));
ctx->begin_txn_cost_nanos = MonotonicNanos() - begin_txn_start_time;
+ if (ctx->group_commit) {
+ RETURN_IF_ERROR(_check_wal_space(ctx->group_commit_mode,
ctx->body_bytes));
+ }
Review Comment:
**[Low] `_check_wal_space` called with `body_bytes=0`**
At this point in `_on_header`, `ctx->body_bytes` is initialized to 0 (set at
line 304). The body hasn't been received yet. So
`_check_wal_space(ctx->group_commit_mode, ctx->body_bytes)` is called with
`content_length=0`, which always passes the
`load_size_smaller_than_wal_limit(0)` check.
This means the WAL space check is effectively a no-op for the table-property
group commit path. The explicit header path has a proper check in
`_can_group_commit` (line 943) using the Content-Length header value.
Consider using the Content-Length from the HTTP header instead:
```cpp
if (ctx->group_commit) {
int64_t content_length =
http_req->header(HttpHeaders::CONTENT_LENGTH).empty()
? 0 :
std::stoll(http_req->header(HttpHeaders::CONTENT_LENGTH));
RETURN_IF_ERROR(_check_wal_space(ctx->group_commit_mode,
content_length));
}
```
##########
be/src/service/http/action/stream_load.cpp:
##########
@@ -933,17 +940,43 @@ Status
StreamLoadAction::_handle_group_commit(HttpRequest* req,
if (!config::wait_internal_group_commit_finish && !ctx->label.empty())
{
return Status::InvalidArgument("label and group_commit can't be
set at the same time");
}
Review Comment:
**[High] Label check incorrectly rejects labeled stream loads on tables with
`group_commit_mode` property**
When the user does NOT set the `group_commit` HTTP header (empty
`group_commit_header`), but the table has
`group_commit_mode=async_mode/sync_mode`, this code path is still reached via
`_handle_group_commit` → `_can_group_commit`. If the user provides a label
(which is normal for stream loads), this returns `InvalidArgument("label and
group_commit can't be set at the same time")`.
The problem: the user never explicitly requested group commit — they just
have a table property. This is a behavioral regression because existing users
who always set labels would suddenly get errors after someone sets the table
property.
Suggestion: Either skip this check when `group_commit_header` is empty
(table-property path), or move the label conflict check to after the FE
responds confirming the table actually uses group commit. For example:
```cpp
if (!group_commit_header.empty() &&
!config::wait_internal_group_commit_finish && !ctx->label.empty()) {
return Status::InvalidArgument("label and group_commit can't be set at
the same time");
}
```
This ensures only explicit header-based group commit rejects labels, while
the table-property path allows labels (and falls through to a normal
non-group-commit load when a label is present).
##########
be/src/load/stream_load/stream_load_executor.cpp:
##########
@@ -217,6 +220,15 @@ Status StreamLoadExecutor::begin_txn(StreamLoadContext*
ctx) {
}
return status;
}
+ if (ctx->group_commit_mode.empty()) {
+ auto table_group_commit_mode = result.table_group_commit_mode;
+ if (iequal(table_group_commit_mode, "async_mode") ||
Review Comment:
**[Low] Missing `__isset` check for `result.table_group_commit_mode`**
This reads `result.table_group_commit_mode` without first checking
`result.__isset.table_group_commit_mode`. While the default empty string would
cause the `iequal` checks to fail (so this works in practice), it's more
idiomatic and safer to check `__isset` first, consistent with how
`result.__isset.db_id` is checked at line 233:
```cpp
if (ctx->group_commit_mode.empty() &&
result.__isset.table_group_commit_mode) {
auto table_group_commit_mode = result.table_group_commit_mode;
```
##########
be/src/service/http/action/stream_load.cpp:
##########
@@ -933,17 +940,43 @@ Status
StreamLoadAction::_handle_group_commit(HttpRequest* req,
if (!config::wait_internal_group_commit_finish && !ctx->label.empty())
{
return Status::InvalidArgument("label and group_commit can't be
set at the same time");
}
- ctx->group_commit = true;
- if (iequal(group_commit_mode, "async_mode")) {
- if (!load_size_smaller_than_wal_limit(content_length)) {
- std::stringstream ss;
- ss << "There is no space for group commit stream load async
WAL. This stream load "
- "size is "
- << content_length << ". WAL dir info: "
- <<
ExecEnv::GetInstance()->wal_mgr()->get_wal_dirs_info_string();
- LOG(WARNING) << ss.str();
- return Status::Error<EXCEEDED_LIMIT>(ss.str());
- }
+ RETURN_IF_ERROR(_check_wal_space(group_commit_header, content_length));
+ can_group_commit = true;
+ }
+ return Status::OK();
+}
+
+Status StreamLoadAction::_handle_group_commit(HttpRequest* req,
+
std::shared_ptr<StreamLoadContext> ctx) {
+ std::string group_commit_header = req->header(HTTP_GROUP_COMMIT);
+ if (!group_commit_header.empty() && !iequal(group_commit_header,
SYNC_MODE) &&
+ !iequal(group_commit_header, ASYNC_MODE) &&
!iequal(group_commit_header, OFF_MODE)) {
+ return Status::InvalidArgument(
+ "group_commit can only be [async_mode, sync_mode, off_mode]");
+ }
+ if (config::wait_internal_group_commit_finish) {
+ group_commit_header = SYNC_MODE;
+ }
+
+ // if group_commit_header is off_mode, we will not use group commit
+ if (iequal(group_commit_header, OFF_MODE)) {
+ ctx->group_commit_mode = OFF_MODE;
+ ctx->group_commit = false;
+ return Status::OK();
+ }
+ bool can_group_commit = false;
+ RETURN_IF_ERROR(_can_group_commit(req, ctx, group_commit_header,
can_group_commit));
+ if (!can_group_commit) {
+ ctx->group_commit_mode = OFF_MODE;
+ ctx->group_commit = false;
+ } else {
+ if (!group_commit_header.empty()) {
+ ctx->group_commit_mode = group_commit_header;
+ ctx->group_commit = true;
+ } else {
+ // use table property to decide group commit or not
+ ctx->group_commit_mode = "";
Review Comment:
**[Medium] Parallel code path: `HttpStreamAction` not updated**
The `HttpStreamAction` (`http_stream.cpp`) has its own
`_handle_group_commit` method that does NOT support the table-property group
commit mode. It treats an empty `group_commit` header as "no group commit" and
returns immediately. Unlike `StreamLoadAction`, it doesn't call `begin_txn`
(the transaction is opened by FE via `NativeInsertStmt`), so the mechanism used
here (empty `group_commit_mode` → `begin_txn` queries FE for table property)
doesn't directly apply.
However, the feature description says _"For stream load, if it not set
group_commit header, use the table property as the group commit mode"_. This
should apply to both stream load endpoints. Consider either:
1. Adding table-property support to `HttpStreamAction` in this PR, or
2. Documenting this as a known limitation and filing a follow-up issue.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]