boluor opened a new pull request, #3725:
URL: https://github.com/apache/doris-website/pull/3725
## Summary
A pass of doc fixes where the documentation disagreed with what the engine
(or the companion tools `doris-streamloader` and `doris-flink-connector`)
actually does. Each one was verified against the source.
### `doris-streamloader.md` — `workers` default
Best Practices said the default was "the number of CPU cores".
`apache/doris-streamloader/main.go` defines `flag.IntVar(&workers, "workers",
0, ...)` — the default is `0`, which means **automatic mode** (the tool
computes a value from import size, `disk_throughput`, and
`streamload_throughput`, typically resolving to 1, 2, 4, or 8). CPU cores are
not consulted anywhere in `calculateAndCheckWorkers`.
### `flink-doris-connector.md` — `source.use-flight-sql` default
The parameter table said default = `FALSE`, contradicting the prose nearby
("Starting from Doris 2.1, ADBC is the default read protocol").
`apache/doris-flink-connector` `ConfigurationOptions.java` defines
`USE_FLIGHT_SQL_DEFAULT = true` since the 25.1.0 connector (PR #574, commit
`e691bf89`, 2025-03-13).
Updated the parameter table to `TRUE` in `docs/` and `version-4.x/` (which
are paired with the current 25.x connector). `version-2.1/` and `version-3.x/`
were left untouched — the connector versions paired with those Doris releases
did default to `FALSE`, and the prose claim there is a separate issue that
warrants a wider rewrite.
### `json.md` — `read_json_by_line` default
The detailed description further down the page said "Default: false",
contradicting the matrix and tip at the top of the same page. The actual
default in `JsonFileFormatProperties.java:62-69` is **`true` if neither
`read_json_by_line` nor `strip_outer_array` is supplied**; setting
`strip_outer_array=true` flips it to `false`. Broker Load and Routine Load
always force `true`.
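
The resolution rule described above can be sketched as a small decision function. This is a minimal illustration, not the code in `JsonFileFormatProperties.java`: the class and method names are hypothetical, and two behaviours are assumptions rather than facts from the source: that an explicitly supplied `read_json_by_line` wins when the load type does not force it, and that an explicit `strip_outer_array=false` behaves like an unset value.

```java
// Hypothetical sketch of how the effective read_json_by_line value is chosen.
// Names are illustrative only and do not come from the Doris source.
final class ReadJsonByLineDefault {

    /**
     * @param readJsonByLine   user-supplied value, or null when not supplied
     * @param stripOuterArray  user-supplied value, or null when not supplied
     * @param forcedByLoadType true for Broker Load and Routine Load
     */
    static boolean resolve(Boolean readJsonByLine, Boolean stripOuterArray,
                           boolean forcedByLoadType) {
        if (forcedByLoadType) {
            return true; // Broker Load and Routine Load always force true
        }
        if (readJsonByLine != null) {
            return readJsonByLine; // assumption: an explicit value is honoured
        }
        // Unset: true unless strip_outer_array=true was supplied.
        // Assumption: strip_outer_array=false is treated like "not supplied".
        return !Boolean.TRUE.equals(stripOuterArray);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null, false)); // true
        System.out.println(resolve(null, true, false)); // false
        System.out.println(resolve(null, true, true));  // true
    }
}
```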
### `JSON.md` — `json_type` returns `"int"`, not `"TINYINT"`
The prose claimed the second `123` was of type `TINYINT`, but the sample
output immediately above shows the result `int`. `json_type` (see
`be/src/util/jsonb_document.h` `typeName()` around lines 647-680) returns the
string `"int"` for `T_Int8`, `T_Int16`, and `T_Int32` — there is no `"tinyint"`
/ `"smallint"` value. Reworded the prose to match what users actually see.
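
To make the collapse of the small integer types concrete, here is a hedged Java sketch of the mapping. It models only the three tags named above; the enum and method are hypothetical stand-ins for the C++ `typeName()` in `jsonb_document.h`, not a port of it.

```java
// Illustrative only: models the three integer tags mentioned in this PR.
final class JsonTypeNameSketch {

    // Hypothetical enum mirroring the tag names quoted from the BE source.
    enum JsonbIntTag { T_Int8, T_Int16, T_Int32 }

    // All three widths report the same user-visible type name.
    static String typeName(JsonbIntTag tag) {
        switch (tag) {
            case T_Int8:
            case T_Int16:
            case T_Int32:
                return "int";
            default:
                throw new IllegalArgumentException("tag not modelled: " + tag);
        }
    }

    public static void main(String[] args) {
        for (JsonbIntTag tag : JsonbIntTag.values()) {
            System.out.println(tag + " -> " + typeName(tag));
        }
    }
}
```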
### `to-base64-binary.md` — edge-case wording
The page said "If input is an empty string, returns an empty string". The
function takes `VARBINARY`, not a string. Reworded to "If input is an empty
VARBINARY (zero bytes), returns an empty string".
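
The edge case matches how standard Base64 behaves on zero bytes. The sketch below uses `java.util.Base64` as a stand-in to show that property; it is not the Doris function, and the class name is hypothetical.

```java
import java.util.Base64;

// Stand-in demonstration with the JDK encoder, not the Doris implementation.
final class EmptyBinaryBase64 {

    // Encodes raw bytes (the analogue of a VARBINARY value) to Base64 text.
    static String encode(byte[] input) {
        return Base64.getEncoder().encodeToString(input);
    }

    public static void main(String[] args) {
        // Zero input bytes produce zero output characters.
        System.out.println(encode(new byte[0]).isEmpty());          // true
        // A non-empty input for contrast.
        System.out.println(encode(new byte[] {0x01, 0x02, 0x03}));  // AQID
    }
}
```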
### `date-floor.md` — QUARTER missing from the type list
`date_ceil` lists `QUARTER` in its type list but `date_floor` did not. The
engine supports `date_floor(x, INTERVAL n QUARTER)` — see
`fe/fe-core/.../scalar/QuarterFloor.java` and `BuiltinScalarFunctions.java:987`
(`scalar(QuarterFloor.class, "quarter_floor")`). `version-3.x/` already had a tip
stating "QUARTER is supported since 3.0.8 and 3.1.0", but the type list still
didn't mention it. Added `QUARTER` to the type list in `docs/`, `version-4.x/`,
and `version-3.x/`. `version-2.1/` was left untouched (the 2.1 engine did not
support `QUARTER` for floor).
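
For readers unfamiliar with quarter flooring, the following Java sketch shows what rounding a date down to its quarter start means. It covers only the single-quarter case on a plain date and is an illustration with `java.time`, not the engine's `QuarterFloor` implementation; the class and method names are hypothetical.

```java
import java.time.LocalDate;

// Illustration of flooring a date to the first day of its calendar quarter.
// Covers only a step of one quarter; not the Doris implementation.
final class QuarterFloorSketch {

    static LocalDate floorToQuarter(LocalDate date) {
        // Months 1-3 map to 1, 4-6 to 4, 7-9 to 7, 10-12 to 10.
        int firstMonthOfQuarter = ((date.getMonthValue() - 1) / 3) * 3 + 1;
        return LocalDate.of(date.getYear(), firstMonthOfQuarter, 1);
    }

    public static void main(String[] args) {
        System.out.println(floorToQuarter(LocalDate.of(2025, 8, 17)));  // 2025-07-01
        System.out.println(floorToQuarter(LocalDate.of(2025, 12, 31))); // 2025-10-01
    }
}
```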
### `minutes-sub` / `minutes-add` / `months-sub` / `months-add` — singular vs plural in error tag
Error messages were transcribed as `Operation minutes_add of …` / `Operation
months_add of …`, but the engine's `get_time_unit_name` in
`be/src/exprs/function/datetime_errors.h` returns the singular tags
`minute_add` / `month_add`:
```c++
case TimeUnit::MINUTE: return "minute_add";
case TimeUnit::MONTH: return "month_add";
```
So the actual error a user sees is `Operation minute_add of …`. The
throw-on-overflow refactor that introduced these messages landed in Sept 2025,
so this only applies to `docs/` / `version-4.x/` and their zh counterparts (in
2.1/3.x these calls return NULL with no error message).
### `previous-day.md` — frontmatter `description` + TIMESTAMPTZ claim
- Frontmatter was missing a `description` field.
- The body said the function supports `DATE`, `DATETIME`, and `TIMESTAMPTZ`,
but the parameter table only lists `DATE` and `DATETIME`, and the FE signature
(`PreviousDay.java:40-41`) is `DATE_V2` only:
```java
private static final List<FunctionSignature> SIGNATURES = ImmutableList.of(
        FunctionSignature.ret(DateV2Type.INSTANCE).args(DateV2Type.INSTANCE,
                StringType.INSTANCE));
```
Aligned the body wording to the parameter table (dropped TIMESTAMPTZ).
## Scope
Changes were applied to `docs/`, `versioned_docs/version-{2.1,3.x,4.x}/` where
the same content exists, and to `i18n/zh-CN/` for the cases where the English
string is reused verbatim (param-table values, error messages, the QUARTER type
list) or where the zh prose exists as a direct translation (the `workers`
default sentence). For `read_json_by_line`, `JSON_TYPE`, `to-base64-binary`,
and `previous-day`, the zh content is structured differently and was left for a
separate translation pass.
Findings `#239` (`from_hex`) and `#240` (`to_hex`) were on the audit list
but the source confirms the existing docs are already correct — those are not
in this PR.
## Test plan
- [x] For each finding, verify the relevant Doris source (config constant,
FE signature, BE implementation, error-message helper) and quote the file:line
in the commit body.
- [x] Spot-check each rendered diff.
- [ ] CI build (docusaurus + sidebar checks).