boluor opened a new pull request, #3725:
URL: https://github.com/apache/doris-website/pull/3725

   ## Summary
   
   A pass of doc fixes where the documentation disagreed with what the engine 
(or the companion tools `doris-streamloader` and `doris-flink-connector`) 
actually does. Each one was verified against the source.
   
   ### `doris-streamloader.md` — `workers` default
   
   Best Practices said the default was "the number of CPU cores". 
`apache/doris-streamloader/main.go` defines `flag.IntVar(&workers, "workers", 
0, ...)` — the default is `0`, which means **automatic mode** (the tool 
computes a value from import size, `disk_throughput`, and 
`streamload_throughput`, typically resolving to 1, 2, 4, or 8). CPU cores are 
not consulted anywhere in `calculateAndCheckWorkers`.
   
   ### `flink-doris-connector.md` — `source.use-flight-sql` default
   
   The parameter table said default = `FALSE`, contradicting the prose nearby 
("Starting from Doris 2.1, ADBC is the default read protocol"). 
`apache/doris-flink-connector` `ConfigurationOptions.java` defines 
`USE_FLIGHT_SQL_DEFAULT = true` since the 25.1.0 connector (PR #574, commit 
`e691bf89`, 2025-03-13).
   
   Updated the parameter table to `TRUE` in `docs/` and `version-4.x/` (which 
are paired with the current 25.x connector). `version-2.1/` and `version-3.x/` 
were left untouched — the connector versions paired with those Doris releases 
did default to `FALSE`, and the prose claim there is a separate issue that 
warrants a wider rewrite.
   
   ### `json.md` — `read_json_by_line` default
   
   The detailed description further down the page said "Default: false", 
contradicting the matrix and tip at the top of the same page. The actual 
default in `JsonFileFormatProperties.java:62-69` is **`true` if neither 
`read_json_by_line` nor `strip_outer_array` is supplied**; setting 
`strip_outer_array=true` flips it to `false`. Broker Load and Routine Load 
always force `true`.
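
The default resolution described above can be sketched as follows. This is a minimal illustration, not the code in `JsonFileFormatProperties.java`: the class and method names are hypothetical, and it assumes that an explicitly supplied `read_json_by_line` always wins. The Broker Load / Routine Load override is not modeled.

```java
import java.util.Map;

// Hypothetical helper; names are illustrative and not taken from Doris.
final class JsonLineDefaults {
    // Resolves the effective read_json_by_line value from user properties.
    static boolean resolveReadJsonByLine(Map<String, String> props) {
        String explicit = props.get("read_json_by_line");
        if (explicit != null) {
            // Assumption: an explicit setting is used as given.
            return Boolean.parseBoolean(explicit);
        }
        // Not supplied: default is true unless strip_outer_array=true.
        // Boolean.parseBoolean(null) returns false, so an absent key keeps the default.
        return !Boolean.parseBoolean(props.get("strip_outer_array"));
    }

    public static void main(String[] args) {
        System.out.println(resolveReadJsonByLine(Map.of()));                            // true
        System.out.println(resolveReadJsonByLine(Map.of("strip_outer_array", "true"))); // false
    }
}
```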
   
   ### `JSON.md` — `json_type` returns `"int"`, not `"TINYINT"`
   
   The prose claimed the second `123` was of type `TINYINT`, but the sample 
output immediately above shows the result `int`. `json_type` (see 
`be/src/util/jsonb_document.h` `typeName()` around lines 647-680) returns the 
string `"int"` for `T_Int8`, `T_Int16`, and `T_Int32` — there is no `"tinyint"` 
/ `"smallint"` value. Reworded the prose to match what users actually see.
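
As a rough illustration of the collapse described above, the sketch below mirrors the three integer widths named in this section. The enum and class are hypothetical Java stand-ins for the C++ `typeName()` logic, and only the three mappings stated here are modeled.

```java
// Hypothetical Java stand-in; the real logic lives in C++ (jsonb_document.h).
final class JsonTypeNameSketch {
    // Only the three widths mentioned in this PR description.
    enum IntWidth { T_Int8, T_Int16, T_Int32 }

    // All three widths report the same user-visible type name.
    static String typeName(IntWidth width) {
        switch (width) {
            case T_Int8:
            case T_Int16:
            case T_Int32:
                return "int";
            default:
                throw new IllegalArgumentException("not modeled: " + width);
        }
    }

    public static void main(String[] args) {
        for (IntWidth w : IntWidth.values()) {
            System.out.println(w + " -> " + typeName(w)); // each line ends in "int"
        }
    }
}
```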
   
   ### `to-base64-binary.md` — edge-case wording
   
   The page said "If input is an empty string, returns an empty string", but the 
function takes `VARBINARY`, not a string. Reworded to "If input is an empty 
VARBINARY (zero bytes), returns an empty string".
   
   ### `date-floor.md` — QUARTER missing from the type list
   
   `date_ceil` lists `QUARTER` in its type list but `date_floor` did not. The 
engine supports `date_floor(x, INTERVAL n QUARTER)` — see 
`fe/fe-core/.../scalar/QuarterFloor.java` and `BuiltinScalarFunctions.java:987 
scalar(QuarterFloor.class, "quarter_floor")`. `version-3.x/` already had a tip 
stating "QUARTER is supported since 3.0.8 and 3.1.0", but the type list still 
didn't mention it. Added `QUARTER` to the type list in `docs/`, `version-4.x/`, 
and `version-3.x/`. `version-2.1/` was left untouched (the 2.1 engine did not 
support `QUARTER` for floor).
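
For readers unfamiliar with quarter flooring, here is a minimal sketch of what a one-quarter floor is expected to yield. It assumes calendar-aligned quarters (starting in January, April, July, October) and covers only `n = 1`. The class and method are hypothetical and use `java.time`, not the engine's `QuarterFloor` implementation.

```java
import java.time.LocalDate;

// Hypothetical illustration; not the Doris QuarterFloor code.
final class QuarterFloorSketch {
    // Floors a date to the first day of its calendar quarter (n = 1 only).
    static LocalDate floorToQuarter(LocalDate date) {
        // Months 1-3 -> 1, 4-6 -> 4, 7-9 -> 7, 10-12 -> 10.
        int firstMonth = ((date.getMonthValue() - 1) / 3) * 3 + 1;
        return LocalDate.of(date.getYear(), firstMonth, 1);
    }

    public static void main(String[] args) {
        System.out.println(floorToQuarter(LocalDate.of(2025, 5, 17))); // 2025-04-01
    }
}
```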
   
   ### `minutes-sub` / `minutes-add` / `months-sub` / `months-add` — singular vs plural in error tag
   
   Error messages were transcribed as `Operation minutes_add of …` / `Operation 
months_add of …`, but the engine's `get_time_unit_name` in 
`be/src/exprs/function/datetime_errors.h` returns the singular tags 
`minute_add` / `month_add`:
   
   ```c++
   case TimeUnit::MINUTE: return "minute_add";
   case TimeUnit::MONTH:  return "month_add";
   ```
   
   So the actual error a user sees is `Operation minute_add of …`. The 
throw-on-overflow refactor that introduced these messages landed in Sept 2025, 
so this only applies to `docs/` / `version-4.x/` and their zh counterparts (in 
2.1/3.x these calls return NULL with no error message).
   
   ### `previous-day.md` — frontmatter `description` + TIMESTAMPTZ claim
   
   - Frontmatter was missing a `description` field.
   - The body said the function supports `DATE`, `DATETIME`, and `TIMESTAMPTZ`, 
but the parameter table only lists `DATE` and `DATETIME`, and the FE signature 
(`PreviousDay.java:40-41`) is `DATE_V2` only:
     ```java
     private static final List<FunctionSignature> SIGNATURES = ImmutableList.of(
         FunctionSignature.ret(DateV2Type.INSTANCE).args(DateV2Type.INSTANCE, 
StringType.INSTANCE));
     ```
     Aligned the body wording to the parameter table (dropped TIMESTAMPTZ).
   
   ## Scope
   
   Changes applied to `docs/`, `versioned_docs/version-{2.1,3.x,4.x}/` where 
the same content exists, and to `i18n/zh-CN/` for the cases where the English 
string is reused verbatim (param-table values, error messages, the QUARTER type 
list) or where the zh prose exists as a direct translation (the `workers` 
default sentence). For `read_json_by_line`, `JSON_TYPE`, `to-base64-binary`, 
and `previous-day`, the zh content is structured differently and was left for a 
separate translation pass.
   
   Findings `#239` (`from_hex`) and `#240` (`to_hex`) were on the audit list 
but the source confirms the existing docs are already correct — those are not 
in this PR.
   
   ## Test plan
   
   - [x] For each finding, verify the relevant Doris source (config constant, 
FE signature, BE implementation, error-message helper) and quote the file:line 
in the commit body.
   - [x] Spot-check each rendered diff.
   - [ ] CI build (docusaurus + sidebar checks).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

