ryankert01 opened a new pull request, #3215:
URL: https://github.com/apache/iggy/pull/3215

   ## Summary
   
   First-pass Apache Doris sink connector for #2753. Writes Iggy messages to 
Doris via the HTTP Stream Load API.
   
   **v1 scope (intentional):** JSON payloads only, HTTP Basic auth only, 
pre-created tables only (no DDL). Retry middleware, circuit breaker, and 
additional auth methods are deferred to a follow-up.
   
   ## Behaviour
   
   - **Manual 307/308 redirect following** (capped at 5) so the `Authorization` 
header survives the FE → BE hop. `reqwest`'s default policy strips 
`Authorization` on cross-host redirects, which would otherwise break Stream 
Load entirely.
   - **Deterministic per-batch label** — 
`{prefix}-{stream}-{topic}-{partition}-{first_offset}-{last_offset}`. Replays 
of the same batch are deduplicated by Doris within the `label_keep_max_second` 
retention window, so restart-replay is safe regardless of how upstream 
sink-offset semantics are resolved.
   - **Status-body classification** — HTTP 200 alone is not sufficient; Doris 
reports outcome in the response body's `Status` field. `Success` and `Label 
Already Exists` → `Ok`; `Publish Timeout` → `CannotStoreData` (transient); 
`Fail` and any unknown status → `PermanentHttpError` so the runtime DLQs the 
batch instead of looping.
   - **Optional Stream Load knobs** — `columns`, `where`, `max_filter_ratio`, 
`batch_size`, `timeout_secs` forwarded as headers.
   - **Secret hygiene** — `password` is `secrecy::SecretString`; the cached 
`Authorization` header is also wrapped in `SecretString` so `Debug` derivations 
never leak the base64 credential.
   - **Startup validation** — `reqwest::Client` and `fe_url` are built/parsed 
in `open()`, and failures surface as `Error::InitError` / 
`Error::InvalidConfigValue`. `new()` is infallible to keep the FFI boundary 
panic-free.
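
For reviewers, a minimal sketch of the manual redirect loop described above. It is not the code in this PR: `Hop`, `follow_redirects`, and the `send` closure are hypothetical stand-ins for the real `reqwest` call, and the closure is assumed to attach the `Authorization` header on every hop, which is what lets the header survive the FE → BE redirect.

```rust
/// Minimal view of a response for this sketch (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Hop {
    status: u16,
    location: Option<String>,
}

/// Redirect cap, matching the limit of 5 stated in the description.
const MAX_REDIRECTS: usize = 5;

/// Sends the request via `send`, following 307/308 responses by hand.
/// `send` is assumed to re-attach credentials each time it is called.
/// Returns the final URL and response, or an error if the cap is exceeded
/// or a redirect carries no `Location`.
fn follow_redirects<F>(start_url: &str, mut send: F) -> Result<(String, Hop), String>
where
    F: FnMut(&str) -> Hop,
{
    let mut url = start_url.to_string();
    // One initial request plus at most MAX_REDIRECTS followed redirects.
    for _ in 0..=MAX_REDIRECTS {
        let hop = send(&url);
        match hop.status {
            307 | 308 => match hop.location {
                Some(next) => url = next,
                None => return Err(format!("redirect from {} without Location", url)),
            },
            _ => return Ok((url, hop)),
        }
    }
    Err(format!("more than {} redirects", MAX_REDIRECTS))
}
```

Any status other than 307/308 is returned to the caller unchanged, so body-level classification stays a separate step.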
   
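The status-body classification above can be sketched as a single match. This is an illustration only: `Outcome` and `classify` are hypothetical names (the PR maps to `Ok`, `CannotStoreData`, and `PermanentHttpError`), the `Status` string is assumed to be already extracted from the JSON response body, and exact case-sensitive matching is an assumption.

```rust
/// Hypothetical outcome type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// Batch is stored (or was already stored under the same label).
    Stored,
    /// Transient failure; the batch may be retried.
    Transient,
    /// Permanent failure; the batch should go to the DLQ.
    Permanent,
}

/// Maps the `Status` field of a Stream Load response to an outcome.
/// Anything unrecognised is treated as permanent so the batch is not retried forever.
fn classify(status: &str) -> Outcome {
    match status {
        "Success" | "Label Already Exists" => Outcome::Stored,
        "Publish Timeout" => Outcome::Transient,
        _ => Outcome::Permanent,
    }
}
```

Treating unknown statuses as permanent is the conservative default described in the bullet: a new or misspelled status ends up in the DLQ rather than in a retry loop.
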
   ## Test plan
   
   6 integration tests under `core/integration/tests/connectors/doris/` backed 
by an `apache/doris` all-in-one testcontainer (FE HTTP 8030 + FE MySQL 9030):
   
   - [x] `given_existent_doris_table_should_store` — happy path, ~10 rows.
   - [x] `given_bulk_message_send_should_store` — 1000 rows.
   - [x] `given_invalid_messages_should_skip_via_max_filter_ratio` — 
`max_filter_ratio = 0.5`, mix of valid/invalid JSON, only valid rows land.
   - [x] `given_replayed_label_should_dedupe` — same offsets ⇒ same label ⇒ 
Doris must respond `Label Already Exists`; row count must not increase.
   - [x] `given_missing_target_table_should_not_create_or_corrupt` — sink to a 
non-existent table; assert no auto-create and no rogue tables in the test DB.
   - [x] `given_columns_config_should_apply_derived_expression` — pre-create a 
table with an extra `calculated INT NOT NULL` column not present in the JSON 
payload, configure `columns = "..., calculated = count + 1"`, verify the 
derived value lands.
   
   Plus 14 unit tests covering URL construction, label sanitization, status 
classification, and response parsing.
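
To illustrate the label scheme those tests exercise, here is a rough sketch of building the deterministic label. `sanitize` and `batch_label` are hypothetical names, the integer widths are assumptions, and the sanitization rule (replace anything outside `[A-Za-z0-9_-]` with `_`) is an assumption rather than the rule the crate applies.

```rust
/// Hypothetical sanitizer: keeps ASCII alphanumerics, '-' and '_',
/// and replaces every other character with '_'.
fn sanitize(part: &str) -> String {
    part.chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || c == '-' || c == '_' {
                c
            } else {
                '_'
            }
        })
        .collect()
}

/// Builds `{prefix}-{stream}-{topic}-{partition}-{first_offset}-{last_offset}`.
/// The same batch coordinates always yield the same label, which is what
/// allows Doris to deduplicate a replayed batch.
fn batch_label(
    prefix: &str,
    stream: &str,
    topic: &str,
    partition: u32,
    first_offset: u64,
    last_offset: u64,
) -> String {
    format!(
        "{}-{}-{}-{}-{}-{}",
        sanitize(prefix),
        sanitize(stream),
        sanitize(topic),
        partition,
        first_offset,
        last_offset
    )
}
```

Because the label depends only on the batch coordinates, a restart that re-reads offsets 100 to 199 produces the same label as the first attempt.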
   
   ## Quality checks
   
   - `cargo build -p iggy_connector_doris_sink` ✅
   - `cargo test -p iggy_connector_doris_sink --lib` ✅ (14/14)
   - `cargo clippy -p iggy_connector_doris_sink -p integration --all-targets 
--all-features -- -D warnings` ✅
   - `cargo fmt --all -- --check` ✅
   - All 6 integration tests pass on Docker for Mac (Apple Silicon).
   
   ## Notes for reviewers
   
   - **LOC:** crate + integration tests + fixture come to ~1.9k lines. 
`CONTRIBUTING.md` flags 500 LOC as the cap; bundling the crate with its tests 
is intentional because the Stream Load specifics (307 redirect with auth 
re-attachment, response-body `Status` parsing, deterministic-label idempotency) 
aren't meaningfully reviewable without the integration coverage. Happy to split 
the test files into a follow-up if maintainers prefer; the natural fault line 
is `core/integration/tests/connectors/doris/` + 
`core/integration/tests/connectors/fixtures/doris/`.
   - The `pre-commit` `licenses-list` hook regenerated `DEPENDENCIES.md` to add 
the new crate row; that's the only change in that file.
   
   Closes #2753 *(partial — sink half of the issue; source connector remains 
open)*.

