morningman opened a new pull request, #64143:
URL: https://github.com/apache/doris/pull/64143

   ## Proposed changes
   
   P3 of the catalog-SPI migration (base: `branch-catalog-spi`). Migrates the **hudi** connector following the **hybrid** strategy (D-019): harden the dormant HMS-over-SPI hudi connector to correctness parity with the legacy path, build a test baseline, and write the per-table dispatch design, **all behind the closed gate** (`SPI_READY_TYPES` unchanged).
   
   > ⚠️ **No user-visible behavior change.** The SPI hudi path stays dormant 
(gate closed); hudi queries continue to use the legacy 
`HMSExternalTable.dlaType=HUDI` path. This PR removes correctness blockers 
ahead of the live cutover (deferred to P7 / batch E).
   
   ### What's included
   
   **Correctness fixes (hardening dormant code, behind gate):**
   - **T02**: fix two bugs in how the hudi JNI scan passes column metadata. (1) Emit full Hive type strings in `column_types`; it previously emitted bare Doris type names, which lost precision/scale/subtypes. (2) Carry `column_names`/`column_types`/`delta_logs` as typed lists end-to-end; they were previously comma-joined and re-split, which shattered `decimal(10,2)` / `struct<...>`. The result matches the BE `hudi_jni_reader.cpp` contract (names joined by `,`, types by `#`, delta logs by `,`).
   - **T04**: fail loudly on time-travel and incremental reads in the SPI `visitPhysicalHudiScan` branch. Previously a time-travel query silently returned the latest snapshot and an incremental read silently did a full scan.
   - **T05**: implement real EQ/IN partition pruning in `HudiConnectorMetadata.applyFilter`, faithfully mirroring `HiveConnectorMetadata.applyFilter`. The previous placeholder ignored predicates and unconditionally switched the partition source from Hudi metadata to HMS.
   - **T07**: fix column-name casing in `avroSchemaToColumns`: top-level names are lowercased, mirroring the legacy `HMSExternalTable`.
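
To illustrate why the comma join/split in T02 broke complex types, here is a minimal Java sketch of FE-side joining under the `,` / `#` / `,` contract stated above. `HudiJniParamsSketch` and its methods are hypothetical names, not Doris code; the sketch assumes column names never contain `,` and shows only the delimiter choice, not how the real scan node hands the strings to BE.

```java
import java.util.List;

// Hypothetical sketch of the T02 delimiter contract; not the actual Doris classes.
final class HudiJniParamsSketch {
    private HudiJniParamsSketch() {}

    // Column names are joined with ',' (assumes names never contain ',').
    static String joinNames(List<String> names) {
        return String.join(",", names);
    }

    // Full Hive type strings are joined with '#', because a type string can itself
    // contain ',' (e.g. "decimal(10,2)" or "struct<a:int,b:string>").
    static String joinTypes(List<String> hiveTypes) {
        return String.join("#", hiveTypes);
    }

    // Reader side under the '#' contract: every type string comes back intact.
    static List<String> splitTypes(String joined) {
        return List.of(joined.split("#"));
    }

    // The old behavior: a ','-joined type list cannot be split back safely.
    static List<String> naiveCommaRoundTrip(List<String> hiveTypes) {
        return List.of(String.join(",", hiveTypes).split(","));
    }
}
```

With the input `int`, `decimal(10,2)`, `struct<a:int,b:string>`, the comma round trip yields five fragments instead of three types, while the `#` round trip returns the original list.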
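
For T05, a minimal sketch of EQ/IN pruning over partition values, assuming each partition is a map from partition-column name to value and that the pushed-down EQ/IN predicates have already been collected into a column-to-allowed-values map (an EQ becomes a singleton set). `PartitionPruningSketch.prune` is a hypothetical helper, not the signature of `HudiConnectorMetadata.applyFilter`; it prunes conservatively, dropping a partition only when a constrained column has a known value outside the allowed set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of EQ/IN partition pruning; not HudiConnectorMetadata's real API.
final class PartitionPruningSketch {
    private PartitionPruningSketch() {}

    /**
     * @param partitions each partition as partition-column name -> value
     * @param allowed    partition-column name -> allowed values (EQ = singleton, IN = the list);
     *                   columns without an EQ/IN predicate are simply absent
     * @return the partitions that satisfy every EQ/IN constraint
     */
    static List<Map<String, String>> prune(List<Map<String, String>> partitions,
                                           Map<String, Set<String>> allowed) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> partition : partitions) {
            boolean matches = true;
            for (Map.Entry<String, Set<String>> constraint : allowed.entrySet()) {
                String value = partition.get(constraint.getKey());
                // Unknown value: cannot prove a mismatch, so keep the partition.
                if (value != null && !constraint.getValue().contains(value)) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                kept.add(partition);
            }
        }
        return kept;
    }
}
```

Predicates other than EQ/IN are not used for pruning in this sketch; leaving them to the scan-time filter keeps results correct, only less selective.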
   
   **Test baseline (all three connector modules started P3 with 0 tests):**
   - `fe-connector-hudi` (33 tests): type mapping / schema parity (COW/MOR golden) / table type / partition pruning / scan ranges.
   - `fe-connector-hms` (12 tests): the shared Hive type-string parser.
   - `fe-connector-hive` (14 tests): file format / partition pruning (mirrors T05).
   - The derived schema is **table-type-agnostic**: COW and MOR both match the legacy `initHudiSchema` golden output; table type affects only scan planning.
   
   **Decisions / design (code-grounded, design-only):**
   - **T03**: defer `schema_id`/`history_schema_info` field-id evolution to batch E (DV-006), since it is not a model-agnostic SPI fix.
   - **T06**: keep the MVCC/snapshot SPI defaults (opt-out) and document the decision (DV-007).
   - **T08**: `tableFormatType` dispatch design memo plus **D-020**: per-table routing within a single `hms` catalog via a new, backward-compatible `ConnectorMetadata.getScanPlanProvider(handle)` (a per-table provider seam); this refines D-005. The keystone gap is split into M1 (identity consumption: fe-core reads `tableFormatType` as an opaque string) and M2 (scan routing).
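
One way a per-table seam like D-020 could stay backward compatible is a Java default method that falls back to a catalog-wide provider. The sketch assumes such a no-argument provider method exists; `ScanPlanProviderSketch`, `TableHandleSketch`, `ConnectorMetadataSketch` and `HmsMetadataSketch` are hypothetical stand-ins, and only the name `getScanPlanProvider(handle)` comes from this PR. The handle's format string is treated as opaque (M1), and the `hms` metadata uses it to pick a provider (M2).

```java
import java.util.Map;

// Hypothetical stand-in for a scan-plan provider.
interface ScanPlanProviderSketch {
    String name();
}

// Hypothetical table handle exposing tableFormatType as an opaque string (M1).
interface TableHandleSketch {
    String tableFormatType();
}

interface ConnectorMetadataSketch {
    // Assumed existing catalog-wide provider.
    ScanPlanProviderSketch getScanPlanProvider();

    // Per-table seam: defaults to the catalog-wide provider, so connectors
    // that do not override it keep their current behavior.
    default ScanPlanProviderSketch getScanPlanProvider(TableHandleSketch handle) {
        return getScanPlanProvider();
    }
}

// M2: a single hms catalog routes each table by its format string.
final class HmsMetadataSketch implements ConnectorMetadataSketch {
    private final ScanPlanProviderSketch hiveProvider;
    private final Map<String, ScanPlanProviderSketch> providersByFormat;

    HmsMetadataSketch(ScanPlanProviderSketch hiveProvider,
                      Map<String, ScanPlanProviderSketch> providersByFormat) {
        this.hiveProvider = hiveProvider;
        this.providersByFormat = providersByFormat;
    }

    @Override
    public ScanPlanProviderSketch getScanPlanProvider() {
        return hiveProvider;
    }

    @Override
    public ScanPlanProviderSketch getScanPlanProvider(TableHandleSketch handle) {
        // Formats without a dedicated provider fall back to the Hive provider.
        return providersByFormat.getOrDefault(handle.tableFormatType(), hiveProvider);
    }
}
```

Connectors that never override the one-argument method route every table to their existing provider, which is what keeps the addition source-compatible.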
   
   ### Deferred to batch E / P7 (not in this PR)
   - Gate flip (`SPI_READY_TYPES += hms/hudi`)
   - fe-core `tableFormatType` consumption (M1 + M2 implementation)
   - Live cutover
   - Deletion of the legacy `datasource/hudi/`
   - Full incremental / time-travel / MVCC support
   - Iceberg-on-hms via SPI (needs the P6 `IcebergScanPlanProvider`)
   - Cluster/runtime validation
   
   ### Verification
   Per task tracking, each code batch landed with a clean per-module compile, zero checkstyle violations (including test sources), a passing connector import-gate check, and green new unit tests. The two most recent commits are docs-only (`plan-doc/`); the code is unchanged since the last green batch. Because the gate stays closed, the dormant SPI path is unreachable at runtime, so there is no risk to the live path. CI re-verifies all of the above.
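
A minimal sketch of why a closed gate makes the dormant path unreachable, assuming the gate is a membership check of the catalog type against `SPI_READY_TYPES`. `SpiGateSketch` and `route` are hypothetical; the set is passed in rather than hard-coded because its current contents in fe-core are not stated here.

```java
import java.util.Set;

// Hypothetical sketch of a type gate such as SPI_READY_TYPES.
final class SpiGateSketch {
    enum ReadPath { SPI, LEGACY }

    private SpiGateSketch() {}

    // A catalog type reaches the SPI connector only if it is listed in the gate set;
    // any unlisted type (e.g. hms before the batch E flip) stays on the legacy path.
    static ReadPath route(Set<String> spiReadyTypes, String catalogType) {
        return spiReadyTypes.contains(catalogType) ? ReadPath.SPI : ReadPath.LEGACY;
    }
}
```

Under this model, the gate flip deferred to batch E amounts to adding `hms`/`hudi` to the set; until then no query can reach the SPI hudi code.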
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
