924060929 commented on PR #65135:
URL: https://github.com/apache/doris/pull/65135#issuecomment-4875160197

   Thanks for the work — the native `$position_deletes` path is a nice 
improvement over the JNI-only approach. I built the branch and ran it against a 
live Iceberg REST catalog (Spark 4.0 / iceberg-rest 1.10 / MinIO); the happy 
path (single-spec partitioned/unpartitioned, v2 & v3) works correctly. A few 
issues worth addressing, ordered by severity.
   
   ### 1. (High) Kerberos / `hadoop.username` auth is silently dropped for all 
Iceberg JNI sys-table reads
   
   The refactor in `IcebergSysTableJniScanner` now passes the **raw** JNI 
params map to `PreExecutionAuthenticatorCache.getAuthenticator(...)`, dropping 
the old `HADOOP_OPTION_PREFIX` filter+strip. But BE 
`iceberg_sys_table_jni_reader.cpp` (unchanged) prefixes **every** catalog 
property with `hadoop.`:
   
   ```cpp
   params[HADOOP_OPTION_PREFIX + kv.first] = kv.second;  // 
hadoop.security.authentication -> hadoop.hadoop.security.authentication
   ```
   
   So `AuthenticationConfig` looks up the unprefixed keys 
(`hadoop.security.authentication`, `hadoop.kerberos.principal/keytab`, 
`hadoop.username`) and finds only the double-prefixed `hadoop.hadoop.*`, 
silently falling back to `SimpleAuthenticationConfig(user="hadoop")`. 
`PaimonJniScanner` still strips the prefix, i.e. this is the prior convention.
   
   Impact: on a kerberized catalog, `select * from tbl$snapshots` (or any JNI 
sys table) would read manifests without Kerberos; simple-auth catalogs with a 
custom `hadoop.username` read as `hadoop`; the authenticator cache key also 
collapses to a constant.
   
   Verified by running the real classes:
   - 
`AuthenticationConfig.getKerberosConfig({hadoop.hadoop.security.authentication=kerberos,
 ...})` → `SimpleAuthenticationConfig`
   - 
`AuthenticationConfig.getKerberosConfig({hadoop.security.authentication=kerberos,
 ...})` (prefix stripped) → `KerberosAuthenticationConfig`
   
   (Confirmed at the auth-config selection level; I didn't stand up a full 
kerberized cluster.) Suggest restoring the `hadoop.`-prefix filter/strip before 
`getAuthenticator`.
   
   ### 2. (Medium) AVRO position-delete files throw an uncaught 
`UnsupportedOperationException`
   
   `getNativePositionDeleteFileFormat` throws `UnsupportedOperationException` 
for `FileFormat.AVRO`, and `doGetPositionDeletesSystemTableSplits` only catches 
`IOException`, so it escapes split planning as an internal error. Avro position 
deletes are legal per the Iceberg spec (`write.delete.format.default=avro`).
   
   Reproduced on the built branch:
   ```
   select * from tbl$position_deletes;
   ERROR 1105 (HY000): errCode = 2, detailMessage = 
UnsupportedOperationException: Unsupported Iceberg position delete file format: 
AVRO
   ```
   (same via `iceberg_meta(...)`). Suggest a clean `UserException`/unsupported 
message, or routing avro through a supported reader.
   
   ### 3. (Medium) `partition` column is silently `NULL` after partition 
evolution or for NULL partition values
   
   `getPartitionDataObjectJson` builds the JSON from the delete file's **own** 
spec fields, and `GsonUtils.GSON` (no `serializeNulls`) drops null values — so 
the JSON can carry fewer fields than the sys table's `partition` column, which 
is the **union** partition type across all specs 
(`Partitioning.partitionType`). The BE struct serde then yields a whole-`NULL` 
partition (no error).
   
   Reproduced on the built branch, all `struct<a,b>`:
   
   | delete file partition | FE JSON | Doris `partition` |
   |---|---|---|
   | (a=p1, b=q1) | `{"a":"p1","b":"q1"}` | `{"a":"p1", "b":"q1"}` ✅ |
   | (a=p1, b=NULL) | `{"a":"p1"}` (b dropped) | `NULL` ❌ |
   | spec0 file after `ALTER TABLE ADD PARTITION FIELD id` | `{"dt":...}` | 
`NULL` ❌ |
   
   The query succeeds but the partition column silently loses data — any table 
that has undergone partition evolution shows `NULL` partitions for delete files 
written under older specs. This isn't covered by the "Preserve typed partition 
values" commit (that fixed value typing, not the struct shape). Suggest 
emitting the partition in the table's unified partition type (coerce 
`PartitionData`, as Iceberg's own metadata scan does) and enabling null 
serialization so the field count always matches the schema.
   
   ### 4. (Latent crash, low reachability) optional `row` column + schema 
evolution
   
   When a position-delete file actually contains the optional `row` struct and 
the table schema later evolves, the standalone `ParquetReader` init (no 
`column_descs` → `on_before_init_reader` skipped → `ConstNode` 
`table_info_node` whose `children_column_exists()` is always `true`) can 
default-insert and dereference a null child reader in 
`StructColumnReader::read_column_data` → SIGSEGV. Mainstream Spark MoR deletes 
don't populate `row` (this PR's own test asserts it's always NULL), so it's 
hard to hit in practice, but it's a latent crash for files written by engines 
that do populate `row`. (Code trace only — not reproduced e2e.)
   
   ### Minor / nits
   - `file_scanner.cpp`: the ~10-line position-deletes reader-creation block is 
byte-identical in the `FORMAT_PARQUET` and `FORMAT_ORC` arms and is 
format-independent (the reader dispatches on the delete file's own format); 
consider hoisting it once before the switch. Mapping PUFFIN → `FORMAT_PARQUET` 
in `getNativePositionDeleteFileFormat` exists only to land DV ranges on an arm 
containing that block.
   - `iceberg_meta(query_type=...)` resolves via `tbl + "$" + queryType` split 
on the **last** `$`, so `query_type='foo$snapshots'` silently resolves to the 
snapshots table and returns data instead of erroring (reproduced: `count=2`). 
Consider looking the type up directly in the supported sys-tables map.
   - `_init_deletion_vector_reader` materializes the whole roaring bitmap into 
a `std::vector<uint64_t>` up front; iterating the compressed bitmap per batch 
would bound memory for large deletion vectors.
   - `doGetPositionDeletesSystemTableSplits` iterates `planFiles()` (one split 
per delete file, no size-based split via 
`planTasks`/`TableScanUtil.splitFiles`), so a large delete file gets no 
intra-file parallelism unlike the data path.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to