924060929 commented on PR #65135:
URL: https://github.com/apache/doris/pull/65135#issuecomment-4875160197
Thanks for the work — the native `$position_deletes` path is a nice
improvement over the JNI-only approach. I built the branch and ran it against a
live Iceberg REST catalog (Spark 4.0 / iceberg-rest 1.10 / MinIO); the happy
path (single-spec partitioned/unpartitioned, v2 & v3) works correctly. A few
issues worth addressing, ordered by severity.
### 1. (High) Kerberos / `hadoop.username` auth is silently dropped for all
Iceberg JNI sys-table reads
The refactor in `IcebergSysTableJniScanner` now passes the **raw** JNI
params map to `PreExecutionAuthenticatorCache.getAuthenticator(...)`, dropping
the old `HADOOP_OPTION_PREFIX` filter+strip. But BE
`iceberg_sys_table_jni_reader.cpp` (unchanged) prefixes **every** catalog
property with `hadoop.`:
```cpp
params[HADOOP_OPTION_PREFIX + kv.first] = kv.second; //
hadoop.security.authentication -> hadoop.hadoop.security.authentication
```
So `AuthenticationConfig` looks up the unprefixed keys
(`hadoop.security.authentication`, `hadoop.kerberos.principal/keytab`,
`hadoop.username`) and finds only the double-prefixed `hadoop.hadoop.*`,
silently falling back to `SimpleAuthenticationConfig(user="hadoop")`.
`PaimonJniScanner` still strips the prefix, i.e. this is the prior convention.
Impact: on a kerberized catalog, `select * from tbl$snapshots` (or any JNI
sys table) would read manifests without Kerberos; simple-auth catalogs with a
custom `hadoop.username` read as `hadoop`; the authenticator cache key also
collapses to a constant.
Verified by running the real classes:
-
`AuthenticationConfig.getKerberosConfig({hadoop.hadoop.security.authentication=kerberos,
...})` → `SimpleAuthenticationConfig`
-
`AuthenticationConfig.getKerberosConfig({hadoop.security.authentication=kerberos,
...})` (prefix stripped) → `KerberosAuthenticationConfig`
(Confirmed at the auth-config selection level; I didn't stand up a full
kerberized cluster.) Suggest restoring the `hadoop.`-prefix filter/strip before
`getAuthenticator`.
### 2. (Medium) AVRO position-delete files throw an uncaught
`UnsupportedOperationException`
`getNativePositionDeleteFileFormat` throws `UnsupportedOperationException`
for `FileFormat.AVRO`, and `doGetPositionDeletesSystemTableSplits` only catches
`IOException`, so it escapes split planning as an internal error. Avro position
deletes are legal per the Iceberg spec (`write.delete.format.default=avro`).
Reproduced on the built branch:
```
select * from tbl$position_deletes;
ERROR 1105 (HY000): errCode = 2, detailMessage =
UnsupportedOperationException: Unsupported Iceberg position delete file format:
AVRO
```
(same via `iceberg_meta(...)`). Suggest a clean `UserException`/unsupported
message, or routing avro through a supported reader.
### 3. (Medium) `partition` column is silently `NULL` after partition
evolution or for NULL partition values
`getPartitionDataObjectJson` builds the JSON from the delete file's **own**
spec fields, and `GsonUtils.GSON` (no `serializeNulls`) drops null values — so
the JSON can carry fewer fields than the sys table's `partition` column, which
is the **union** partition type across all specs
(`Partitioning.partitionType`). The BE struct serde then yields a whole-`NULL`
partition (no error).
Reproduced on the built branch, all `struct<a,b>`:
| delete file partition | FE JSON | Doris `partition` |
|---|---|---|
| (a=p1, b=q1) | `{"a":"p1","b":"q1"}` | `{"a":"p1", "b":"q1"}` ✅ |
| (a=p1, b=NULL) | `{"a":"p1"}` (b dropped) | `NULL` ❌ |
| spec0 file after `ALTER TABLE ADD PARTITION FIELD id` | `{"dt":...}` |
`NULL` ❌ |
The query succeeds but the partition column silently loses data — any table
that has undergone partition evolution shows `NULL` partitions for delete files
written under older specs. This isn't covered by the "Preserve typed partition
values" commit (that fixed value typing, not the struct shape). Suggest
emitting the partition in the table's unified partition type (coerce
`PartitionData`, as Iceberg's own metadata scan does) and enabling null
serialization so the field count always matches the schema.
### 4. (Latent crash, low reachability) optional `row` column + schema
evolution
When a position-delete file actually contains the optional `row` struct and
the table schema later evolves, the standalone `ParquetReader` init (no
`column_descs` → `on_before_init_reader` skipped → `ConstNode`
`table_info_node` whose `children_column_exists()` is always `true`) can
default-insert and dereference a null child reader in
`StructColumnReader::read_column_data` → SIGSEGV. Mainstream Spark MoR deletes
don't populate `row` (this PR's own test asserts it's always NULL), so it's
hard to hit in practice, but it's a latent crash for files written by engines
that do populate `row`. (Code trace only — not reproduced e2e.)
### Minor / nits
- `file_scanner.cpp`: the ~10-line position-deletes reader-creation block is
byte-identical in the `FORMAT_PARQUET` and `FORMAT_ORC` arms and is
format-independent (the reader dispatches on the delete file's own format);
consider hoisting it once before the switch. Mapping PUFFIN → `FORMAT_PARQUET`
in `getNativePositionDeleteFileFormat` exists only to land DV ranges on an arm
containing that block.
- `iceberg_meta(query_type=...)` resolves via `tbl + "$" + queryType` split
on the **last** `$`, so `query_type='foo$snapshots'` silently resolves to the
snapshots table and returns data instead of erroring (reproduced: `count=2`).
Consider looking the type up directly in the supported sys-tables map.
- `_init_deletion_vector_reader` materializes the whole roaring bitmap into
a `std::vector<uint64_t>` up front; iterating the compressed bitmap per batch
would bound memory for large deletion vectors.
- `doGetPositionDeletesSystemTableSplits` iterates `planFiles()` (one split
per delete file, no size-based split via
`planTasks`/`TableScanUtil.splitFiles`), so a large delete file gets no
intra-file parallelism unlike the data path.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]