Hi all,

Issue #16238 reports that hidden partition fields get different default names depending on how the partition is created, and I'd like the community's view on whether and how to reconcile this before I put up a PR.

The two paths diverge for the parameterized transforms (bucket and truncate):

- Creating a table (including via Spark createOrReplace, which routes through Spark3Util.toPartitionSpec -> PartitionSpec.Builder) generates names without the parameter: col_bucket, col_trunc.
- ALTER TABLE ADD PARTITION FIELD (UpdatePartitionSpec -> BaseUpdatePartitionSpec.PartitionNameGenerator) generates names with the parameter: col_bucket_<n>, col_trunc_<width>.

Time-based transforms (year/month/day/hour) already agree across both paths.

For reference, the partition-spec example in the spec (format/spec.md) uses the no-parameter form: "name": "id_bucket" for a bucket[16] transform.

Both forms are currently asserted by tests on each side, so this looks like a deliberate-but-unreconciled difference rather than an accidental bug, and standardizing it would be a user-visible, cross-engine behavior change.

I see two coherent directions:

1. Align ALTER to the creation/spec form (col_bucket). Smaller footprint and matches the spec example. The _<n> suffix in BaseUpdatePartitionSpec mainly disambiguates adding two different bucket widths to the same source column within one spec (the case pinned by testAddMultipleBuckets); collisions with previously-dropped (void) fields are already handled separately by renaming them to name_<fieldId>. So this direction could keep the bare col_bucket name and append _<n> only on an actual name conflict, preserving the multi-width case while making the common ALTER match creation.

2. Align creation to the ALTER form (col_bucket_<n>), which is the preference noted in the issue and makes the builder strictly more expressive. The downside is that it changes the default name for the most common path across every engine, deviates from the spec example, and touches a large number of existing tests.

A third option is to leave the behavior as-is and document it as intended.

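To make the conflict-only suffix in option 1 concrete, here is a minimal, self-contained sketch of the naming rule. It is not the actual BaseUpdatePartitionSpec code: the class PartitionNameSketch and the method addField are hypothetical names used only for illustration, and the sketch does not model the separate renaming of dropped (void) fields to name_<fieldId>.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration of option 1; not Iceberg's real implementation.
final class PartitionNameSketch {
  // Names already taken by partition fields in the spec being built.
  private final Set<String> usedNames = new LinkedHashSet<>();

  // Returns the bare name (for example id_bucket) when it is free, and
  // appends _<param> only when the bare name is already in use.
  String addField(String sourceColumn, String transform, int param) {
    String bare = sourceColumn + "_" + transform;
    String name = usedNames.contains(bare) ? bare + "_" + param : bare;
    if (!usedNames.add(name)) {
      // Same column, same transform, same parameter added twice.
      throw new IllegalArgumentException("Duplicate partition field name: " + name);
    }
    return name;
  }

  public static void main(String[] args) {
    PartitionNameSketch spec = new PartitionNameSketch();
    System.out.println(spec.addField("id", "bucket", 16));  // id_bucket
    System.out.println(spec.addField("id", "bucket", 8));   // id_bucket_8
    System.out.println(spec.addField("data", "trunc", 4));  // data_trunc
  }
}
```

Under this rule, the first bucket or truncate field on a column gets the same name as the creation path, and only a second width on the same column picks up the suffix. One consequence worth discussing is that the resulting names then depend on the order in which the fields are added.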
Separately, the "add multiple widths on the same source column" behavior is currently covered by a positive test only for bucket (testAddMultipleBuckets), not for truncate, even though both behave identically.

My inclination is option 1 (with the conflict-only suffix), but I don't want to pick a direction unilaterally given that it touches a long-standing, cross-engine convention. Could folks weigh in on the preferred direction?

Issue: https://github.com/apache/iceberg/issues/16238

Thanks,
Vova Kolmakov (wombatu-kun)
