I couldn't find it in my search - would appreciate any pointers to the proposal or related discussions.
On Thu, Sep 11, 2025 at 19:32, Russell Spitzer <russell.spit...@gmail.com> wrote:

> This has already been proposed as part of v4, see Edwards column metrics
> expansion proposal
>
> On Thu, Sep 11, 2025 at 4:54 AM rice Zhang <minglei...@gmail.com> wrote:
>
>> Hi, Junwang
>>
>> We're discussing the storage of lower and upper bounds for decimal values
>> in manifest files and their compatibility after type evolution. The bounds
>> are stored as unscaled values without their original scale, so when the
>> decimal type changes, we can't correctly interpret these historical bounds
>> even though we know the current type from metadata.
>>
>> Minglei.
>>
>> On Thu, Sep 11, 2025 at 17:46, Junwang Zhao <zhjw...@gmail.com> wrote:
>>
>>> Hi Minglei,
>>>
>>> On Thu, Sep 11, 2025 at 5:35 PM rice Zhang <minglei...@gmail.com> wrote:
>>> >
>>> > Hi Ryan,
>>> >
>>> > Thank you for your detailed response. I've discussed this issue
>>> > offline with my team lead, and we've done some deeper investigation
>>> > into the problem. After reviewing the Decimal Type serialization code
>>> > in Iceberg, we confirmed that currently only the unscaled value is
>>> > serialized without storing the scale value. This indeed makes type
>>> > evolution more complex than initially anticipated. Regarding your
>>> > mention of v4 adopting columnar metadata for manifests, while I'm not
>>> > certain which specific format Iceberg will use (perhaps Parquet?), I
>>> > agree this is a positive direction. However, to properly support
>>> > decimal scale evolution, I believe Iceberg would need to fundamentally
>>> > change how decimal types are serialized, regardless of whether using
>>> > Avro or Parquet. Specifically, we'd need to serialize both the
>>> > unscaled value AND the scale, not just the unscaled value.
>>> >
>>> > Here's an example: Consider a field initially defined as DECIMAL(5,2)
>>> > with value 123.45 (the serialized unscaled value is 12345). If a user
>>> > later changes the type to DECIMAL(6,3) - which follows SQL:2011 rules
>>> > since (p-s) doesn't decrease - reading the old data with the new type
>>> > would be problematic. Without the original scale being serialized, we
>>> > can't distinguish whether 12345 represents 123.45 (scale=2) or 12.345
>>> > (scale=3), potentially leading to incorrect data interpretation. By
>>> > serializing the scale alongside the unscaled value, we could correctly
>>> > read 12345 with scale=2 as 123.450 under the new DECIMAL(6,3) type,
>>> > avoiding data corruption.
>>>
>>> The metadata should have the data type, which includes the scale and
>>> precision, isn't that enough to describe the decimal? Correct me if
>>> I'm wrong :)
>>>
>>> >
>>> > I'd like to confirm whether this approach of serializing the scale
>>> > value is something you consider viable? Or does the community have
>>> > other better solutions for supporting decimal scale evolution? Also,
>>> > I'm wondering if you've already discussed specific implementation
>>> > approaches for decimal type changes? I'm very interested in
>>> > understanding how v4 plans to address this issue.
>>> >
>>> > Minglei
>>> >
>>> > On Thu, Sep 11, 2025 at 03:53, Ryan Blue <rdb...@gmail.com> wrote:
>>> >>
>>> >> Hi Minglei, thanks for the proposal.
>>> >>
>>> >> v3 is now closed, so we can't introduce a breaking change like this
>>> >> until v4. We looked into decimal type evolution in v3 and found that
>>> >> due to the way that we currently store lower and upper bounds for
>>> >> decimal values, we can't safely support this in v3 Iceberg manifests.
>>> >> We will need to wait until v4 manifests are introduced with columnar
>>> >> metadata to make this change.
>>> >>
>>> >> Ryan
>>> >>
>>> >> On Wed, Sep 10, 2025 at 12:28 AM rice Zhang <minglei...@gmail.com> wrote:
>>> >>>
>>> >>> Hi Iceberg Community,
>>> >>>
>>> >>> I'd like to propose extending Iceberg's type promotion rules to
>>> >>> support DECIMAL type evolution with scale changes, aligning with
>>> >>> the SQL:2011 standard.
>>> >>>
>>> >>> Current Limitation
>>> >>> Currently, Iceberg only supports DECIMAL type promotion when:
>>> >>> - Scale remains the same
>>> >>> - Precision can be increased
>>> >>>
>>> >>> This means DECIMAL(10,2) can evolve to DECIMAL(12,2), but not to
>>> >>> DECIMAL(12,4).
>>> >>>
>>> >>> Proposed Change
>>> >>> Allow DECIMAL type evolution when:
>>> >>> 1. Target scale >= source scale
>>> >>> 2. Target precision >= source precision
>>> >>> 3. Integer part capacity is preserved: (target_precision -
>>> >>>    target_scale) >= (source_precision - source_scale)
>>> >>>
>>> >>> Examples
>>> >>> With this change:
>>> >>> - DECIMAL(10,2) → DECIMAL(12,4) ✓ (integer part: 8 → 8, scale: 2 → 4)
>>> >>> - DECIMAL(10,2) → DECIMAL(15,5) ✓ (integer part: 8 → 10, scale: 2 → 5)
>>> >>> - DECIMAL(10,2) → DECIMAL(10,4) ✗ (integer part: 8 → 6, would lose
>>> >>>   integer capacity)
>>> >>>
>>> >>> Rationale
>>> >>> 1. SQL:2011 Compliance: This behavior aligns with SQL:2011 standard
>>> >>>    expectations
>>> >>> 2. User Experience: Many users coming from traditional databases
>>> >>>    expect this type evolution to work
>>> >>> 3. Data Safety: The proposed rules ensure no data loss - existing
>>> >>>    values can always be represented in the new type
>>> >>> 4. Real-world Use Cases: Common scenarios like adding more decimal
>>> >>>    precision for currency calculations would be supported
>>> >>>
>>> >>> Implementation
>>> >>> I've created a proof-of-concept implementation:
>>> >>> https://github.com/apache/iceberg/issues/14037
>>> >>>
>>> >>> Questions for Discussion
>>> >>> 1. Should this be part of the spec v3, or wait for a future version?
>>> >>> 2. Are there any backward compatibility concerns we should address?
>>> >>>
>>> >>> Looking forward to your feedback and thoughts on this proposal.
>>> >>>
>>> >>> Best regards,
>>> >>> Minglei
>>>
>>> --
>>> Regards
>>> Junwang Zhao
>>
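
For anyone following the thread, here is a minimal Python sketch of the two technical points quoted above: the three promotion conditions from the original proposal, and why a bare unscaled value becomes ambiguous once the scale changes. The function names (`can_promote`, `decode`, `rescale_unscaled`) are hypothetical and are not Iceberg APIs. The sketch only illustrates the arithmetic, assuming bounds are kept as unscaled integers as described in the thread.

```python
from decimal import Decimal


def can_promote(src_precision, src_scale, dst_precision, dst_scale):
    """Proposed rule from the thread (hypothetical helper, not current Iceberg behavior)."""
    return (
        dst_scale >= src_scale                      # 1. scale may only grow
        and dst_precision >= src_precision          # 2. precision may only grow
        # 3. the number of integer digits must not shrink
        and (dst_precision - dst_scale) >= (src_precision - src_scale)
    )


def decode(unscaled, scale):
    """Interpret an unscaled integer under a given scale."""
    return Decimal(unscaled).scaleb(-scale)


def rescale_unscaled(unscaled, old_scale, new_scale):
    """Convert an unscaled integer written at old_scale to new_scale (new_scale >= old_scale)."""
    return unscaled * 10 ** (new_scale - old_scale)


# The examples from the proposal.
assert can_promote(10, 2, 12, 4)
assert can_promote(10, 2, 15, 5)
assert not can_promote(10, 2, 10, 4)

# The same stored integer reads differently depending on the assumed scale.
old_reading = decode(12345, 2)   # value written under DECIMAL(5,2)
new_reading = decode(12345, 3)   # same integer read under DECIMAL(6,3)
assert old_reading != new_reading

# Knowing the original scale lets the value be rescaled before it is read.
fixed = decode(rescale_unscaled(12345, 2, 3), 3)
assert fixed == old_reading
```

Under these assumptions, interpreting an old bound correctly after a scale change requires the scale it was written with, which is the piece of information the thread is concerned about.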