I couldn't find it in my search - would appreciate any pointers to the
proposal or related discussions.

On Thu, Sep 11, 2025 at 7:32 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

> This has already been proposed as part of v4, see Edwards column metrics
> expansion proposal
>
> On Thu, Sep 11, 2025 at 4:54 AM rice Zhang <minglei...@gmail.com> wrote:
>
>> Hi, Junwang
>>
>> We're discussing the storage of lower and upper bounds for decimal values
>> in manifest files and their compatibility after type evolution. The bounds
>> are stored as unscaled values without their original scale, so when the
>> decimal type changes, we can't correctly interpret these historical bounds
>> even though we know the current type from metadata.
>>
>> Minglei.
>>
>> On Thu, Sep 11, 2025 at 5:46 PM Junwang Zhao <zhjw...@gmail.com> wrote:
>>
>>> Hi Minglei,
>>>
>>> On Thu, Sep 11, 2025 at 5:35 PM rice Zhang <minglei...@gmail.com> wrote:
>>> >
>>> > Hi Ryan,
>>> >
>>> > Thank you for your detailed response. I've discussed this issue
>>> > offline with my team lead, and we've done some deeper investigation
>>> > into the problem. After reviewing the decimal type serialization code
>>> > in Iceberg, we confirmed that currently only the unscaled value is
>>> > serialized, without the scale. This indeed makes type evolution more
>>> > complex than initially anticipated. Regarding your mention of v4
>>> > adopting columnar metadata for manifests, while I'm not certain which
>>> > specific format Iceberg will use (perhaps Parquet?), I agree this is a
>>> > positive direction. However, to properly support decimal scale
>>> > evolution, I believe Iceberg would need to fundamentally change how
>>> > decimal types are serialized, regardless of whether it uses Avro or
>>> > Parquet. Specifically, we'd need to serialize both the unscaled value
>>> > AND the scale, not just the unscaled value.
>>> >
>>> > Here's an example: Consider a field initially defined as DECIMAL(5,2)
>>> > with value 123.45 (the serialized unscaled value is 12345). If a user
>>> > later changes the type to DECIMAL(6,3) - which follows SQL:2011 rules
>>> > since (p-s) doesn't decrease - reading the old data with the new type
>>> > would be problematic. Without the original scale being serialized, we
>>> > can't distinguish whether 12345 represents 123.45 (scale=2) or 12.345
>>> > (scale=3), potentially leading to incorrect data interpretation. By
>>> > serializing the scale alongside the unscaled value, we could correctly
>>> > read 12345 with scale=2 as 123.450 under the new DECIMAL(6,3) type,
>>> > avoiding data corruption.
>>>
>>> The metadata should have the data type, which includes the scale and
>>> precision, isn't that enough to describe the decimal? Correct me if
>>> I'm wrong :)
>>>
>>> >
>>> > I'd like to confirm whether you consider this approach of serializing
>>> > the scale value viable. Or does the community have other, better
>>> > solutions for supporting decimal scale evolution? Also, I'm wondering
>>> > if you've already discussed specific implementation approaches for
>>> > decimal type changes? I'm very interested in understanding how v4
>>> > plans to address this issue.
>>> >
>>> > Minglei
>>> >
>>> > On Thu, Sep 11, 2025 at 3:53 AM Ryan Blue <rdb...@gmail.com> wrote:
>>> >>
>>> >> Hi Minglei, thanks for the proposal.
>>> >>
>>> >> v3 is now closed, so we can't introduce a breaking change like this
>>> >> until v4. We looked into decimal type evolution in v3 and found that
>>> >> due to the way that we currently store lower and upper bounds for
>>> >> decimal values, we can't safely support this in v3 Iceberg manifests.
>>> >> We will need to wait until v4 manifests are introduced with columnar
>>> >> metadata to make this change.
>>> >>
>>> >> Ryan
>>> >>
>>> >> On Wed, Sep 10, 2025 at 12:28 AM rice Zhang <minglei...@gmail.com> wrote:
>>> >>>
>>> >>> Hi Iceberg Community,
>>> >>>
>>> >>> I'd like to propose extending Iceberg's type promotion rules to
>>> >>> support DECIMAL type evolution with scale changes, aligning with the
>>> >>> SQL:2011 standard.
>>> >>>
>>> >>> Current Limitation
>>> >>>   Currently, Iceberg only supports DECIMAL type promotion when:
>>> >>>   - Scale remains the same
>>> >>>   - Precision can be increased
>>> >>>
>>> >>>   This means DECIMAL(10,2) can evolve to DECIMAL(12,2), but not to
>>> >>>   DECIMAL(12,4).
>>> >>>
>>> >>> Proposed Change
>>> >>>   Allow DECIMAL type evolution when:
>>> >>>   1. Target scale >= source scale
>>> >>>   2. Target precision >= source precision
>>> >>>   3. Integer part capacity is preserved:
>>> >>>      (target_precision - target_scale) >= (source_precision - source_scale)
>>> >>>
>>> >>> Examples
>>> >>>   With this change:
>>> >>>   - DECIMAL(10,2) → DECIMAL(12,4) ✓ (integer part: 8 → 8, scale: 2 → 4)
>>> >>>   - DECIMAL(10,2) → DECIMAL(15,5) ✓ (integer part: 8 → 10, scale: 2 → 5)
>>> >>>   - DECIMAL(10,2) → DECIMAL(10,4) ✗ (integer part: 8 → 6, would lose
>>> >>>     integer capacity)
>>> >>>
>>> >>> Rationale
>>> >>>   1. SQL:2011 Compliance: This behavior aligns with SQL:2011
>>> >>>      standard expectations
>>> >>>   2. User Experience: Many users coming from traditional databases
>>> >>>      expect this type evolution to work
>>> >>>   3. Data Safety: The proposed rules ensure no data loss - existing
>>> >>>      values can always be represented in the new type
>>> >>>   4. Real-world Use Cases: Common scenarios like adding more decimal
>>> >>>      precision for currency calculations would be supported
>>> >>>
>>> >>> Implementation
>>> >>>   I've created a proof-of-concept implementation:
>>> >>>   https://github.com/apache/iceberg/issues/14037
>>> >>>
>>> >>> Questions for Discussion
>>> >>>   1. Should this be part of the spec v3, or wait for a future
>>> >>>      version?
>>> >>>   2. Are there any backward compatibility concerns we should address?
>>> >>>
>>> >>> Looking forward to your feedback and thoughts on this proposal.
>>> >>>
>>> >>> Best regards,
>>> >>> Minglei
>>>
>>>
>>>
>>> --
>>> Regards
>>> Junwang Zhao
>>>
>>
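
As a rough illustration of the DECIMAL(5,2) to DECIMAL(6,3) example quoted above, the following Python sketch shows why a stored unscaled value is ambiguous without its scale. It only assumes what the thread describes, namely that the stored value is the unscaled integer. The helper names from_unscaled and rescale_unscaled are hypothetical and are not Iceberg APIs.

```python
from decimal import Decimal


def from_unscaled(unscaled: int, scale: int) -> Decimal:
    """Rebuild a decimal value from an unscaled integer and a scale.

    Hypothetical helper for illustration only; not an Iceberg API.
    """
    return Decimal(unscaled).scaleb(-scale)


def rescale_unscaled(unscaled: int, old_scale: int, new_scale: int) -> int:
    """Convert an unscaled integer from old_scale to a larger new_scale."""
    if new_scale < old_scale:
        raise ValueError("this sketch only widens the scale")
    return unscaled * 10 ** (new_scale - old_scale)


stored = 12345  # unscaled value written while the column was DECIMAL(5,2)

# Interpreting the stored integer with the scale it was written with.
as_written = from_unscaled(stored, 2)   # 123.45

# Interpreting the same integer with the evolved DECIMAL(6,3) scale.
misread = from_unscaled(stored, 3)      # 12.345

# If the original scale is known, the value can be rescaled before use.
rescaled = rescale_unscaled(stored, 2, 3)  # 123450
assert from_unscaled(rescaled, 3) == as_written
```

The rescaling step is only possible when the reader knows the scale in effect when the value was written, which is the information the thread says is missing from the stored bounds.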

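The three conditions in the original proposal quoted above can also be written as a small predicate. This is a minimal sketch of the rules as stated in the thread, not Iceberg's implementation; can_promote_decimal and its parameter names are hypothetical.

```python
def can_promote_decimal(src_precision: int, src_scale: int,
                        dst_precision: int, dst_scale: int) -> bool:
    """Check the proposed DECIMAL promotion rules from the thread.

    Hypothetical helper for illustration only; not an Iceberg API.
    """
    scale_ok = dst_scale >= src_scale
    precision_ok = dst_precision >= src_precision
    # Digits available to the left of the decimal point must not shrink.
    integer_part_ok = (dst_precision - dst_scale) >= (src_precision - src_scale)
    return scale_ok and precision_ok and integer_part_ok


# The examples listed in the proposal.
examples = [
    ((10, 2), (12, 4)),  # integer part 8 -> 8, allowed
    ((10, 2), (15, 5)),  # integer part 8 -> 10, allowed
    ((10, 2), (10, 4)),  # integer part 8 -> 6, rejected
]
results = [can_promote_decimal(*src, *dst) for src, dst in examples]
```

The promotion that the thread says Iceberg already supports, same scale with a larger precision such as DECIMAL(10,2) to DECIMAL(12,2), also passes this check.
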