Hi Ryan,

Thank you for the detailed response. I discussed this offline with my team
lead, and we investigated the problem further. After reviewing the decimal
serialization code in Iceberg, we confirmed that only the unscaled value is
currently serialized; the scale is not stored. This does make type evolution
more complex than I initially anticipated.

You mentioned that v4 will adopt columnar metadata for manifests. I'm not
certain which format Iceberg will use (perhaps Parquet?), but I agree this
is a positive direction. However, to properly support decimal scale
evolution, I believe Iceberg would need to change how decimal values are
serialized, regardless of whether Avro or Parquet is used: both the unscaled
value and the scale would have to be serialized, not just the unscaled value.

Here's an example: Consider a field initially defined as DECIMAL(5,2) with
value 123.45 (the serialized unscaled value is 12345). If a user later
changes the type to DECIMAL(6,3) - which follows SQL:2011 rules since (p-s)
doesn't decrease - reading the old data with the new type would be
problematic. Without the original scale being serialized, we can't
distinguish whether 12345 represents 123.45 (scale=2) or 12.345 (scale=3),
potentially leading to incorrect data interpretation. By serializing the
scale alongside the unscaled value, we could correctly read 12345 with
scale=2 as 123.450 under the new DECIMAL(6,3) type, avoiding data
corruption.
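
To make the ambiguity concrete, here is a small Python sketch. It is not
Iceberg code; the helper names (can_promote, decode, rescale) are made up
for illustration, and can_promote only restates the three rules from my
original proposal. It shows that the same unscaled integer decodes to two
different values depending on the assumed scale, and that knowing the
written scale lets a reader rescale the value safely.

```python
from decimal import Decimal


def can_promote(src_p: int, src_s: int, dst_p: int, dst_s: int) -> bool:
    """Hypothetical check for the proposed rules: scale and precision may
    only grow, and the integer-part capacity (p - s) must not shrink."""
    return (
        dst_s >= src_s
        and dst_p >= src_p
        and (dst_p - dst_s) >= (src_p - src_s)
    )


def decode(unscaled: int, scale: int) -> Decimal:
    """Interpret an unscaled integer under a given scale."""
    return Decimal(unscaled).scaleb(-scale)


def rescale(unscaled: int, written_scale: int, read_scale: int) -> int:
    """Convert an unscaled value from the scale it was written with to the
    scale of the evolved type. Assumes read_scale >= written_scale."""
    return unscaled * 10 ** (read_scale - written_scale)


# DECIMAL(5,2) -> DECIMAL(6,3) satisfies the proposed rules.
print(can_promote(5, 2, 6, 3))

# The stored bytes only carry 12345; the meaning depends on the scale.
print(decode(12345, 2))  # intended value under DECIMAL(5,2)
print(decode(12345, 3))  # misread value if the new scale is assumed

# With the written scale known, the value can be rescaled before decoding.
print(decode(rescale(12345, 2, 3), 3))
```

The last line is the behavior I would expect from a reader that knows the
original scale: the value keeps its magnitude and only gains a trailing
zero.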

Do you consider serializing the scale a viable approach, or does the
community have a better solution for supporting decimal scale evolution?
Also, have specific implementation approaches for decimal type changes
already been discussed? I'm very interested in understanding how v4 plans
to address this issue.

Minglei

On Thu, Sep 11, 2025 at 03:53, Ryan Blue <rdb...@gmail.com> wrote:

> Hi Minglei, thanks for the proposal.
>
> v3 is now closed, so we can't introduce a breaking change like this until
> v4. We looked into decimal type evolution in v3 and found that due to the
> way that we currently store lower and upper bounds for decimal values, we
> can't safely support this in v3 Iceberg manifests. We will need to wait
> until v4 manifests are introduced with columnar metadata to make this
> change.
>
> Ryan
>
> On Wed, Sep 10, 2025 at 12:28 AM rice Zhang <minglei...@gmail.com> wrote:
>
>> Hi Iceberg Community,
>>
>> I'd like to propose extending Iceberg's type promotion rules to support
>> DECIMAL type evolution with scale changes, aligning with the SQL:2011
>> standard.
>>
>> *Current Limitation*
>>   Currently, Iceberg only supports DECIMAL type promotion when:
>>   - Scale remains the same
>>   - Precision can be increased
>>
>>   This means DECIMAL(10,2) can evolve to DECIMAL(12,2), but not to
>> DECIMAL(12,4).
>>
>> *Proposed Change*
>>   Allow DECIMAL type evolution when:
>>   1. Target scale >= source scale
>>   2. Target precision >= source precision
>>   3. Integer part capacity is preserved: (target_precision -
>> target_scale) >= (source_precision - source_scale)
>>
>> *Examples*
>>   With this change:
>>   - DECIMAL(10,2) → DECIMAL(12,4) ✓ (integer part: 8 → 8, scale: 2 → 4)
>>   - DECIMAL(10,2) → DECIMAL(15,5) ✓ (integer part: 8 → 10, scale: 2 → 5)
>>   - DECIMAL(10,2) → DECIMAL(10,4) ✗ (integer part: 8 → 6, would lose
>> integer capacity)
>>
>> *Rationale*
>>   1. SQL:2011 Compliance: This behavior aligns with SQL:2011 standard
>> expectations
>>   2. User Experience: Many users coming from traditional databases expect
>> this type evolution to work
>>   3. Data Safety: The proposed rules ensure no data loss - existing
>> values can always be represented in the new type
>>   4. Real-world Use Cases: Common scenarios like adding more decimal
>> precision for currency calculations would be supported
>>
>> *Implementation*
>>   I've created a proof-of-concept implementation:
>> https://github.com/apache/iceberg/issues/14037
>>
>> *Questions for Discussion*
>>   1. Should this be part of the spec v3, or wait for a future version?
>>   2. Are there any backward compatibility concerns we should address?
>>
>> Looking forward to your feedback and thoughts on this proposal.
>>
>> Best regards,
>> Minglei
>>
>
