Re: truncate transform over binary columns

Amogh Jahagirdar Wed, 03 Apr 2024 21:20:30 -0700

Another aspect that just occurred to me, there could be concern on just
directly updating the spec as is since it's ultimately the source of truth
(not the reference implementation). If there's concern that such a change
would make other implementations technically non-compliant because they
don't support truncate on binary, then another option is to say that
truncate binary is only officially supported as part of the v3 spec (with a
note that technically in the reference implementation it works prior to
v3). This feels a bit overkill to me for this case since if I look at the
history of this, it's been supported since the very beginning of the
reference implementation and I think it was a miss on the original spec
definition. However, if people are concerned about directly updating the
spec as is, then I think this is another option.


Thanks,

Amogh Jahagirdar

On Wed, Apr 3, 2024 at 9:36 PM Amogh Jahagirdar <am...@tabular.io> wrote:

> Thanks for bringing this up Brian! In my view, the decision tree for this
> would look something like:
>
> 1.) Is there anything incorrect with supporting truncate on the basis of
> width for binary columns? I can't really think of any reason, it seems
> legitimate to me (handling characters outside of utf-8 comes to mind) and I
> see engines like Spark already handle such a transform (for Spark I see
> TestSparkTruncateFunction#testTruncateBinary already validating the
> behavior). If there is something problematic with supporting this then I
> think we'd want to consider deprecating that behavior but again I don't see
> anything problematic since the transform is well defined.
>
> 2.) If there's nothing incorrect, then changing the spec makes sense. I
> left a review on the PR, the only thing is with a spec we want to be
> explicit so we should specify that truncate on binary will take [0, width)
> bytes (essentially the same as the string case).
>
> Right now, I'm favoring 2 but  I think it's worth leaving this open for a
> bit in case others see an issue with  supporting this behavior.
>
> Thanks,
>
> Amogh Jahagirdar
>
> On Wed, Apr 3, 2024 at 6:45 PM Brian Hulette <bhule...@apache.org> wrote:
>
>> Hello, this is my first time writing on this list so I'll  introduce
>> myself. I'm Brian Hulette, I've been involved with a couple of Apache
>> projects in the past (Arrow and Beam), and now I'm working on BigQuery's
>> support for Iceberg.
>>
>> My colleague raised an issue [1] a while ago about a discrepancy between
>> the specification and the implementation - truncate is not supposed to work
>> on binary column, but it looks like it does.  It seems unlikely that we
>> will drop support for something that is working now, so I figured we should
>> document the current behavior instead. I drafted a PR for this [2], could
>> someone help review?
>>
>> Thanks!
>> Brian
>>
>> [1] https://github.com/apache/iceberg/issues/5251
>> [2] https://github.com/apache/iceberg/pull/10079
>>
>

Re: truncate transform over binary columns

Reply via email to