Thanks for bringing this up, Brian! In my view, the decision tree for this looks something like:
1.) Is there anything incorrect about supporting truncate on the basis of width for binary columns? I can't really think of a reason; it seems legitimate to me (handling characters outside of UTF-8 comes to mind), and I see engines like Spark already handle such a transform (for Spark, TestSparkTruncateFunction#testTruncateBinary already validates the behavior). If there were something problematic with supporting this, then I think we'd want to consider deprecating that behavior, but again I don't see anything problematic since the transform is well defined.

2.) If there's nothing incorrect, then changing the spec makes sense. I left a review on the PR; the only thing is that in a spec we want to be explicit, so we should specify that truncate on binary takes bytes [0, width) (essentially the same as the string case). For concreteness, there's a rough sketch of that behavior after the quoted message below.

Right now I'm favoring 2, but I think it's worth leaving this open for a bit in case others see an issue with supporting this behavior.

Thanks,
Amogh Jahagirdar

On Wed, Apr 3, 2024 at 6:45 PM Brian Hulette <bhule...@apache.org> wrote:

> Hello, this is my first time writing on this list, so I'll introduce
> myself. I'm Brian Hulette. I've been involved with a couple of Apache
> projects in the past (Arrow and Beam), and now I'm working on BigQuery's
> support for Iceberg.
>
> My colleague raised an issue [1] a while ago about a discrepancy between
> the specification and the implementation: truncate is not supposed to work
> on binary columns, but it looks like it does. It seems unlikely that we
> will drop support for something that is working now, so I figured we should
> document the current behavior instead. I drafted a PR for this [2]; could
> someone help review?
>
> Thanks!
> Brian
>
> [1] https://github.com/apache/iceberg/issues/5251
> [2] https://github.com/apache/iceberg/pull/10079
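
For concreteness, here is the rough sketch mentioned in point 2 above. This is plain Java and not Iceberg's actual code; the class and method names are made up for illustration, and the only point is the [0, width) semantics:

import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative sketch only (hypothetical names, not Iceberg's implementation).
public class BinaryTruncateSketch {

  // Take bytes in the range [0, width): the first min(width, length) bytes.
  static ByteBuffer truncateBinary(int width, ByteBuffer value) {
    int length = Math.min(width, value.remaining());
    ByteBuffer result = value.duplicate(); // avoid mutating the caller's buffer
    result.limit(result.position() + length);
    return result.slice();
  }

  public static void main(String[] args) {
    ByteBuffer input = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5});
    System.out.println(Arrays.toString(toArray(truncateBinary(3, input))));  // [1, 2, 3]
    // Values shorter than the width are unchanged, just like the string case.
    System.out.println(Arrays.toString(toArray(truncateBinary(10, input)))); // [1, 2, 3, 4, 5]
  }

  static byte[] toArray(ByteBuffer buf) {
    byte[] bytes = new byte[buf.remaining()];
    buf.duplicate().get(bytes);
    return bytes;
  }
}

The duplicate/slice calls are just there to leave the caller's buffer untouched; the essential behavior is taking the first min(width, length) bytes and leaving shorter values as-is, mirroring what the spec already says for strings.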