RussellSpitzer commented on PR #481: URL: https://github.com/apache/parquet-format/pull/481#issuecomment-2605379605
> > > @aihuaxu I am not sure why we need this type in the Variant binary encoding. Doesn't this just duplicate the `binary` type? We don't need two different ways to store binary data.

> > > > > > Yeah. You are right. The storage is the same as binary since we can't have the length in the type description. There is really no need to add that. We can just use binary to store fixed(L). cc @RussellSpitzer and @emkornfield

> > So I think the difference here is semantics. I don't have a strong opinion one way or another, but both Parquet natively and Iceberg distinguish between bytes and Fixed(L). One could argue this is purely for optimization purposes, and when we are paying the cost anyway of storing individual lengths per field they are equivalent. Ultimately, the place where this would make a difference is shredding, where Fixed(L) can be mapped to FLBA, which would save some amount of storage.

> > I don't feel too strongly one way or another on adding the type. I think this is one of those cases where an engine shredding a variable length binary could decide whether the shredded type can become a fixed length when shredding. So it probably doesn't matter if untyped matter has multiple representations.
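
To make the shredding point concrete, here is a minimal Python sketch of how an engine might decide whether a shredded binary column can use a fixed-length type. It is an illustration only, not code from parquet-format or any engine: the function names are hypothetical, and the size estimate assumes Parquet PLAIN encoding, where each BYTE_ARRAY value carries a 4-byte length prefix and FIXED_LEN_BYTE_ARRAY (FLBA) values do not.

```python
from typing import Optional, Sequence, Tuple

# Assumption: under PLAIN encoding, BYTE_ARRAY stores a 4-byte length before
# each value, while FIXED_LEN_BYTE_ARRAY stores only the raw bytes.
LENGTH_PREFIX_BYTES = 4


def common_length(values: Sequence[bytes]) -> Optional[int]:
    """Return the shared length if every value has the same length, else None."""
    lengths = {len(v) for v in values}
    return lengths.pop() if len(lengths) == 1 else None


def plain_size_byte_array(values: Sequence[bytes]) -> int:
    """Estimated PLAIN-encoded size when stored as variable-length BYTE_ARRAY."""
    return sum(LENGTH_PREFIX_BYTES + len(v) for v in values)


def choose_shredded_type(values: Sequence[bytes]) -> Tuple[str, int]:
    """Hypothetical helper: pick a physical type and report its estimated size."""
    length = common_length(values)
    if length is None:
        # Mixed (or no) lengths: keep the variable-length representation.
        return "BYTE_ARRAY", plain_size_byte_array(values)
    # Uniform lengths: the per-value length prefix can be dropped.
    return f"FIXED_LEN_BYTE_ARRAY({length})", length * len(values)
```

With three 16-byte values, this sketch picks `FIXED_LEN_BYTE_ARRAY(16)` at an estimated 48 bytes versus 60 bytes as `BYTE_ARRAY`. The decision depends only on the observed values, which is why the untyped encoding would not need a separate fixed type for an engine to make this choice.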