Hi Team,

We need some help with our proposal for a partition-stats file within each snapshot.

With each snapshot, we are proposing a partition-stats Avro file that contains information about all partitions in the table. The schema we have decided on is *(partition_spec_id (int), partition (PartitionData), file_count (int), row_count (long))*.

The problem is with the second column, *partition*. When partition evolution happens, the schema for PartitionData (which follows the PartitionSpec) will change, so rows written under different specs no longer share one shape. For illustration:
{"partition_spec_id":0,"partition":"PartitionData{data=a}","file_count":2,"row_count":2}
{"partition_spec_id":0,"partition":"PartitionData{data=b}","file_count":1,"row_count":1}
{"partition_spec_id":1,"partition":"PartitionData{data=c, id=1}","file_count":1,"row_count":1}

This will be a problem for both the reader and the writer. We have decided to make *the partition column a String type and serialize PartitionData to a string*.

Here we want to confirm: *can all data types supported in Iceberg be serialized to a String?* For example, if a column in a table has a binary type and the table is partitioned on it, can that partition value be serialized to a string?

Issue link: https://github.com/apache/iceberg/issues/1832

Thanks and regards,
--
Piyush Hurpade
Software Engineer
piyush.hurp...@dremio.com
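P.S. To make the binary question concrete, here is a minimal sketch of one way a partition tuple containing a binary value could be turned into a string and read back. It is plain Python, not Iceberg code: the function names `partition_to_string` and `partition_from_string`, the `{"base64": ...}` wrapper, and the choice of Base64 are all assumptions made for illustration. Other Iceberg types (decimal, date, timestamp, uuid, and so on) would each need their own agreed string form, which this sketch does not cover.

```python
import base64
import json


def partition_to_string(partition: dict) -> str:
    """Serialize a partition tuple (field name -> value) to a JSON string.

    Binary values are Base64-encoded and wrapped in a {"base64": ...}
    object so that they can be told apart from ordinary string values.
    This encoding is a hypothetical convention, not an Iceberg format.
    """
    encoded = {}
    for name, value in partition.items():
        if isinstance(value, (bytes, bytearray)):
            encoded[name] = {"base64": base64.b64encode(bytes(value)).decode("ascii")}
        else:
            encoded[name] = value
    # sort_keys gives a stable string for the same partition tuple
    return json.dumps(encoded, sort_keys=True)


def partition_from_string(text: str) -> dict:
    """Reverse partition_to_string, decoding wrapped binary values."""
    decoded = {}
    for name, value in json.loads(text).items():
        if isinstance(value, dict) and set(value) == {"base64"}:
            decoded[name] = base64.b64decode(value["base64"])
        else:
            decoded[name] = value
    return decoded


# A partition tuple under a spec with a binary field and an int field.
row = {"data": b"\x00\xff\x10", "id": 1}
text = partition_to_string(row)
assert partition_from_string(text) == row
```

Because each row carries its own field names in the string, rows written under spec 0 and spec 1 can sit in the same String column, at the cost of the reader having to parse the string using the spec identified by partition_spec_id.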