Hi Piyush, You might want to consider having a separate partition stats file for each partition spec. That way, each stats file contains just one partition struct type and you can keep the struct unmodified. There is a way to convert a partition struct to a string (PartitionSpec.partitionToPath) but that is a one-way conversion and you shouldn't try to parse that string or consider two partitions equal just because the string is equal.
Making the file specific to a partition spec fixes the problem and allows you to find the data in each file using the same partition predicates that we use to locate data files. That will make it easy to find the stats for the partitions that you're looking for based on some data or partition query filter. rb On Thu, Jan 7, 2021 at 11:15 PM Piyush Vinay Hurpade < piyush.hurp...@dremio.com> wrote: > Hi Team, > Need some help regarding our proposal for a partition-stats file within > each snapshot. With each snapshot we are proposing a partition-stats avro > file that contains information about all partitions in the table. So the > schema we decide to have is *(partition_spec_id(int), > partition(PartitionData), file_count(int), row_count(long)).* Problem is > with the 2nd *column(partition)*. When partition evolution happens, the > schema for PartitionData(PartitionSpec) will change. illustration : > > {"partition_spec_id":0,"partition":"PartitionData{data=a}","file_count":2,"row_count":2} > {"partition_spec_id":0,"partition":"PartitionData{data=b}","file_count":1,"row_count":1} > {"partition_spec_id":1,"partition":"PartitionData{data=c, > id=1}","file_count":1,"row_count":1} > > And this will be a problem for reader and writer. We decided to have *the > partition column as a "String type" and serialize PartitionData to string.* > Here we want to confirm that "*Can all data types supported in iceberg > can serialize to String"?* For example if a column in a table has binary > type and we have a partition on it. can it be serialize to string? > issue link : https://github.com/apache/iceberg/issues/1832 > <https://github.com/apache/iceberg/issues/1832> > > thanks and regards > -- > > Piyush Hurpade > > Software Engineer > > piyush.hurp...@dremio.com > > > > -- Ryan Blue Software Engineer Netflix