Hi Scott,

There are some docs to help with this situation: https://iceberg.apache.org/spark/#writing-against-partitioned-table
We added a helper function, IcebergSpark.registerBucketUDF, to register the UDF that you need for the bucket column. That's probably the source of the problem. I always recommend an orderBy with the partition expressions to write. Spark seems to do best when it produces a global ordering.

rb

On Fri, Nov 20, 2020 at 2:40 PM Kruger, Scott <sckru...@paypal.com.invalid> wrote:

> I want to have a table that’s partitioned by the following, in order:
>
> - Low-cardinality identity
> - Day
> - Bucketed long ID, 16 buckets
>
> Is this possible? If so, how should I do the dataframe write? This is what I’ve tried so far:
>
> 1. df.orderBy(“identity”, “day”).sortWithinPartitions(expr(“iceberg_bucket16(id)”))
> 2. df.orderBy(“identity”, “day”, expr(“iceberg_bucket16(id)”))
> 3. df.repartition(“identity”, “day”).sortWithinPartitions(“identity”, “day”, expr(“iceberg_bucket16(id)”))
> 4. df.repartition(“identity”, “day”, “id”).sortWithinPartitions(“identity”, “day”, expr(“iceberg_bucket16(id)”))
> 5. df.repartitionByRange(“identity”, “day”).sortWithinPartitions(“identity”, “day”, expr(“iceberg_bucket16(id)”))
> 6. df.repartitionByRange(“identity”, “day”, “id”).sortWithinPartitions(“identity”, “day”, expr(“iceberg_bucket16(id)”))
>
> But I keep getting the error indicating that a partition has already been closed.

--
Ryan Blue
Software Engineer
Netflix
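
A minimal Scala sketch of that approach: the SparkSession "spark", the DataFrame "df" with columns identity, day, and id, and the table name "db.sample" below are illustrative placeholders, not taken from the thread.

    import org.apache.iceberg.spark.IcebergSpark
    import org.apache.spark.sql.functions.expr
    import org.apache.spark.sql.types.DataTypes

    // Register a UDF that applies Iceberg's bucket transform with 16 buckets
    // to a long column, matching the table's bucket(16, id) partition field.
    IcebergSpark.registerBucketUDF(spark, "iceberg_bucket16", DataTypes.LongType, 16)

    // Produce a global ordering across all partition expressions so each task
    // writes each partition's files once and never reopens a closed writer.
    df.orderBy(expr("identity"), expr("day"), expr("iceberg_bucket16(id)"))
      .write
      .format("iceberg")
      .mode("append")
      .save("db.sample")

Note that the bucket width passed to registerBucketUDF (16 here) has to match the bucket(16, id) transform in the table's partition spec, otherwise the sort will not line up with the table's actual partitions.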