I might have misunderstood the issue.
Spark will indeed repartition the data while writing; what it won't do is
write precisely 10 files inside each date partition folder, sorted by col x.
Typically this kind of fine-grained write configuration is useful when there's a
downstream consumer that will rely on that ordering.
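For reference, a rough PySpark sketch of how that layout could be approximated by hand with a salt column; the salt, the target of 10 files per date, and the output path are all made up for illustration:

from pyspark.sql import functions as F

# salt each row with a value in [0, 10) so each date hashes into roughly 10 write tasks,
# then sort within each task so every output file comes out ordered by x
salted = df.withColumn("salt", (F.rand() * 10).cast("int"))
(salted
    .repartition(F.col("date"), F.col("salt"))
    .sortWithinPartitions("date", "x")
    .drop("salt")
    .write
    .partitionBy("date")
    .mode("append")
    .parquet("/some/output/path"))  # hypothetical path

Note this only approximates 10 files per date partition, since different salt values can hash into the same task.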
Henryk Česnolovič
08:30 (5 hours ago)
to Soumasish
Ok nvm. Seems we don't need to do the repartition ourselves, as Spark handles it.
df.writeTo("some_table").partitionedBy(col("date"), col("x"), bucket(10,
col("y"))).using("iceberg").createOrReplace()
or later
df.writeTo("some_table").append()
Spark understands the table's partitioning on its own.
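One way to sanity-check the resulting layout, assuming the Iceberg catalog is configured so the table is addressable as some_table, is to query Iceberg's files metadata table and count data files per partition:

# count the data files Iceberg has written into each partition of the table
spark.sql("""
    SELECT partition, COUNT(*) AS num_files
    FROM some_table.files
    GROUP BY partition
""").show(truncate=False)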