I might have misunderstood the issue.
Spark will indeed repartition the data while writing; what it won't do is
write precisely 10 files inside each date partition folder, sorted by col x.
Typically this kind of fine-grained write configuration is useful when there's a
downstream consumer that will rely on that ordering.
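For reference, a rough PySpark sketch of how that layout could be approximated by hand with a salt column; the salt, the target of 10 files per date, and the output path are all made up for illustration:

from pyspark.sql import functions as F

# salt each row with a value in [0, 10) so each date hashes into roughly 10 write tasks,
# then sort within each task so every output file comes out ordered by x
salted = df.withColumn("salt", (F.rand() * 10).cast("int"))
(salted
    .repartition(F.col("date"), F.col("salt"))
    .sortWithinPartitions("date", "x")
    .drop("salt")
    .write
    .partitionBy("date")
    .mode("append")
    .parquet("/some/output/path"))  # hypothetical path

Note this only approximates 10 files per date partition, since different salt values can hash into the same task.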
Henryk Česnolovič
08:30 (5 hours ago)
to Soumasish
Ok nvm. Seems we don't need to do the repartition ourselves, as Spark handles it.
df.writeTo("some_table").partitionedBy(col("date"), col("x"), bucket(10,
col("y"))).using("iceberg").createOrReplace()
or later
df.writeTo("some_table").append()
Spark understands the table's partitioning on its own.
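One way to sanity-check the resulting layout, assuming the Iceberg catalog is configured so the table is addressable as some_table, is to query Iceberg's files metadata table and count data files per partition:

# count the data files Iceberg has written into each partition of the table
spark.sql("""
    SELECT partition, COUNT(*) AS num_files
    FROM some_table.files
    GROUP BY partition
""").show(truncate=False)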