I might have misunderstood the issue.
Spark will indeed repartition the data while writing; what it won't do is write precisely 10 files inside each date partition folder, sorted by col x. Typically this kind of fine-grained write config is useful if there's a downstream consumer that will use the output
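
To make that concrete, here is a rough, untested sketch of how the DataFrame could be shaped before the v2 write to get at most ~10 sorted files per date folder. The column names (date, x, y) and the table name come from this thread; the function name is made up, and pmod(hash(y), 10) only approximates Iceberg's bucket transform rather than reproducing it exactly:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, expr}

// Spread each date's rows across at most 10 distinct shuffle keys (a 10-way
// hash of y), then sort by x within each task so every output file is sorted.
def writeTenSortedFilesPerDate(df: DataFrame): Unit =
  df.repartition(col("date"), expr("pmod(hash(y), 10)"))
    .sortWithinPartitions(col("x"))
    .writeTo("some_table")
    .append()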
Henryk Česnolovič
08:30
to Soumasish
Ok, nvm. It seems we don't need to do the repartition ourselves, as Spark handles it itself.
df.writeTo("some_table").partitionedBy(col("date"), col("x"), bucket(10,
col("y"))).using("iceberg").createOrReplace()
or later
df.writeTo("some_table").append()
spark unders
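
For reference, a self-contained Scala version of that create-then-append flow could look like the following; the bucket and col helpers come from org.apache.spark.sql.functions, the table and column names are the ones from the thread, and the two function names are only for illustration:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{bucket, col}

// One-off: create (or replace) the Iceberg table, recording the partition
// spec (date, x, bucket(10, y)) in the table metadata.
def createBucketedTable(df: DataFrame): Unit =
  df.writeTo("some_table")
    .partitionedBy(col("date"), col("x"), bucket(10, col("y")))
    .using("iceberg")
    .createOrReplace()

// Later batches: a plain append is enough, since the table's partition spec
// tells Spark and Iceberg where each row belongs.
def appendBatch(df: DataFrame): Unit =
  df.writeTo("some_table").append()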
Henryk,
I could reproduce your issue and achieve the desired result using SQL DDL.
Here's the workaround.
package replicator

import org.apache.spark.sql.SparkSession

object Bucketing extends App {
  // Local Spark session for the repro.
  val spark = SparkSession.builder()
    .appName("ReproduceError")
    .master("local[*]")
    .getOrCreate()
}
Hello.
Maybe somebody has faced the same issue. I'm trying to write data to a table using the DataFrame API v2. The table is partitioned by buckets using

df.writeTo("some_table")
  .partitionedBy(col("date"), col("x"), bucket(10, col("y")))
  .using("iceberg")
  .createOrReplace()

Can I somehow prepare df in