Hello.

Maybe somebody has faced the same issue. I am trying to write data to a table
using the DataFrame API v2. The table is partitioned by date, x, and a bucket
transform on y:

    df.writeTo("some_table")
      .partitionedBy(col("date"), col("x"), bucket(10, col("y")))
      .using("iceberg")
      .createOrReplace()
Can I somehow prepare the df in terms of partitions before writing to the
destination, so that I don't end up writing too many files? The raw data is
not grouped by the partition keys. What I would like is something like:

    df.repartition(col("x"), bucket(10, col("y")))
      .writeTo("some_table")
      .partitionedBy(col("date"), col("x"), bucket(10, col("y")))
      .using("iceberg")
      .createOrReplace()
The bucket function can't be used that way, because it fails with:

    [INTERNAL_ERROR] Cannot generate code for expression: bucket(10, input[0, bigint, true])
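
The closest workaround I can think of (just a sketch, assuming the Iceberg
runtime's IcebergSpark.registerBucketUDF helper is on the classpath, y is a
bigint, spark is the active SparkSession, and the UDF name iceberg_bucket10 is
arbitrary) is to register Iceberg's bucket transform as a regular UDF and
repartition by that expression instead of the unresolvable bucket():

    import org.apache.iceberg.spark.IcebergSpark
    import org.apache.spark.sql.functions.{bucket, col, expr}
    import org.apache.spark.sql.types.DataTypes

    // Register Iceberg's bucket transform for bigint input as a SQL UDF;
    // 10 must match the number of buckets in the table's partition spec.
    IcebergSpark.registerBucketUDF(spark, "iceberg_bucket10", DataTypes.LongType, 10)

    // Cluster incoming rows the same way the table is partitioned, so each
    // task writes to as few (date, x, bucket) combinations as possible.
    df.repartition(col("date"), col("x"), expr("iceberg_bucket10(y)"))
      .writeTo("some_table")
      .partitionedBy(col("date"), col("x"), bucket(10, col("y")))
      .using("iceberg")
      .createOrReplace()

Would that produce the same bucket values as the table's bucket(10, y)
partition transform, or is there a better way?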

Thanks
