Thanks. The latest example clarifies a few things. 
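For reference, putting both corrections together, the snippet below is what a compiling version might look like. It assumes `df_c` is an existing DataFrame and `output_table` holds the target table identifier; the import of `days` and `col` from `org.apache.spark.sql.functions` is what resolves the "not found: value days" error.

```scala
// Sketch of the corrected snippet (assumes a running SparkSession with the
// Iceberg catalog configured, and that `df_c` / `output_table` already exist).
import org.apache.spark.sql.functions.{col, days}  // `days` lives here; the missing import caused "not found: value days"

df_c.writeTo(output_table)
  .using("iceberg")                           // needed to create an Iceberg table
  .partitionedBy(days(col("last_updated")))   // partitionedBy, not partitionBy
  .createOrReplace()
```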

Saulius Pakalka

> On 2022-05-27, at 23:27, Wing Yew Poon <wyp...@cloudera.com.invalid> wrote:
> 
> The partitionedBy typo in the doc is already fixed in the master branch of 
> the Iceberg repo.
> I filed a PR to add `using("iceberg")` to the `writeTo` examples for creating 
> a table (if you want to create an *Iceberg* table).
> 
> On Fri, May 27, 2022 at 12:58 PM Wing Yew Poon <wyp...@cloudera.com> wrote:
> One other note:
> When creating the table, you need `using("iceberg")`. The example should read
> 
> data.writeTo("prod.db.table")
>     .using("iceberg")
>     .tableProperty("write.format.default", "orc")
>     .partitionedBy($"level", days($"ts"))
>     .createOrReplace()
> 
> - Wing Yew
> 
> 
> On Fri, May 27, 2022 at 11:29 AM Wing Yew Poon <wyp...@cloudera.com> wrote:
> That is a typo in the sample code. The doc itself 
> (https://iceberg.apache.org/docs/latest/spark-writes/#creating-tables) says:
> "Create and replace operations support table configuration methods, like 
> partitionedBy and tableProperty"
> You could also have looked up the API in the Spark documentation:
> https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameWriterV2.html
> There you would have found that the method is partitionedBy, not partitionBy.
> 
> - Wing Yew
> 
> 
> On Fri, May 27, 2022 at 4:32 AM Saulius Pakalka 
> <saulius.paka...@oxylabs.io.invalid> wrote:
> Hi,
> 
> I am trying to create a partitioned Iceberg table using the Scala code below, 
> based on the example in the docs.
> df_c.writeTo(output_table)
>   .partitionBy(days(col("last_updated")))
>   .createOrReplace()
> However, this code does not compile and throws two errors:
> 
> value partitionBy is not a member of 
> org.apache.spark.sql.DataFrameWriterV2[org.apache.spark.sql.Row]
> [error] possible cause: maybe a semicolon is missing before `value 
> partitionBy'?
> [error]       .partitionBy(days(col("last_updated")))
> [error]        ^
> [error]  not found: value days
> [error]       .partitionBy(days(col("last_updated")))
> [error]                    ^
> [error] two errors found
> 
> I am not sure where to look for the problem. Any advice would be appreciated.
> 
> Best regards,
> 
> Saulius Pakalka
> 
