Re: Bucket partitioning in addition to regular partitioning

2020-11-24 Thread Ryan Blue

Re: Bucket partitioning in addition to regular partitioning

2020-11-24 Thread Kruger, Scott
misunderstanding what you’re saying? FWIW this is Spark 2.4.x with Iceberg 0.10.0 using the dataframe API.

Re: Bucket partitioning in addition to regular partitioning

2020-11-24 Thread Ryan Blue

Re: Bucket partitioning in addition to regular partitioning

2020-11-24 Thread Kruger, Scott
trouble with (I can get things to work just fine if I follow the docs and only partition by the bucketed ID).

Re: Bucket partitioning in addition to regular partitioning

2020-11-20 Thread Ryan Blue
Hi Scott, There are some docs to help with this situation: https://iceberg.apache.org/spark/#writing-against-partitioned-table We added a helper function, IcebergSpark.registerBucketUDF, to register the UDF that you need for the bucket column. That's probably the source of the problem. I always
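The helper Ryan mentions can be sketched as follows. This is a minimal Scala example of registering the bucket UDF and clustering the data before an append; the session, dataframe, table name, and column name `id` are illustrative assumptions, not taken from the thread.

```scala
import org.apache.iceberg.spark.IcebergSpark
import org.apache.spark.sql.functions.expr
import org.apache.spark.sql.types.DataTypes

// Register a UDF that applies Iceberg's bucket transform (16 buckets of a long).
// "iceberg_bucket16" is an arbitrary name chosen here.
IcebergSpark.registerBucketUDF(spark, "iceberg_bucket16", DataTypes.LongType, 16)

// Cluster incoming rows by the bucket value so each task writes to few
// partitions, then append. The table identifier is hypothetical.
df.sortWithinPartitions(expr("iceberg_bucket16(id)"))
  .write
  .format("iceberg")
  .mode("append")
  .save("db.table")
```

Using the registered UDF (rather than an ad-hoc hash) matters because it applies the same bucket transform Iceberg uses for the table's partition spec, so the sort actually lines up with the physical partitions.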

Bucket partitioning in addition to regular partitioning

2020-11-20 Thread Kruger, Scott
I want to have a table that’s partitioned by the following, in order:
* Low-cardinality identity
* Day
* Bucketed long ID, 16 buckets

Is this possible? If so, how should I do the dataframe write? This is what I’ve tried so far:
1. df.orderBy("identity", "day").sortWithinPartit
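A partition spec like the one described can be expressed with Iceberg's Java/Scala API. This is a minimal sketch; the schema and column names (`category`, `ts`, `id`) are hypothetical, since the thread does not give them.

```scala
import org.apache.iceberg.{PartitionSpec, Schema}
import org.apache.iceberg.types.Types

// Hypothetical schema matching the three partition sources described.
val schema = new Schema(
  Types.NestedField.required(1, "category", Types.StringType.get()), // low-cardinality identity
  Types.NestedField.required(2, "ts", Types.TimestampType.withZone()),
  Types.NestedField.required(3, "id", Types.LongType.get())
)

// Spec in the order described: identity, then day, then 16 buckets of the long ID.
val spec = PartitionSpec.builderFor(schema)
  .identity("category")
  .day("ts")
  .bucket("id", 16)
  .build()
```

Mixing identity/day partitioning with a bucket field is allowed; the write-side difficulty discussed in the thread is making the dataframe's sort order line up with all three partition sources.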