Re: Re: Partition column order in rewrite manifests

2024-02-01 Thread Jack Ye
Just created https://github.com/apache/iceberg/issues/9615 to track this. -Jack On Thu, Feb 1, 2024 at 11:43 AM Zach Dischner wrote: > That is a great idea! Would let anyone fine tune the sorting and > colocation behaviors they want for the metadata tree >

RE: Re: Partition column order in rewrite manifests

2024-02-01 Thread Zach Dischner
That is a great idea! Would let anyone fine tune the sorting and colocation behaviors they want for the metadata tree

Re: Partition column order in rewrite manifests

2024-01-31 Thread Jack Ye
Yeah I was also thinking about potentially exposing something more flexible. However, I don't think we can directly expose the logic for users to manipulate the data frame directly, because we want the RewriteManifests core API to be not engine-specific. In addition, I think we still need to expos

RE: Partition column order in rewrite manifests

2024-01-31 Thread Zach Dischner
I love this idea. Instead of (or in addition to) inferring the desired sort order, I would propose that the ability for the user to define their own sorting/partitioning be exposed. That way the user could balance the metadata tree more specifically to their use case. Rough thinking - https://gith

Re: Partition column order in rewrite manifests

2024-01-30 Thread Jack Ye
Yes, it is sufficient at least for the use case I am talking about. -Jack On Tue, Jan 30, 2024 at 7:46 PM Renjie Liu wrote: > To be more specific, I think it's sorting by the value after > transformation? > > On Wed, Jan 31, 2024 at 11:36 AM Amogh Jahagirdar > wrote: > >> Yeah I think being ab

Re: Partition column order in rewrite manifests

2024-01-30 Thread Renjie Liu
To be more specific, I think it's sorting by the value after transformation? On Wed, Jan 31, 2024 at 11:36 AM Amogh Jahagirdar wrote: > Yeah I think being able to specify the order of the columns to sort by > when rewriting the manifests makes a lot of sense. > > On Tue, Jan 30, 2024 at 5:47 PM

Re: Partition column order in rewrite manifests

2024-01-30 Thread Amogh Jahagirdar
Yeah I think being able to specify the order of the columns to sort by when rewriting the manifests makes a lot of sense. On Tue, Jan 30, 2024 at 5:47 PM Renjie Liu wrote: > Sounds reasonable to me. > > On Wed, Jan 31, 2024 at 7:56 AM wrote: > >> Sounds like a reasonable thing to add? Maybe we

Re: Partition column order in rewrite manifests

2024-01-30 Thread Renjie Liu
Sounds reasonable to me. On Wed, Jan 31, 2024 at 7:56 AM wrote: > Sounds like a reasonable thing to add? Maybe we could check cardinality to > pick out the default order as well? > Sent from my iPhone > > On Jan 30, 2024, at 3:50 PM, Jack Ye wrote: > >  > Hi everyone, > > Today, the rewrite ma

Re: Partition column order in rewrite manifests

2024-01-30 Thread russell . spitzer
Sounds like a reasonable thing to add? Maybe we could check cardinality to pick out the default order as well? Sent from my iPhone On Jan 30, 2024, at 3:50 PM, Jack Ye wrote: Hi everyone, Today, the rewrite manifest procedure always orders the data files based on their data_file.partition value. Spec
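
For readers skimming the thread, the idea under discussion (letting the caller choose which partition fields drive the ordering of data files before they are grouped into manifests) can be illustrated with a small sketch. This is not Iceberg code and not the API tracked in the issue above; the function names, the dictionary layout of a data file entry, and the fixed files-per-manifest grouping are all assumptions made up for illustration.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical stand-in for a manifest entry: a path plus its
# (already transformed) partition values keyed by field name.
DataFile = Dict[str, Any]


def sort_by_partition_fields(
    files: Sequence[DataFile], field_order: Sequence[str]
) -> List[DataFile]:
    """Order entries by the listed partition fields, most significant first."""
    return sorted(
        files,
        key=lambda f: tuple(f["partition"][name] for name in field_order),
    )


def group_into_manifests(
    files: Sequence[DataFile], files_per_manifest: int
) -> List[List[DataFile]]:
    """Cut the ordered entries into consecutive fixed-size groups."""
    return [
        list(files[i : i + files_per_manifest])
        for i in range(0, len(files), files_per_manifest)
    ]


files = [
    {"path": "a.parquet", "partition": {"day": "2024-01-30", "bucket": 1}},
    {"path": "b.parquet", "partition": {"day": "2024-01-31", "bucket": 0}},
    {"path": "c.parquet", "partition": {"day": "2024-01-30", "bucket": 0}},
    {"path": "d.parquet", "partition": {"day": "2024-01-31", "bucket": 1}},
]

# Sorting by bucket first keeps each bucket's files together in one group.
by_bucket = sort_by_partition_fields(files, ["bucket", "day"])
manifests = group_into_manifests(by_bucket, 2)
```

With `["bucket", "day"]` each group of two holds a single bucket value, so a query filtering on bucket would only need one group; with `["day", "bucket"]` the same holds for day instead. Which order is better depends on the query pattern, which is the motivation for exposing the choice.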