tobixdev commented on issue #14828: URL: https://github.com/apache/datafusion/issues/14828#issuecomment-2679679211
So I dug a little deeper into how we could implement this functionality.

### arrow-rs

Firstly, I think we require two "flavors" of sorting: one for `arrow-ord` and one for `arrow-row`, as DataFusion uses both of these APIs.

AFAIK, `arrow-rs` has no "user defined type" registry that we could leverage. So when a user calls, for example, `lexsort`, there is no way to "look up" whether a user defined type applies to a particular column. Therefore, we must pass the information on how to sort a column into the sorting procedure.

One possible way of getting this information into the called functions is to extend `SortField` (`arrow-row`) and `SortColumn` (`arrow-ord`). I think this extension could happen in one of two ways:

1. *Provide an Implementation*: In this approach, users directly provide an implementation. In `arrow-row` this could be a custom byte encoder; in `arrow-ord` this could be a function that maps to an `Ordering`.
2. *Provide a User Defined Type*: In this approach, users provide a user defined type. We would then somehow need the ability to deduce the implementations from 1. based on the given type.

Maybe one way forward is to implement 1. and then implement 2. if we think it is sensible. I am a bit unsure how we can directly use the arrow extension types for this use case. If anyone has a better idea here, I'd love to hear your take on that (@mbrobbel, maybe you have an opinion on whether this is within the scope of arrow extension types).

### DataFusion

Here I think we should go with user defined types and attach the sorting information to the type. This should be possible because we can have a "central registry" (e.g., `SessionContext`) that holds all available user defined types, so we can look up whether a particular column has a user defined type with a user defined ordering. If there is a use case for it, we could also think about adding the ability to override this ordering behavior.
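To illustrate option 2 together with the DataFusion-side lookup, here is a minimal, std-only Rust sketch. Everything in it is hypothetical: `UserDefinedType`, `TypeRegistry` (a stand-in for whatever registry `SessionContext` might hold), `Column`, and the example `CaseInsensitiveText` type are not existing arrow-rs or DataFusion APIs, and column values are modeled as raw bytes instead of Arrow arrays. The sort uses the registered type's ordering when the column names one and falls back to plain byte order otherwise.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical trait: a user defined type that carries its own ordering.
/// Values are modeled as raw bytes (e.g. the payload of a binary column).
trait UserDefinedType: Send + Sync {
    /// Name used to look the type up (e.g. an extension type name).
    fn name(&self) -> &str;
    /// User defined ordering between two values of this type.
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering;
}

/// Hypothetical example type: UTF-8 text ordered ASCII-case-insensitively.
struct CaseInsensitiveText;

impl UserDefinedType for CaseInsensitiveText {
    fn name(&self) -> &str {
        "case_insensitive_text"
    }
    fn compare(&self, a: &[u8], b: &[u8]) -> Ordering {
        a.to_ascii_lowercase().cmp(&b.to_ascii_lowercase())
    }
}

/// Hypothetical central registry, standing in for one held by `SessionContext`.
#[derive(Default)]
struct TypeRegistry {
    types: HashMap<String, Arc<dyn UserDefinedType>>,
}

impl TypeRegistry {
    fn register(&mut self, ty: Arc<dyn UserDefinedType>) {
        self.types.insert(ty.name().to_string(), ty);
    }

    fn lookup(&self, name: &str) -> Option<Arc<dyn UserDefinedType>> {
        self.types.get(name).cloned()
    }
}

/// Hypothetical column: an optional user defined type name plus its values.
struct Column {
    udt_name: Option<String>,
    values: Vec<Vec<u8>>,
}

/// Returns row indices in sorted order, using the registered type's ordering
/// when the column has one and plain byte order otherwise.
fn sort_indices(registry: &TypeRegistry, column: &Column) -> Vec<usize> {
    let udt = column.udt_name.as_deref().and_then(|name| registry.lookup(name));
    let mut indices: Vec<usize> = (0..column.values.len()).collect();
    indices.sort_by(|&i, &j| {
        let (a, b) = (&column.values[i], &column.values[j]);
        match &udt {
            Some(ty) => ty.compare(a, b),
            None => a.cmp(b),
        }
    });
    indices
}

fn main() {
    let mut registry = TypeRegistry::default();
    registry.register(Arc::new(CaseInsensitiveText));

    let column = Column {
        udt_name: Some("case_insensitive_text".to_string()),
        values: vec![b"apple".to_vec(), b"Banana".to_vec(), b"cherry".to_vec()],
    };
    println!("{:?}", sort_indices(&registry, &column)); // prints [0, 1, 2]
}
```

With the type registered, the example column sorts as `apple`, `Banana`, `cherry`; without a registered type it would fall back to byte order, which puts `Banana` first because uppercase ASCII letters sort before lowercase ones.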
I don't have more details on how we could implement this. I think trying to implement 1. in arrow-rs is the first step, and then we should check how we can connect that to user defined types in DataFusion. If this sounds good, I can start working on a draft. Let me know if you think that extending `SortField` and `SortColumn` is a reasonable approach here.
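As a rough sketch of what option 1 could look like, the following std-only Rust code attaches a user-provided comparator to a `SortColumn`-like struct (the `arrow-ord` flavor) and a user-provided byte encoder to a `SortField`-like struct (the `arrow-row` flavor). `SortColumnExt`, `SortFieldExt`, `DynComparator`, `DynEncoder`, and the helper functions are hypothetical names, not the real arrow-rs types, and columns are plain `Vec<i64>` rather than Arrow arrays. The example user ordering sorts integers by absolute value.

```rust
use std::cmp::Ordering;
use std::sync::Arc;

/// Hypothetical user-provided comparator (the `arrow-ord` flavor of option 1).
type DynComparator = Arc<dyn Fn(&i64, &i64) -> Ordering + Send + Sync>;

/// Hypothetical user-provided encoder (the `arrow-row` flavor of option 1):
/// appends an order-preserving byte encoding of one value to `out`.
type DynEncoder = Arc<dyn Fn(i64, &mut Vec<u8>) + Send + Sync>;

/// Simplified stand-in for `SortColumn`: values plus an optional comparator.
struct SortColumnExt {
    values: Vec<i64>,
    comparator: Option<DynComparator>,
}

/// Simplified stand-in for `SortField`: an optional custom encoder.
struct SortFieldExt {
    encoder: Option<DynEncoder>,
}

/// Lexicographic multi-column sort returning row indices; each column uses
/// its custom comparator if present, otherwise the natural `i64` order.
fn lex_sort_indices(columns: &[SortColumnExt]) -> Vec<usize> {
    let len = columns.first().map_or(0, |c| c.values.len());
    let mut indices: Vec<usize> = (0..len).collect();
    indices.sort_by(|&i, &j| {
        for col in columns {
            let (a, b) = (&col.values[i], &col.values[j]);
            let ord = match &col.comparator {
                Some(compare) => compare(a, b),
                None => a.cmp(b),
            };
            if ord != Ordering::Equal {
                return ord;
            }
        }
        Ordering::Equal
    });
    indices
}

/// Default order-preserving encoding for `i64`: flip the sign bit and write
/// big-endian, so unsigned byte comparison matches signed numeric order.
fn encode_i64_default(v: i64, out: &mut Vec<u8>) {
    out.extend_from_slice(&((v as u64) ^ (1u64 << 63)).to_be_bytes());
}

/// Encodes each row into bytes (one fixed-width chunk per column) so that
/// comparing the byte vectors yields the desired multi-column ordering.
fn encode_rows(fields: &[SortFieldExt], columns: &[Vec<i64>]) -> Vec<Vec<u8>> {
    let len = columns.first().map_or(0, |c| c.len());
    (0..len)
        .map(|row| {
            let mut out = Vec::new();
            for (field, col) in fields.iter().zip(columns) {
                match &field.encoder {
                    Some(encode) => encode(col[row], &mut out),
                    None => encode_i64_default(col[row], &mut out),
                }
            }
            out
        })
        .collect()
}

fn main() {
    // Hypothetical user defined ordering: compare integers by absolute value.
    let by_abs: DynComparator =
        Arc::new(|a: &i64, b: &i64| a.unsigned_abs().cmp(&b.unsigned_abs()));
    let abs_encoder: DynEncoder = Arc::new(|v: i64, out: &mut Vec<u8>| {
        out.extend_from_slice(&v.unsigned_abs().to_be_bytes())
    });

    let values = vec![-3_i64, 1, 2];

    let columns = [SortColumnExt { values: values.clone(), comparator: Some(by_abs) }];
    println!("{:?}", lex_sort_indices(&columns)); // prints [1, 2, 0]

    let rows = encode_rows(&[SortFieldExt { encoder: Some(abs_encoder) }], &[values]);
    let mut row_indices: Vec<usize> = (0..rows.len()).collect();
    row_indices.sort_by(|&i, &j| rows[i].cmp(&rows[j]));
    println!("{:?}", row_indices); // prints [1, 2, 0]
}
```

One constraint this sketch makes visible: in the `arrow-row` flavor, a custom encoder has to produce bytes whose lexicographic order matches the intended ordering, and for multi-column rows the encoding should be fixed-width or prefix-free (as the 8-byte encodings here are), otherwise one column's bytes could influence the comparison of the next.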