tobixdev commented on issue #14828:
URL: https://github.com/apache/datafusion/issues/14828#issuecomment-2679679211

   So I dug a little deeper into how we could implement this functionality. 
   
   ### arrow-rs
   
   Firstly, I think we require two "flavors" of sorting - one for `arrow-ord` 
and one for `arrow-row` as DataFusion uses both of these APIs. 
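
   To make the two flavors concrete, here is a minimal std-only sketch (no 
arrow crates involved). `sort_by_comparison`, `encode_i32`, and 
`sort_by_encoded_rows` are hypothetical names for illustration. The first 
function sorts with a comparator, in the spirit of `arrow-ord`. The second 
sorts byte keys whose lexicographic order matches the value order, in the 
spirit of `arrow-row`. The real row format is more involved (for example, it 
also has to encode nulls), so treat this only as an illustration of the idea.

```rust
/// Comparison-based flavor: sort row indices by comparing values directly.
fn sort_by_comparison(values: &[i32]) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..values.len()).collect();
    indices.sort_by(|&a, &b| values[a].cmp(&values[b]));
    indices
}

/// Row-encoding flavor: map a value to bytes whose lexicographic order
/// matches the numeric order of the original values.
fn encode_i32(value: i32) -> [u8; 4] {
    // Flipping the sign bit moves negative numbers below positive ones
    // when the result is compared as unsigned big-endian bytes.
    ((value as u32) ^ 0x8000_0000).to_be_bytes()
}

/// Sort row indices by comparing only the encoded bytes.
fn sort_by_encoded_rows(values: &[i32]) -> Vec<usize> {
    let rows: Vec<[u8; 4]> = values.iter().map(|&v| encode_i32(v)).collect();
    let mut indices: Vec<usize> = (0..values.len()).collect();
    indices.sort_by(|&a, &b| rows[a].cmp(&rows[b]));
    indices
}
```

   Both functions should return the same permutation for the same input, 
which is why a user defined ordering would need a hook in each of the two APIs.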
   
   AFAIK, in `arrow-rs` we don't have a "user defined type" registry that we 
could leverage. So when a user calls, for example, `lexsort`, there is also no 
way to "look up" whether we have a user defined type that can be applied to a 
particular column. Therefore, we must pass in the information on how to sort a 
column to the sorting procedure. One possible way for getting this information 
into the called functions is to extend `SortField` (`arrow-row`) and 
`SortColumn` (`arrow-ord`).
   
   I think this extension could happen in one of two ways:
   1. *Provide an Implementation*: In this approach users directly provide an 
implementation. In `arrow-row` this could be a custom byte encoder, in 
`arrow-ord` this could be a function that maps to an `Ordering`.
   2. *Provide a User Defined Type*: In this approach users provide a user 
defined type. Somehow we would then need the ability to deduce the 
implementations from 1. based on the given type.
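
   A rough sketch of what approach 1 could look like, using std-only 
stand-ins. Everything here is hypothetical and not an existing `arrow-rs` 
API: `SketchSortColumn`, `SketchSortField`, the `custom_compare` and 
`custom_encode` fields, and the helper functions. The real `SortColumn` and 
`SortField` hold Arrow arrays, data types, and sort options, which are 
replaced by plain `i64` values to keep the example small.

```rust
use std::cmp::Ordering;

/// Hypothetical comparator hook for the `arrow-ord` flavor.
type CompareFn = fn(i64, i64) -> Ordering;
/// Hypothetical encoder hook for the `arrow-row` flavor.
type EncodeFn = fn(i64) -> Vec<u8>;

/// Stand-in for an extended `SortColumn` (assumption, not the real struct).
struct SketchSortColumn {
    values: Vec<i64>,
    custom_compare: Option<CompareFn>,
}

/// Stand-in for an extended `SortField` (assumption, not the real struct).
struct SketchSortField {
    custom_encode: Option<EncodeFn>,
}

/// Lexicographic sort over columns, preferring the custom comparator if set.
fn sketch_lexsort_to_indices(columns: &[SketchSortColumn]) -> Vec<usize> {
    let num_rows = columns.first().map_or(0, |c| c.values.len());
    let mut indices: Vec<usize> = (0..num_rows).collect();
    indices.sort_by(|&a, &b| {
        for column in columns {
            let (x, y) = (column.values[a], column.values[b]);
            let ordering = match column.custom_compare {
                Some(compare) => compare(x, y),
                None => x.cmp(&y),
            };
            if ordering != Ordering::Equal {
                return ordering;
            }
        }
        Ordering::Equal
    });
    indices
}

/// Encodes each row as concatenated per-column keys. Custom encoders must
/// emit fixed-width keys here so that concatenation preserves the ordering.
fn sketch_encode_rows(fields: &[SketchSortField], columns: &[Vec<i64>]) -> Vec<Vec<u8>> {
    let num_rows = columns.first().map_or(0, |c| c.len());
    (0..num_rows)
        .map(|row| {
            let mut bytes = Vec::new();
            for (field, column) in fields.iter().zip(columns) {
                let value = column[row];
                match field.custom_encode {
                    Some(encode) => bytes.extend_from_slice(&encode(value)),
                    // Default: sign-bit flip plus big-endian keeps i64 order.
                    None => bytes.extend_from_slice(&((value as u64) ^ (1u64 << 63)).to_be_bytes()),
                }
            }
            bytes
        })
        .collect()
}

/// Example user ordering: order integers by absolute value.
fn compare_abs(a: i64, b: i64) -> Ordering {
    a.unsigned_abs().cmp(&b.unsigned_abs())
}

/// Byte encoding that agrees with `compare_abs`.
fn encode_abs(value: i64) -> Vec<u8> {
    value.unsigned_abs().to_be_bytes().to_vec()
}

/// Sorts two columns with both flavors and returns both index permutations.
fn demo() -> (Vec<usize>, Vec<usize>) {
    let (first, second) = (vec![3, -1, -2, 1], vec![0, 5, 0, 2]);
    let by_compare = sketch_lexsort_to_indices(&[
        SketchSortColumn { values: first.clone(), custom_compare: Some(compare_abs) },
        SketchSortColumn { values: second.clone(), custom_compare: None },
    ]);
    let fields = [
        SketchSortField { custom_encode: Some(encode_abs) },
        SketchSortField { custom_encode: None },
    ];
    let rows = sketch_encode_rows(&fields, &[first, second]);
    let mut by_rows: Vec<usize> = (0..rows.len()).collect();
    by_rows.sort_by(|&a, &b| rows[a].cmp(&rows[b]));
    (by_compare, by_rows)
}
```

   One thing the sketch highlights: the comparator and the encoder have to 
agree on the ordering, otherwise the two sorting paths would produce different 
results for the same column.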
   
   Maybe one way forward is to implement 1. first and then implement 2. if we 
think this is sensible. I am a bit unsure on how we can directly use the arrow 
extension types for this use case. If anyone has a better idea here, I'd love 
to hear your take on that (@mbrobbel maybe you have an opinion if this is 
within the scope of arrow extension types). 
   
   ### DataFusion
   
   Here I think we should go with user defined types and attach the sorting 
information to that type. This should be possible as we can have a "central 
registry" (e.g., `SessionContext`) that holds all available user defined types 
and we can look up whether this particular column has a user defined type with 
a user defined ordering. If we have a use case, we could also think of adding 
the ability to override this ordering behavior. 
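
   As a rough illustration of the registry lookup, here is a std-only sketch. 
`SketchTypeRegistry`, `register_ordering`, `ordering_for`, and the type name 
`example.abs_int` are made up for this example and are not DataFusion APIs. 
The only real detail is the metadata key `ARROW:extension:name`, under which 
Arrow stores the name of an extension type on a field.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Field metadata key that carries the Arrow extension type name.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Hypothetical ordering hook attached to a user defined type.
type CompareFn = fn(i64, i64) -> Ordering;

/// Hypothetical central registry; in DataFusion this could hang off
/// `SessionContext` (assumption, nothing like this is implemented here).
#[derive(Default)]
struct SketchTypeRegistry {
    orderings: HashMap<String, CompareFn>,
}

impl SketchTypeRegistry {
    /// Attaches an ordering to a user defined type name.
    fn register_ordering(&mut self, type_name: &str, compare: CompareFn) {
        self.orderings.insert(type_name.to_string(), compare);
    }

    /// Returns the ordering registered for the column's extension type,
    /// falling back to the natural ordering if there is none.
    fn ordering_for(&self, field_metadata: &HashMap<String, String>) -> CompareFn {
        field_metadata
            .get(EXTENSION_NAME_KEY)
            .and_then(|name| self.orderings.get(name))
            .copied()
            .unwrap_or(natural_order)
    }
}

/// Default ordering for columns without a user defined ordering.
fn natural_order(a: i64, b: i64) -> Ordering {
    a.cmp(&b)
}

/// Example user ordering: order integers by absolute value.
fn compare_abs(a: i64, b: i64) -> Ordering {
    a.unsigned_abs().cmp(&b.unsigned_abs())
}

/// Sorts the same values once as a tagged column and once as a plain one.
fn demo() -> (Vec<i64>, Vec<i64>) {
    let mut registry = SketchTypeRegistry::default();
    registry.register_ordering("example.abs_int", compare_abs);

    let mut tagged: HashMap<String, String> = HashMap::new();
    tagged.insert(EXTENSION_NAME_KEY.to_string(), "example.abs_int".to_string());
    let untagged: HashMap<String, String> = HashMap::new();

    let custom = registry.ordering_for(&tagged);
    let mut with_custom = vec![3, -1, -2];
    with_custom.sort_by(|x, y| custom(*x, *y));

    let fallback = registry.ordering_for(&untagged);
    let mut with_default = vec![3, -1, -2];
    with_default.sort_by(|x, y| fallback(*x, *y));

    (with_custom, with_default)
}
```

   An override, as mentioned above, could then be as simple as the caller 
passing its own comparator instead of asking the registry.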
   
   I don't have more details on how we could implement this. I think trying to 
implement 1. in arrow-rs is the first step, and then we should check on how we 
can connect that to user defined types in DataFusion.
   
   If this sounds good, I can start to work on a draft. Maybe lmk if you think 
that extending `SortField` and `SortColumn` is a reasonable approach here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

