Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

via GitHub Sun, 02 Feb 2025 01:12:26 -0800


jayzhan211 commented on PR #14392:
URL: https://github.com/apache/datafusion/pull/14392#issuecomment-2629310335


   > > In the optimizer, we rely on the function name to apply such optimizations, so if we 
rename it to something like 'spark_count' we might need to add the Spark name to 
those optimizer rules as well, which increases the maintenance cost. If we 
assume the DataFusion native function and the Spark function are mutually exclusive 
(I guess we do), then keeping a consistent name for the optimizer is the preferred 
choice.
   > 
   > I'm glad you brought this up, @jayzhan211. 
   > 
   > Some Spark functions behave identically to DataFusion functions but have 
different names. For example:
   > - Spark’s `startswith(str, substr)` corresponds to DataFusion’s 
`expr_fn::starts_with(str, substr)`
   > 
   > There are also cases where functions take input arguments in a different 
order. For example:
   > - Spark’s `position(substr, str)` corresponds to DataFusion’s 
`expr_fn::strpos(str, substr)`
   
   If the behavior is the same, we have no reason to copy the function into the 
Spark crate; adding an alias to the existing function is enough. 
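
   A rough sketch of the alias idea, in plain Rust with no DataFusion dependency. The `Registry` type, its methods, and the `StrPredicate` alias are hypothetical and only illustrate the shape; this is not DataFusion's actual registration API. The point is that the Spark name `startswith` resolves to the same canonical name and implementation as `starts_with`, so anything keyed on the canonical name keeps working:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a scalar function implementation.
type StrPredicate = fn(&str, &str) -> bool;

/// Hypothetical registry: every registered name (canonical or alias)
/// maps to the canonical name plus the single shared implementation.
struct Registry {
    by_name: HashMap<&'static str, (&'static str, StrPredicate)>,
}

impl Registry {
    fn new() -> Self {
        Self { by_name: HashMap::new() }
    }

    /// Register one implementation under its canonical name and any aliases.
    fn register(
        &mut self,
        canonical: &'static str,
        aliases: &[&'static str],
        f: StrPredicate,
    ) {
        self.by_name.insert(canonical, (canonical, f));
        for alias in aliases {
            self.by_name.insert(*alias, (canonical, f));
        }
    }

    /// Look up a function by any of its names; returns the canonical name too.
    fn resolve(&self, name: &str) -> Option<(&'static str, StrPredicate)> {
        self.by_name.get(name).copied()
    }
}

/// The single native implementation shared by both names.
fn starts_with(s: &str, prefix: &str) -> bool {
    s.starts_with(prefix)
}

fn main() {
    let mut reg = Registry::new();
    // The Spark spelling is only an alias; no second copy of the function exists.
    reg.register("starts_with", &["startswith"], starts_with);

    let (canonical, f) = reg.resolve("startswith").expect("alias is registered");
    println!("{} -> {}", canonical, f("datafusion", "data"));
}
```

   Because lookups through the alias return the canonical name, a rule that matches on `starts_with` would not need a second entry for the Spark spelling.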
   
   If the difference between the native and Spark versions is minor, we can also 
add a flag to switch the code logic instead of maintaining a copy in another 
crate, which keeps the maintenance cost to a minimum.
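
   A minimal sketch of the flag approach, using the `position` / `strpos` argument-order difference quoted above as the "minor difference". The `ArgOrder` enum and the `locate` function are hypothetical names invented for illustration, and `strpos` here is a simplified standalone helper (1-based character position, 0 when not found), not DataFusion's implementation:

```rust
/// Hypothetical flag selecting which argument convention the caller uses.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ArgOrder {
    /// (str, substr), as in the native `strpos(str, substr)`.
    StrFirst,
    /// (substr, str), as in Spark's `position(substr, str)`.
    SubstrFirst,
}

/// Simplified core logic: 1-based character position of `substr` in `s`,
/// or 0 when `substr` does not occur.
fn strpos(s: &str, substr: &str) -> usize {
    match s.find(substr) {
        // `find` yields a byte offset; convert it to a character count.
        Some(byte_idx) => s[..byte_idx].chars().count() + 1,
        None => 0,
    }
}

/// One shared entry point; the flag only reorders the arguments
/// before delegating to the single core implementation.
fn locate(a: &str, b: &str, order: ArgOrder) -> usize {
    let (s, substr) = match order {
        ArgOrder::StrFirst => (a, b),
        ArgOrder::SubstrFirst => (b, a),
    };
    strpos(s, substr)
}

fn main() {
    // Same inputs, expressed in each convention, reach the same core logic.
    let native = locate("datafusion", "fusion", ArgOrder::StrFirst);
    let spark = locate("fusion", "datafusion", ArgOrder::SubstrFirst);
    println!("native = {}, spark = {}", native, spark);
}
```

   The search logic lives in one place, and the Spark variant costs only one extra `match` arm rather than a duplicated function in a separate crate.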


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

