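jayzhan211 commented on PR #14392:
URL: https://github.com/apache/datafusion/pull/14392#issuecomment-2629310335

> > In the optimizer, we rely on the name to perform such optimizations, so if we rename it to something like 'spark_count', we might need to add the Spark name to those optimizer rules as well, which increases the maintenance cost. If we assume the DataFusion native function and the Spark function are mutually exclusive (I guess we do), then having a consistent name for the optimizer is the preferred choice.
>
> I'm glad you brought this up, @jayzhan211.
>
> Some Spark functions behave identically to DataFusion functions but have different names. For example:
> - Spark’s `startswith(str, substr)` corresponds to DataFusion’s `expr_fn::starts_with(str, substr)`
>
> There are also cases where functions take their input arguments in a different order. For example:
> - Spark’s `position(substr, str)` corresponds to DataFusion’s `expr_fn::strpos(str, substr)`

If the behavior is the same, we have no reason to copy the function into the Spark crate; adding an alias to the existing function is enough. If the difference between the native and Spark versions is minor, we can add a flag that switches the code logic instead of maintaining a copy in another crate, which keeps the maintenance cost to a minimum.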
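The alias-versus-copy argument can be illustrated with a toy model in plain Rust. This is a minimal sketch, not DataFusion's API: `Registry`, `StringFn`, `ArgOrder`, `Value`, and `starts_with_rule_applies` are hypothetical names. It assumes each function has exactly one canonical name that optimizer rules match on, and that Spark names such as `startswith` and `position` are registered as aliases pointing at the same implementation (with `position` also swapping its two arguments) rather than as separate copies. DataFusion's real `ScalarUDFImpl` trait has an `aliases` method that serves a similar purpose.

```rust
use std::collections::HashMap;

/// Result of evaluating one of the toy string functions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Bool(bool),
    Int(i64),
}

/// One implementation per function, each with a single canonical name.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StringFn {
    StartsWith,
    Strpos,
}

impl StringFn {
    /// Canonical name that name-based optimizer rules match on.
    fn name(self) -> &'static str {
        match self {
            StringFn::StartsWith => "starts_with",
            StringFn::Strpos => "strpos",
        }
    }

    /// Evaluate with arguments in canonical order: (str, substr).
    fn eval(self, s: &str, sub: &str) -> Value {
        match self {
            StringFn::StartsWith => Value::Bool(s.starts_with(sub)),
            // 1-based character position of `sub` in `s`, or 0 if absent.
            StringFn::Strpos => Value::Int(match s.find(sub) {
                Some(byte_idx) => s[..byte_idx].chars().count() as i64 + 1,
                None => 0,
            }),
        }
    }
}

/// How an alias's arguments map onto the canonical function's arguments.
#[derive(Debug, Clone, Copy)]
enum ArgOrder {
    Same,
    Swapped, // alias(a, b) == canonical(b, a)
}

/// Hypothetical registry: canonical names and aliases point at the same
/// implementation, so no function body is duplicated.
struct Registry {
    entries: HashMap<&'static str, (StringFn, ArgOrder)>,
}

impl Registry {
    fn new() -> Self {
        let mut entries = HashMap::new();
        entries.insert("starts_with", (StringFn::StartsWith, ArgOrder::Same));
        entries.insert("strpos", (StringFn::Strpos, ArgOrder::Same));
        // Spark-style names registered as aliases, not as copies.
        entries.insert("startswith", (StringFn::StartsWith, ArgOrder::Same));
        entries.insert("position", (StringFn::Strpos, ArgOrder::Swapped));
        Registry { entries }
    }

    /// Resolve a two-argument call to its implementation, with the
    /// arguments reordered into canonical (str, substr) order.
    fn resolve<'a>(
        &self,
        name: &str,
        a: &'a str,
        b: &'a str,
    ) -> Option<(StringFn, &'a str, &'a str)> {
        let &(f, order) = self.entries.get(name)?;
        Some(match order {
            ArgOrder::Same => (f, a, b),
            ArgOrder::Swapped => (f, b, a),
        })
    }
}

/// Hypothetical optimizer check keyed only on the canonical name; it fires
/// for `starts_with` and `startswith` alike.
fn starts_with_rule_applies(f: StringFn) -> bool {
    f.name() == "starts_with"
}

fn main() {
    let reg = Registry::new();

    // Spark's position(substr, str) resolves to strpos(str, substr).
    let (f, s, sub) = reg.resolve("position", "b", "abc").unwrap();
    println!("{}({:?}, {:?}) = {:?}", f.name(), s, sub, f.eval(s, sub));

    // Spark's startswith resolves to the canonical starts_with.
    let (g, _, _) = reg.resolve("startswith", "abc", "ab").unwrap();
    println!("rule applies: {}", starts_with_rule_applies(g));
}
```

Because every alias resolves to the canonical implementation, the name-keyed rule needs no Spark-specific entry. A minor behavioral difference could be handled the same way, for example as a boolean field on the shared implementation that `eval` checks, instead of a second copy of the function in the Spark crate.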
> > In optimizer, we rely on the name to do such optimization so if we rename it to name like 'spark_count' we might need to add the spark name to those optimize rules as well, which increase the maintainence cost. If we assume the datafusion native function and spark function is mutually exclusive (I guess we do so) then having consistent name for optimizer is preferred choice. > > I'm glad you brought this up, @jayzhan211. > > Some Spark functions behave identically to DataFusion functions but have different names. For example: > - Spark’s `startswith(str, substr)` corresponds to DataFusion’s `expr_fn::starts_with(str, substr)` > > There are also cases where functions take input arguments in a different order. For example: > - Spark’s `position(substr, str)` corresponds to DataFusion’s `expr_fn::strpos(str, substr)` If the behavior is the same we don't have any reason to copy another one to spark crate, adding alias to the function is enough. If the diff between the native and spark one is minor we can also add another flag to switch code logic instead of maintaining a copy in another crate so that the maintenance cost could be minimized. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org