jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2626205413
> > > @jayzhan211 and @shehabgamin what is the status of this PR? It looks to me like there are some unresolved comments > > > It looks like there are some unresolved comments like [#14268 (comment)](https://github.com/apache/datafusion/pull/14268#discussion_r1933063432) from @jayzhan211 that suggest this PR is being too aggressive in coercion > > > I would like to try and get this in for the fhe DataFusion 45 release so it would be nice if we can figure out how to fix the regressions > > > > There are concerns unresolved > > @alamb and @jayzhan211 - Thank you for your patience and thoughtful discussion. Sorry for being quiet here, I’ve spent time reflecting on how to reconcile the competing priorities over the past couple of days. > > Let me clarify the core principles driving this PR. > > This PR aims to restore DataFusion's documented type coercion contracts - specifically the semantics defined for `TypeSignature::Coercible(vec![TypeSignatureClass::Native(...)])` by leveraging the existing casting logic in `NativeType::default_cast_for`. > > The current implementation strictly follows the guidance in: https://github.com/apache/datafusion/blob/a4917d44c8cc3b08c61383320285d239250dd94e/datafusion/expr-common/src/signature.rs#L127-L134 where `Coercible(vec![TypeSignatureClass::Native(...)])` accepts any type castable to the target `NativeType` through the explicit set of type conversion rules defined in `NativeType::default_cast_for`. > > I fully recognize this creates behavior that diverges from PostgreSQL/DuckDB semantics for the various UDFs in this PR. However, there’s a critical distinction: **System contracts vs. SQL dialect behavior.** > > While PostgreSQL/DuckDB-style strictness is desirable for specific use cases, subverting the type system’s foundational contracts would result in: > > 1. **Lost flexibility:** Users lose the ability to intentionally design UDFs with broader NativeType coercion patterns when needed. The system becomes opinionated in ways that aren't transparent. > 2. **Hidden couplings:** Tightly linking coercion rules to individual function implementations makes behavior unpredictable across optimizer rules. What works for `ascii()` might break `concat()` unexpectedly. > 3. **Eroded system integrity:** Undocumented exceptions to `NativeType::default_cast_for` in `Coercible(vec![TypeSignatureClass::Native(...)])` would make type coercion context-dependent and impossible to maintain and reason about holistically. > > The current approach keeps the door open for both strict and permissive use cases through explicit signature choices. With these changes, users can still achieve PostgreSQL/DuckDB behavior by defining UDFs with existing signatures while still leveraging broader coercion where appropriate. Conversely, hardcoding dialect-specific restrictions at the type system level would limit DataFusion’s flexibility and usability as a polyglot execution engine. > > If we bypass these contracts to enforce function-specific rules then we: > > 1. **Break documented behavior:** The system no longer behaves as its APIs and documentation promise, creating confusion. > 2. **Misalign naming and semantics**: A `Coercible(vec![TypeSignatureClass::Native(...)])` signature that rejects valid coercions via `NativeType::default_cast_for` becomes a contradiction in terms. > > I don't feel strongly about what signatures each individual UDF in this PR should implement, but I do feel strongly about maintaining the documented type coercion contracts and system semantics that define how DataFusion's type system is intended to work. The behavior of `TypeSignatureClass::Native` and `NativeType::default_cast_for` is explicitly documented, and their naming clearly signals their purpose. I think you totally point out the challenge we have for function's type coercion, the large difference between each function makes it hard to design a one size fit all solution but if we all in flexibility (with 'coerce_types' we ends up similar and repeated coercion logic everywhere. First of all, we need to have a out of box behavior for each function, and at the same time easy to be customizable. The Signature Coercible now becomes problematic, if we follows the system contract, it is not usable internally in datafusion core, but if we adapt it to function in datafusion core, it becomes less flexible and even easy to cause surprised regressions for other projects that rely on the old signature logic. The idea I have so far is that how about make Signature a method of trait at all, each function is able to implement it's own signature check, it not only is flexible by default and also any change to it is safely not to impact others. To make coercion less repeated and easy to use we provide many small and well defined utility functions like string coercion or numeric coercion both used internally and public to others. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org