shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2626079378
> > @jayzhan211 and @shehabgamin what is the status of this PR? It looks to me like there are some unresolved comments > > It looks like there are some unresolved comments like [#14268 (comment)](https://github.com/apache/datafusion/pull/14268#discussion_r1933063432) from @jayzhan211 that suggest this PR is being too aggressive in coercion > > I would like to try and get this in for the fhe DataFusion 45 release so it would be nice if we can figure out how to fix the regressions > > There are concerns unresolved @alamb and @jayzhan211 - Thank you for your patience and thoughtful discussion. Sorry for being quiet here, I’ve spent time reflecting on how to reconcile the competing priorities over the past couple of days. Let me clarify the core principles driving this PR. This PR aims to restore DataFusion's documented type coercion contracts - specifically the semantics defined for `TypeSignature::Coercible(vec![TypeSignatureClass::Native(...)])` by leveraging the existing casting logic in `NativeType::default_cast_for`. The current implementation strictly follows the guidance in https://github.com/apache/datafusion/blob/a4917d44c8cc3b08c61383320285d239250dd94e/datafusion/expr-common/src/signature.rs#L127-L134, where `Coercible(vec![TypeSignatureClass::Native(...)])` accepts any type castable to the target `NativeType` through the explicit set of type conversion rules defined in `NativeType::default_cast_for`. I fully recognize this creates behavior that diverges from PostgreSQL/DuckDB semantics for the various UDFs in this PR. However, there’s a critical distinction: **System contracts vs. SQL dialect behavior.** While PostgreSQL/DuckDB-style strictness is desirable for specific use cases, subverting the type system’s foundational contracts would result in: 1. **Lost flexibility:** Users lose the ability to intentionally design UDFs with broader NativeType coercion patterns when needed. The system becomes opinionated in ways that aren't transparent. 2. **Hidden couplings:** Tightly linking coercion rules to individual function implementations makes behavior unpredictable across optimizer rules. What works for `ascii()` might break `concat()` unexpectedly. 3. **Eroded system integrity:** Undocumented exceptions to `NativeType::default_cast_for` in `Coercible(vec![TypeSignatureClass::Native(...)])` would make type coercion context-dependent and impossible to maintain and reason about holistically. The current approach keeps the door open for both strict and permissive use cases through explicit signature choices. With these changes, users can still achieve PostgreSQL/DuckDB behavior by defining UDFs with existing signatures while still leveraging broader coercion where appropriate. Conversely, hardcoding dialect-specific restrictions at the type system level would limit DataFusion’s flexibility and usability as a polyglot execution engine. If we bypass these contracts to enforce function-specific rules then we: 1. **Break documented behavior:** The system no longer behaves as its APIs and documentation promise, creating confusion. 2. **Misalign naming and semantics**: A `Coercible(vec![TypeSignatureClass::Native(...)])` signature that rejects valid coercions via `NativeType::default_cast_for` becomes a contradiction in terms. I don't feel strongly about what signatures each individual UDF in this PR should implement, but I do feel strongly about maintaining the documented type coercion contracts and system semantics that define how DataFusion's type system is intended to work. The behavior of `TypeSignatureClass::Native` and `NativeType::default_cast_for` is explicitly documented, and their naming clearly signals their purpose. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org