shehabgamin commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2626079378

   > > @jayzhan211 and @shehabgamin  what is the status of this PR? It looks to 
me like there are some unresolved comments
   > > It looks like there are some unresolved comments like [#14268 
(comment)](https://github.com/apache/datafusion/pull/14268#discussion_r1933063432)
 from @jayzhan211 that suggest this PR is being too aggressive in coercion
   > > I would like to try and get this in for the fhe DataFusion 45 release so 
it would be nice if we can figure out how to fix the regressions
   > 
   > There are concerns unresolved
   
   @alamb and @jayzhan211 - Thank you for your patience and thoughtful 
discussion. Sorry for being quiet here, I’ve spent time reflecting on how to 
reconcile the competing priorities over the past couple of days.
   Let me clarify the core principles driving this PR.
   
   This PR aims to restore DataFusion's documented type coercion contracts - 
specifically the semantics defined for 
`TypeSignature::Coercible(vec![TypeSignatureClass::Native(...)])` by leveraging 
the existing casting logic in `NativeType::default_cast_for`.
   
   The current implementation strictly follows the guidance in 
https://github.com/apache/datafusion/blob/a4917d44c8cc3b08c61383320285d239250dd94e/datafusion/expr-common/src/signature.rs#L127-L134,
 where `Coercible(vec![TypeSignatureClass::Native(...)])` accepts any type 
castable to the target `NativeType` through the explicit set of type conversion 
rules defined in `NativeType::default_cast_for`.
   
   I fully recognize this creates behavior that diverges from PostgreSQL/DuckDB 
semantics for the various UDFs in this PR. However, there’s a critical 
distinction: **System contracts vs. SQL dialect behavior.**
   
   While PostgreSQL/DuckDB-style strictness is desirable for specific use 
cases, subverting the type system’s foundational contracts would result in:
   
   1. **Lost flexibility:** Users lose the ability to intentionally design UDFs 
with broader NativeType coercion patterns when needed. The system becomes 
opinionated in ways that aren't transparent.
   2. **Hidden couplings:** Tightly linking coercion rules to individual 
function implementations makes behavior unpredictable across optimizer rules. 
What works for `ascii()` might break `concat()` unexpectedly.
   3. **Eroded system integrity:** Undocumented exceptions to 
`NativeType::default_cast_for` in 
`Coercible(vec![TypeSignatureClass::Native(...)])` would make type coercion 
context-dependent and impossible to maintain and reason about holistically.
   
   The current approach keeps the door open for both strict and permissive use 
cases through explicit signature choices. With these changes, users can still 
achieve PostgreSQL/DuckDB behavior by defining UDFs with existing signatures 
while still leveraging broader coercion where appropriate. Conversely, 
hardcoding dialect-specific restrictions at the type system level would limit 
DataFusion’s flexibility and usability as a polyglot execution engine.
   
   If we bypass these contracts to enforce function-specific rules then we:
   
   1. **Break documented behavior:** The system no longer behaves as its APIs 
and documentation promise, creating confusion.
   2. **Misalign naming and semantics**: A 
`Coercible(vec![TypeSignatureClass::Native(...)])` signature that rejects valid 
coercions via `NativeType::default_cast_for` becomes a contradiction in terms.
   
   I don't feel strongly about what signatures each individual UDF in this PR 
should implement, but I do feel strongly about maintaining the documented type 
coercion contracts and system semantics that define how DataFusion's type 
system is intended to work. The behavior of `TypeSignatureClass::Native` and 
`NativeType::default_cast_for` is explicitly documented, and their naming 
clearly signals their purpose.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to