jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2626205413

   > > > @jayzhan211 and @shehabgamin  what is the status of this PR? It looks 
to me like there are some unresolved comments
   > > > It looks like there are some unresolved comments like [#14268 
(comment)](https://github.com/apache/datafusion/pull/14268#discussion_r1933063432)
 from @jayzhan211 that suggest this PR is being too aggressive in coercion
   > > > I would like to try and get this in for the fhe DataFusion 45 release 
so it would be nice if we can figure out how to fix the regressions
   > > 
   > > There are concerns unresolved
   > 
   > @alamb and @jayzhan211 - Thank you for your patience and thoughtful 
discussion. Sorry for being quiet here, I’ve spent time reflecting on how to 
reconcile the competing priorities over the past couple of days.
   > 
   > Let me clarify the core principles driving this PR.
   > 
   > This PR aims to restore DataFusion's documented type coercion contracts - 
specifically the semantics defined for 
`TypeSignature::Coercible(vec![TypeSignatureClass::Native(...)])` by leveraging 
the existing casting logic in `NativeType::default_cast_for`.
   > 
   > The current implementation strictly follows the guidance in: 
https://github.com/apache/datafusion/blob/a4917d44c8cc3b08c61383320285d239250dd94e/datafusion/expr-common/src/signature.rs#L127-L134
 where `Coercible(vec![TypeSignatureClass::Native(...)])` accepts any type 
castable to the target `NativeType` through the explicit set of type conversion 
rules defined in `NativeType::default_cast_for`.
   > 
   > I fully recognize this creates behavior that diverges from 
PostgreSQL/DuckDB semantics for the various UDFs in this PR. However, there’s a 
critical distinction: **System contracts vs. SQL dialect behavior.**
   > 
   > While PostgreSQL/DuckDB-style strictness is desirable for specific use 
cases, subverting the type system’s foundational contracts would result in:
   > 
   > 1. **Lost flexibility:** Users lose the ability to intentionally design 
UDFs with broader NativeType coercion patterns when needed. The system becomes 
opinionated in ways that aren't transparent.
   > 2. **Hidden couplings:** Tightly linking coercion rules to individual 
function implementations makes behavior unpredictable across optimizer rules. 
What works for `ascii()` might break `concat()` unexpectedly.
   > 3. **Eroded system integrity:** Undocumented exceptions to 
`NativeType::default_cast_for` in 
`Coercible(vec![TypeSignatureClass::Native(...)])` would make type coercion 
context-dependent and impossible to maintain and reason about holistically.
   > 
   > The current approach keeps the door open for both strict and permissive 
use cases through explicit signature choices. With these changes, users can 
still achieve PostgreSQL/DuckDB behavior by defining UDFs with existing 
signatures while still leveraging broader coercion where appropriate. 
Conversely, hardcoding dialect-specific restrictions at the type system level 
would limit DataFusion’s flexibility and usability as a polyglot execution 
engine.
   > 
   > If we bypass these contracts to enforce function-specific rules then we:
   > 
   > 1. **Break documented behavior:** The system no longer behaves as its APIs 
and documentation promise, creating confusion.
   > 2. **Misalign naming and semantics**: A 
`Coercible(vec![TypeSignatureClass::Native(...)])` signature that rejects valid 
coercions via `NativeType::default_cast_for` becomes a contradiction in terms.
   > 
   > I don't feel strongly about what signatures each individual UDF in this PR 
should implement, but I do feel strongly about maintaining the documented type 
coercion contracts and system semantics that define how DataFusion's type 
system is intended to work. The behavior of `TypeSignatureClass::Native` and 
`NativeType::default_cast_for` is explicitly documented, and their naming 
clearly signals their purpose.
   
   I think you totally point out the challenge we have for function's type 
coercion, the large difference between each function makes it hard to design a 
one size fit all solution but if we all in flexibility (with 'coerce_types' we 
ends up similar and repeated coercion logic everywhere.
   
   First of all, we need to have a out of box behavior for each function, and 
at the same time easy to be customizable. The Signature Coercible now becomes 
problematic, if we follows the system contract, it is not usable internally in 
datafusion core, but if we adapt it to function in datafusion core, it becomes 
less flexible and even easy to cause surprised regressions for other projects 
that rely on the old signature logic.
   
   The idea I have so far is that how about make Signature a method of trait at 
all, each function is able to implement it's own signature check, it not only 
is flexible by default and also any change to it is safely not to impact 
others. To make coercion less repeated and easy to use we provide many small 
and well defined utility functions like string coercion or numeric coercion 
both used internally and public to others.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to