rafafrdz commented on code in PR #17485:
URL: https://github.com/apache/datafusion/pull/17485#discussion_r2341172796


##########
datafusion/spark/src/function/url/parse_url.rs:
##########
@@ -47,23 +46,7 @@ impl Default for ParseUrl {
 impl ParseUrl {
     pub fn new() -> Self {
         Self {
-            signature: Signature::one_of(
-                vec![
-                    TypeSignature::Uniform(
-                        1,
-                        vec![DataType::Utf8View, DataType::Utf8, DataType::LargeUtf8],
-                    ),
-                    TypeSignature::Uniform(
-                        2,
-                        vec![DataType::Utf8View, DataType::Utf8, DataType::LargeUtf8],
-                    ),
-                    TypeSignature::Uniform(
-                        3,
-                        vec![DataType::Utf8View, DataType::Utf8, DataType::LargeUtf8],
-                    ),
-                ],
-                Volatility::Immutable,
-            ),
+            signature: Signature::user_defined(Volatility::Immutable),

Review Comment:
   After rereading this several times, my understanding is that when you pass a
   dictionary array whose values are strings, DataFusion attempts to match it
   against the `String` signature. However, `parse_url` is defined to accept only
   **plain string** arguments
   ([ref](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.parse_url.html));
   it does not expect dictionary inputs.
   
   We mark the UDF’s signature as `user_defined` to enable coercion across the
   string types (`Utf8`, `Utf8View`, `LargeUtf8`). A dictionary array is still
   not a string type, so it isn’t coerced and the call won’t match.
   
   In short, even if the `String` signature seems to "capture" dictionaries
   with string values, `parse_url` will still reject them because the underlying
   physical type is a dictionary, not a string.
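
   To make the behaviour described above concrete, here is a minimal, self-contained sketch of the rule I have in mind. It is a simplified model, not the actual DataFusion code: the `DataType` enum is a stand-in for Arrow's, `coerce_parse_url_args` is a hypothetical name for the kind of check I would expect the UDF's `coerce_types` hook to perform, and the 1 to 3 argument range simply mirrors the arities in the signature removed by this diff.

```rust
// Simplified stand-in for Arrow's DataType; NOT the real arrow/DataFusion enum.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int32,
    // Dictionary(key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
}

/// Hypothetical coercion rule (an assumption for illustration):
/// accept 1 to 3 plain string arguments unchanged, reject anything else,
/// including dictionaries whose value type is a string.
fn coerce_parse_url_args(arg_types: &[DataType]) -> Result<Vec<DataType>, String> {
    if arg_types.is_empty() || arg_types.len() > 3 {
        return Err(format!(
            "parse_url expects 1 to 3 arguments, got {}",
            arg_types.len()
        ));
    }
    arg_types
        .iter()
        .map(|t| match t {
            // Plain string types pass through as-is.
            DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View => Ok(t.clone()),
            // The match is on the outer type, so a dictionary is rejected
            // regardless of what its values are.
            other => Err(format!("parse_url does not accept {:?}", other)),
        })
        .collect()
}

fn main() {
    // Mixed plain string types are accepted.
    let plain = [DataType::Utf8, DataType::Utf8View, DataType::LargeUtf8];
    assert!(coerce_parse_url_args(&plain).is_ok());

    // A dictionary with string values is still a dictionary, so it is rejected.
    let dict = [DataType::Dictionary(
        Box::new(DataType::Int32),
        Box::new(DataType::Utf8),
    )];
    assert!(coerce_parse_url_args(&dict).is_err());
}
```

   The point of the sketch is only that the check looks at the outer type, so a dictionary never reaches the string branch even when its value type is `Utf8`.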



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

