Oleks V. created SPARK-55749:
--------------------------------
Summary: array_contains built in function result are inconsistent
with other engines
Key: SPARK-55749
URL: https://issues.apache.org/jira/browse/SPARK-55749
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.4.3
Reporter: Oleks V.
Hi community
I would love to get help on `array_contains` builtin function results when NULL
is one of the values in the array. The function returns NULL which is opposite
to other engines like DuckDB/Postresql.
Example:
{code:java}
scala> spark.sql("select array_contains(array(1, null, 3), 2)").show(false)
+------------------------------------+
|array_contains(array(1, NULL, 3), 2)|
+------------------------------------+
|null |
+------------------------------------+
{code}
For instance DuckDB returns false
{code:java}
D select array_has([1, null, 3], 2);
┌───────────────────────────────────────────┐
│ array_has(main.list_value(1, NULL, 3), 2) │
│ boolean │
├───────────────────────────────────────────┤
│ false │
└───────────────────────────────────────────┘
D {code}
Postgres doesn't have *array_contains* in Spark/DuckDB understanding but the
closest analogue would be
{code:java}
select array[1, null, 3] @> array[2]
false {code}
For this ticket I'd like to understand if this behavior for Apache Spark is
correct, because the documentation for the function does not mention such
scenario explicitly
{code:java}
/**
* Checks if the array (left) has the element (right)
*/
@ExpressionDescription(
usage = "_FUNC_(array, value) - Returns true if the array contains the
value.",
examples = """
Examples:
> SELECT _FUNC_(array(1, 2, 3), 2);
true
""",
group = "array_funcs",
since = "1.5.0")
case class ArrayContains(left: Expression, right: Expression) {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]