andygrove commented on PR #1923:
URL: https://github.com/apache/datafusion-comet/pull/1923#issuecomment-3004790090
> > ...1 more thing please add tests with empty array.
>
> I tested array_distinct with an empty array.
>
> ```
> SELECT array_distinct(array()) FROM t1;
>
> == Optimized Logical Plan ==
> Project [[] AS array_distinct(array())#240]
> +- Relation [_1#121,_2#122,_3#123,_4#124,_5#125L,_6#126,_7#127,_8#128,_9#129,_10#130,_11#131L,_12#132,_13#133,_14#134,_15#135,_16#136,_17#137,_18#138,_19#139,_20#140,_21#141,_id#142] parquet
>
> == Physical Plan ==
> *(1) Project [[] AS array_distinct(array())#240]
> +- *(1) CometColumnarToRow
>    +- CometScan parquet [] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/2r/znvj4hhd3t1cp22pmw4m3h_40000gn/T/spark-f9..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<>
> ```
>
> Spark uses an alias, `[] AS array_distinct(array())`, so it doesn't reach `case _: ArrayDistinct => convert(CometArrayDistinct`

In this case, Spark is replacing the `array_distinct` expression with a literal at planning time. To test with an empty array, you would need to force this to happen at query execution time. You can do this using a `CASE WHEN` expression, similar to other tests in this PR.
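For illustration, here is a minimal sketch of that kind of query (the table `t1` and the `_2`/`_3` columns are taken from the quoted plan; the predicates and branch contents are assumptions, not the PR's actual tests):

```sql
-- Hypothetical sketch: the branch values depend on column data, so the
-- optimizer cannot replace the whole expression with a literal, and the
-- empty array in the first branch is produced at execution time for rows
-- where _2 IS NULL. Having more than one data-dependent branch is meant
-- to keep the empty-array branch from being folded away on its own.
SELECT array_distinct(
  CASE
    WHEN _2 IS NULL THEN array()
    WHEN _3 IS NULL THEN array(_2)
    ELSE array(_2, _2, _3)
  END)
FROM t1;
```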