andygrove commented on PR #1923:
URL: https://github.com/apache/datafusion-comet/pull/1923#issuecomment-3004790090
> > ...1 more thing please add tests with empty array.
>
> I tested array_distinct with an empty array.
>
> ```
> SELECT array_distinct(array()) FROM t1;
>
> == Optimized Logical Plan ==
> Project [[] AS array_distinct(array())#240]
> +- Relation [_1#121,_2#122,_3#123,_4#124,_5#125L,_6#126,_7#127,_8#128,_9#129,_10#130,_11#131L,_12#132,_13#133,_14#134,_15#135,_16#136,_17#137,_18#138,_19#139,_20#140,_21#141,_id#142] parquet
>
> == Physical Plan ==
> *(1) Project [[] AS array_distinct(array())#240]
> +- *(1) CometColumnarToRow
>    +- CometScan parquet [] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/2r/znvj4hhd3t1cp22pmw4m3h_40000gn/T/spark-f9..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<>
> ```
>
> Spark uses an alias, `[] AS array_distinct(array())`, so it doesn't reach `case _: ArrayDistinct => convert(CometArrayDistinct`

In this case, Spark is replacing the `array_distinct` expression with a literal at planning time. To test with an empty array, you would need to force this to happen at query execution time. You can do this using a `CASE WHEN` expression, similar to other tests in this PR.
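For illustration, here is a minimal sketch of that kind of query (the table `t1` and the `_2`/`_3` columns are taken from the quoted plan; the predicates and branch contents are assumptions, not the PR's actual tests):

```sql
-- Hypothetical sketch: the branch values depend on column data, so the
-- optimizer cannot replace the whole expression with a literal, and the
-- empty array in the first branch is produced at execution time for rows
-- where _2 IS NULL. Having more than one data-dependent branch is meant
-- to keep the empty-array branch from being folded away on its own.
SELECT array_distinct(
  CASE
    WHEN _2 IS NULL THEN array()
    WHEN _3 IS NULL THEN array(_2)
    ELSE array(_2, _2, _3)
  END)
FROM t1;
```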