Re: [I] Add support for `MapSort` expression in Spark 4.0.0 [datafusion-comet]

via GitHub Mon, 21 Jul 2025 23:46:49 -0700


rishvin commented on issue #1941:
URL: https://github.com/apache/datafusion-comet/issues/1941#issuecomment-3101303544


   Some Updates:
   I have a simple test to start with, which will produce `_groupingmapsort`.
   ```
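   // Assumes a SparkSession named `spark` is in scope (as in spark-shell).
   import spark.implicits._

   val data = Seq(
     Map("a" -> 1, "b" -> 2),
     Map("a" -> 3, "b" -> 4),
     Map("b" -> 2, "a" -> 1) // same entries as the first row, different order
   )

   val df = data.toDF("map")
   df.groupBy("map").count().show(false)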
   ```
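
   For reference, here is a rough, Spark-free sketch of the result I would expect from the test above, assuming the map sort is there to normalize entry order so that the first and third rows land in the same group. The `rows`, `sortEntries` and `counts` names are made up for illustration and are not part of Spark or Comet.

   ```scala
   // Hypothetical model: each "map" value is a sequence of entries kept in
   // insertion order, so the third row differs from the first only in order.
   val rows: Seq[Seq[(String, Int)]] = Seq(
     Seq("a" -> 1, "b" -> 2),
     Seq("a" -> 3, "b" -> 4),
     Seq("b" -> 2, "a" -> 1)
   )

   // Sort entries by key so that equal maps produce the same grouping key.
   def sortEntries(entries: Seq[(String, Int)]): Seq[(String, Int)] =
     entries.sortBy(_._1)

   // Group on the normalized key and count the rows in each group.
   val counts: Map[Seq[(String, Int)], Int] =
     rows.groupBy(sortEntries).map { case (key, group) => key -> group.size }

   counts.foreach { case (key, n) => println(s"$key -> $n") }
   ```

   Without the sort step, the first and third rows would be treated as different keys in this model, which is the behaviour the test is meant to catch.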
   
   As far as I understand so far, this will require some plumbing in the
   Parquet reader utils on the Scala side, because we currently support only
   primitive types there, whereas `MapType` is a complex type. I hacked around
   some of the type checks, but `Native.initColumnReader()` still expects a
   `primitiveTypeId`, so I need to work out how `MapType` would translate to
   one. The closest match at the moment seems to be Parquet's `BINARY` physical
   type, so I am thinking of passing `BINARY` and annotating it with
   `MapKeyValueTypeAnnotation`. I am still trying to understand this piece of
   code, though, so I might be wrong. I will run some more experiments to get
   more clarity on this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

