Kimahriman commented on code in PR #731:
URL: https://github.com/apache/datafusion-comet/pull/731#discussion_r1693649707
##########
spark/src/main/scala/org/apache/spark/sql/comet/CometRowToColumnarExec.scala:
##########
@@ -60,8 +62,17 @@ case class CometRowToColumnarExec(child: SparkPlan)
val timeZoneId = conf.sessionLocalTimeZone
val schema = child.schema
- child
- .execute()
+ val rdd: RDD[InternalRow] = if (child.supportsColumnar) {
+ child
+ .executeColumnar()
+ .mapPartitionsInternal { iter =>
+ iter.flatMap(_.rowIterator().asScala)
+ }
+ } else {
+ child.execute()
+ }
+
+ rdd
Review Comment:
There might be a more efficient way than using a row iterator to write to
the row-based Arrow writer, but since this is mostly for testing/fallback
purposes, I didn't try to figure out a faster Spark-vector-to-Arrow-vector
approach. Hopefully someone is able to add complex type support to the Comet
Parquet reader. If not, it could be worth considering the Spark reader
as a real fallback path instead of just for testing purposes.
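
The pattern in the diff (flattening an iterator of columnar batches into a single row iterator via `flatMap`) can be sketched without any Spark dependency. This is a hypothetical stand-in, not Comet code: `Batch` here plays the role of Spark's `ColumnarBatch`, and `rowIterator` mimics `ColumnarBatch.rowIterator().asScala`.

```scala
// Minimal sketch of the batch-flattening pattern used in the diff.
// `Batch` is a hypothetical stand-in for Spark's ColumnarBatch; in the
// real code, executeColumnar() yields ColumnarBatch and rowIterator()
// returns a Java iterator that is adapted with .asScala.
object BatchFlattenSketch {
  // A batch exposing its rows one at a time, like ColumnarBatch.rowIterator()
  final case class Batch(rows: Vector[Int]) {
    def rowIterator: Iterator[Int] = rows.iterator
  }

  // Equivalent in shape to: iter.flatMap(_.rowIterator().asScala)
  def flatten(batches: Iterator[Batch]): Iterator[Int] =
    batches.flatMap(_.rowIterator)

  def main(args: Array[String]): Unit = {
    val batches = Iterator(Batch(Vector(1, 2)), Batch(Vector(3)))
    println(flatten(batches).toList) // List(1, 2, 3)
  }
}
```

Because `flatMap` over iterators is lazy, rows are only materialized as the downstream Arrow writer consumes them, which is why the diff's per-partition conversion stays streaming rather than buffering whole batches as rows.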
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]