andygrove commented on code in PR #78:
URL: https://github.com/apache/datafusion-site/pull/78#discussion_r2185674170

##########
content/blog/2025-07-01-datafusion-comet-0.9.0.md:
##########
@@ -0,0 +1,176 @@
+---
+layout: post
+title: Apache DataFusion Comet 0.9.0 Release
+date: 2025-07-01
+author: pmc
+categories: [subprojects]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache DataFusion PMC is pleased to announce version 0.9.0 of the [Comet](https://datafusion.apache.org/comet/) subproject.
+
+Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code changes.
+
+This release covers approximately ten weeks of development work and is the result of merging 139 PRs from 24
+contributors. See the [change log] for more information.
+
+[change log]: https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.9.0.md
+
+## Release Highlights
+
+### Complex Type Support in Parquet Scans
+
+Comet now supports complex types (Structs, Maps, and Arrays) when reading Parquet files. This functionality is not
+yet available when reading Parquet files from Apache Iceberg.
+
+This functionality was only available in previous releases when manually specifying one of the new experimental
+scan implementations. Comet now automatically chooses the best scan implementation based on the input schema, and no
+longer requires manual configuration.
+
+### Complex Type Processing Improvements
+
+Numerous improvements have been made to complex type support to ensure Spark-compatible behavior when casting between
+structs and accessing fields within deeply nested types.
+
+### Shuffle Improvements
+
+Comet now accelerates a broader range of shuffle operations, leading to more queries running fully natively. In
+previous releases, some shuffle operations fell back to Spark to avoid some known bugs in Comet, and these bugs have
+now been fixed.
+
+### New Features
+
+Comet 0.9.0 adds support for the following Spark expressions:
+
+- ArrayDistinct
+- ArrayMax
+- ArrayRepeat
+- ArrayUnion
+- BitCount
+- BitNot
+- Expm1
+- MapValues
+- Signum
+- ToPrettyString
+- map[]
+
+### Improved Spark SQL Test Coverage
+
+Comet now passes 97% of the Spark SQL test suite, with more than 24,000 tests passing (based on testing against

Review Comment:
   I have rewritten this section to clarify that the remaining 3% of tests are not ignored due to compatibility issues.