alamb commented on issue #10157: URL: https://github.com/apache/datafusion/issues/10157#issuecomment-2535689277
Here is the final report that I submitted. Thanks to @phillipleblanc @andygrove @timsaucer @milenkovicm for the help writing it 🙏

```
## Description:
The mission of Apache DataFusion is the creation and maintenance of software
related to an extensible query engine

## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None

## Membership Data:
Apache DataFusion was founded 2024-04-16 (8 months ago)
There are currently 42 committers and 14 PMC members in this project.
The Committer-to-PMC ratio is 3:1.

Community changes, past quarter:
- No new PMC members. Last addition was Jay Zhan on 2024-08-11.
- Piotr Findeisen was added as committer on 2024-12-03
- Jax Liu was added as committer on 2024-10-18
- Ifeanyi Ubah was added as committer on 2024-11-04
- Ruiqiu Cao was added as committer on 2024-12-10
- Michael Ward was added as committer on 2024-09-13

## Project Activity:

### Overall
We have completed adopting [sqlparser crate] into the project and made our
first release as part of the Apache Software Foundation.

[sqlparser crate]: https://github.com/apache/datafusion-sqlparser-rs

### DataFusion core
https://github.com/apache/datafusion

We continue the monthly release cadence versions. The [42.0.0 release] and
[43.0.0 release] had 73 and 96 unique contributors.

We continue to [discuss the roadmap] in the open, and gathered a collection
of [DataFusion related articles] onto our page.

We recently finished [significant performance improvements] as well as long
standing projects to migrate documentation to code and use the same API for
all user defined window functions. We also added FFI bindings to make it
easier to use multiple versions of DataFusion.

As more people build systems using DataFusion we are beginning to focus more
on keeping the core more stable, as it is [sometimes painful] to update to
new DataFusion versions.

[42.0.0 release]: https://github.com/apache/datafusion/blob/main/dev/changelog/42.0.0.md
[43.0.0 release]: https://github.com/apache/datafusion/blob/main/dev/changelog/42.0.0.md
[roadmap ticket]: https://github.com/apache/datafusion/issues/11442
[discuss the roadmap]: https://github.com/apache/datafusion/issues/13274
[DataFusion related articles]: https://datafusion.apache.org/user-guide/concepts-readings-events.html
[significant performance improvements]: https://datafusion.apache.org/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/
[sometimes painful]: https://github.com/apache/datafusion/issues/13525

### Sub project: DataFusion Python
https://github.com/apache/datafusion-python

We continue the monthly release cadence versions. The
[datafusion-python 41.0.0] release and [datafusion-python 42.0.0] had 5 and 6
unique contributors. Release for version 43.0.0 is underway at the time of
this writing.

We recently added support for [user defined window functions], including
significant updates to the user documentation on how to author user defined
functions. Additionally we released a [blog post on UDFs] demonstrating how
users can incorporate custom UDFs that can lead to 10x speed improvements by
writing Rust backed Python functions.

We added support for foreign table providers via the FFI bindings in the core
project. This enables external parties to provide Python interfaced table
providers that support features such as push down filtering, including across
different versions of DataFusion.

[datafusion-python 41.0.0]: https://github.com/apache/datafusion-python/pull/866
[datafusion-python 42.0.0]: https://github.com/apache/datafusion-python/pull/901
[blog post on UDFs]: https://datafusion.apache.org/blog/2024/11/19/datafusion-python-udf-comparisons/

### Sub project: DataFusion Comet
https://github.com/apache/datafusion-comet

The Comet project recently released version 0.4.0 with a focus on
performance & stability. See [Blog post]

[Blog post]: https://datafusion.apache.org/blog/2024/11/20/datafusion-comet-0.4.0/

Much of the current development focus is on improving complex type support,
particularly the ability to read complex types from Parquet and Iceberg
sources.

### Sub project: DataFusion Ballista
https://github.com/apache/datafusion-ballista

Since the last board report, the Ballista subproject has become much more
active and added new active maintainers. The focus has changed from "Apache
DataFusion Ballista Distributed Query Engine" to "Making Apache DataFusion
Applications Distributed"

The community has simplified the project by removing unfinished features and
refocusing as a way to scale out existing DataFusion applications by
providing a tighter integration with the core DataFusion project. See more
[details here]

[details here]: https://github.com/apache/datafusion/issues/10157#issuecomment-2514694231

### Sub project: Sqlparser
https://github.com/apache/datafusion-sqlparser-rs

The sqlparser project became part of the DataFusion project this quarter. In
addition to ongoing additions to SQL dialect support, we made our first
release as part of the Apache DataFusion project, and have started
introducing spans (source locations), a long requested feature.

## Community Health:
It is still hard to keep track of everything going on, which is a good thing.
While it is always a struggle to get enough code review capacity, we have
many active committers, and the community in general helps each other out
with reviews. We continue to actively grow our committer and PMC ranks. We
have upcoming in person meetups scheduled for Chicago, Boston, and Amsterdam.
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
