As we are still within the first three months of becoming a top-level project, we are submitting monthly reports to the ASF board.
To that end I have drafted a report ([1] and below) for June 2024 (based on May 2024). Please feel free to add anything of note or suggest changes, either by responding to this email or directly in the google doc.

I plan to submit it June 12, 2024.

Thanks,
Andrew

[1] https://docs.google.com/document/d/1h4yjvomQO0XdzxKuE4aBSWGNliFFmn8GADd8DlPuXBw/edit

-----------

2024-06-12 DataFusion ASF Board Report

https://github.com/apache/datafusion/issues/10155

DataFusion PMC Chair

Note: Please add any relevant comments / content to this document. I (Andrew Lamb) will submit to the ASF board on Wed June 12, 2024 (about one week prior to the scheduled board meeting).

The format of this report and the metrics are from https://reporter.apache.org/wizard/?datafusion

The rationale and process for this report: https://www.apache.org/foundation/board/reporting

Past examples: 2024-05-15 DataFusion ASF Board Report

## Description:
The mission of Apache DataFusion is the creation and maintenance of software related to an extensible query engine

## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None

## Membership Data:
Apache DataFusion was founded 2024-04-16 (20 days ago)
There are currently 29 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is roughly 8:3.

Community changes, past month (our last report was May 2024):
- Mustafa Akur was added to the PMC on 2024-05-09
- Oleks V. was added to the PMC on 2024-05-09

## Project Activity:
The project continues to be quite active with many PRs and issues opened and closed per day.

We have mostly completed tasks for becoming a new top-level project. We are in the process of making an ASF press release announcing the new top-level project with the Marketing and Publicity chairs, and plan to document more thoroughly the process of inviting new committers and PMC members.

### DataFusion core
https://github.com/apache/datafusion

We made our first successful release as a new project, version 38.0.0.

In addition to the work related to moving to a top-level project, the community continues to work on making logical planning faster, making function packages (i.e. UDFs) modular and easier to mix/match, “de-parsing” logical plan expressions back to SQL, and improving type coercion. Recently there has been renewed interest in reading parquet files and creating secondary indexes.

For the DataFusion repo since 2024-05-15, as of 2024-05-07: TODO UPDATE THESE NUMBERS
132 commits[1]
46 code contributors[2]
168 PRs opened on GitHub[3]
187 PRs closed on GitHub[4]
130 issues opened on GitHub[5]
94 issues closed on GitHub[6]

[1]: git log --since="2024-04-16" --pretty=format:"%h" | wc -l
[2]: git shortlog -sn --since="2024-04-16" | wc -l
[3]: https://s.apache.org/x5gkj
[4]: https://s.apache.org/rg9op
[5]: https://s.apache.org/sqlun
[6]: https://s.apache.org/l3clf

### Sub project: DataFusion Python
https://github.com/apache/datafusion-python

The DataFusion Python subproject is not currently actively maintained; we are in the process of releasing version 38.0.0 with the help of the community.

### Sub project: DataFusion Comet
https://github.com/apache/datafusion-comet

The Comet subproject has had face to face sync meetings which are recorded[1].

[1] https://lists.apache.org/thread/9kqxkpwxf4oxonfboyfh8j6ko7r3fb3z

The Comet subproject is very active and is receiving significant contributions from new contributors. There is some initial documentation published at https://datafusion.apache.org/comet/.

### Sub project: DataFusion Ballista
https://github.com/apache/datafusion-ballista
https://github.com/apache/datafusion-ballista-python

The Ballista subproject is not currently actively maintained.

### Recent Releases
* PYTHON-37.1.0 was released on 2024-05-13.
* 38.0.0 was released on 2024-05-10.

## Community Health:
We have added several new committers and PMC members (see above) in the last month, and we expect to continue to do so regularly. While it would always be nice to have more bandwidth to devote to PMC activities, we are currently doing well.

While most communications still happen through GitHub, the mailing lists are now fully active, as reflected in their metrics:

dev@datafusion.apache.org had a big increase in traffic in the past quarter (50 emails compared to 0)
git...@datafusion.apache.org had a big increase in traffic in the past quarter (4787 emails compared to 0)