As we are still within the first three months of becoming a top-level
project, we are submitting monthly reports to the ASF board.

To that end, I have drafted a report ([1] and below) for June 2024 (based on
May 2024). Please feel free to add anything of note or suggest changes,
either by responding to this email or directly in the Google Doc. I plan to
submit it on June 12, 2024.

Thanks,
Andrew


[1]
https://docs.google.com/document/d/1h4yjvomQO0XdzxKuE4aBSWGNliFFmn8GADd8DlPuXBw/edit

-----------

2024-06-12 DataFusion ASF Board Report
https://github.com/apache/datafusion/issues/10155

DataFusion PMC Chair Note: Please add any relevant comments / content to
this document. I (Andrew Lamb) will submit to the ASF board on Wed June 12,
2024 (about one week prior to the scheduled board meeting).

The format of this report and the metrics are from
https://reporter.apache.org/wizard/?datafusion

The rationale and process for this report:
https://www.apache.org/foundation/board/reporting
Past examples: 2024-05-15 DataFusion ASF Board Report



## Description:
The mission of Apache DataFusion is the creation and maintenance of
software related to an extensible query engine.

## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None

## Membership Data:
Apache DataFusion was founded 2024-04-16 (about two months ago)
There are currently 29 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is roughly 8:3.

Community changes, past month (our last report was May 2024):
- Mustafa Akur was added to the PMC on 2024-05-09
- Oleks V. was added to the PMC on 2024-05-09


## Project Activity:

The project continues to be quite active with many PRs and issues opened
and closed per day.

We have mostly completed the tasks for becoming a new top-level project. We
are in the process of preparing an ASF press release announcing the new
top-level project with the Marketing and Publicity chairs, and plan to
document more thoroughly the process of inviting new committers and PMC
members.


### DataFusion core
https://github.com/apache/datafusion

We made our first successful release as a new project, version 38.0.0.

In addition to the work related to moving to a top-level project, the
community continues to work on making logical planning faster, making
function packages (i.e. UDFs) modular and easier to mix and match,
“de-parsing” logical plan expressions back to SQL, and improving type
coercion.

Recently there has been renewed interest in reading parquet files and
creating secondary indexes.

For the DataFusion repo since 2024-05-15, as of 2024-05-07 (TODO: UPDATE
THESE NUMBERS):

- 132 commits[1]
- 46 code contributors[2]
- 168 PRs opened on GitHub[3]
- 187 PRs closed on GitHub[4]
- 130 issues opened on GitHub[5]
- 94 issues closed on GitHub[6]


[1]: git log --since="2024-04-16" --pretty=format:"%h" | wc -l
[2]: git shortlog -sn --since="2024-04-16" | wc -l
[3]: https://s.apache.org/x5gkj
[4]: https://s.apache.org/rg9op
[5]: https://s.apache.org/sqlun
[6]: https://s.apache.org/l3clf


### Sub project: DataFusion Python
https://github.com/apache/datafusion-python

The DataFusion Python subproject is not currently actively maintained; we
are in the process of releasing version 38.0.0 with the help of the
community.


### Sub project: DataFusion Comet
https://github.com/apache/datafusion-comet

The Comet subproject has had face to face sync meetings which are
recorded[1].

[1] https://lists.apache.org/thread/9kqxkpwxf4oxonfboyfh8j6ko7r3fb3z

The Comet subproject is very active and is receiving significant
contributions from new contributors. There is some initial documentation
published at https://datafusion.apache.org/comet/.


### Sub project: DataFusion Ballista
https://github.com/apache/datafusion-ballista
https://github.com/apache/datafusion-ballista-python

The Ballista subproject is not currently actively maintained.

### Recent Releases

* PYTHON-37.1.0 was released on 2024-05-13.
* 38.0.0 was released on 2024-05-10.


## Community Health:
We have added several new committers and PMC members (see above) in the
last month, and we expect to continue to do so regularly. While it would
always be nice to have more bandwidth to devote to PMC activities, we are
currently doing well.

While most communication still happens through GitHub, the mailing lists
are now fully active, as reflected in their metrics:

- dev@datafusion.apache.org had a big increase in traffic in the past
  quarter (50 emails compared to 0)
- git...@datafusion.apache.org had a big increase in traffic in the past
  quarter (4787 emails compared to 0)
