alamb opened a new issue, #13760: URL: https://github.com/apache/datafusion/issues/13760
## Introduction

This ticket is a weekly-ish summary of interesting things happening in DataFusion.

Note this is not a complete list (it is what I remember / can find). Please feel free to leave comments on this ticket about things that I may have missed or that you think should get wider attention from the community.

Follow on to https://github.com/apache/datafusion/issues/13630

Loosely inspired by https://this-week-in-rust.org/

Reminder: find new content (and please post some!) on the [Concepts, Readings, Events](https://datafusion.apache.org/user-guide/concepts-readings-events.html#concepts-readings-events) page

## Community Highlights
- @Rachelint became a committer!

## Theme: DataFusion Fever is Spreading

I think DataFusion is reaching an inflection point: it is now good enough that more than just early adopters can build, and are building, real production systems with DataFusion. This is a great milestone, and I think the project is adjusting to this new reality.

One major theme we have been discussing in the last week or two is making upgrades easier. The recent pushes in `43.0.0` and `44.0.0` to clean up / complete projects such as StringView, window function migration, improved APIs, etc. have caused significant complications for downstream upgrades. Going forward as a community, we are discussing ways to improve this process.
I hope to write more on this topic. You can read more about it here:
- https://github.com/apache/datafusion/issues/13648
- https://github.com/apache/datafusion/issues/13525

## New Blog Website
- @timsaucer has reworked the blog https://datafusion.apache.org/blog/ so that it now autopublishes when changes are made to https://github.com/apache/datafusion-site

## Releases
- [x] DataFusion python 43.1.0 was released: https://crates.io/crates/datafusion-python/43.1.0
- [ ] sqlparser release is in voting phase: https://github.com/apache/datafusion-sqlparser-rs/issues/1517
- [ ] arrow-rs major release https://github.com/apache/arrow-rs/issues/6342
- [ ] We are discussing DataFusion 44.0.0: https://github.com/apache/datafusion/issues/13334

## Performance

The community loves a good benchmark challenge. We are off to a great start making the h2o benchmark even faster; see
- https://github.com/apache/datafusion/issues/13548
- @2010YOUY01 / @Dandandan and @jayzhan211 made `corr` more than 3x faster
- @Rachelint is also working on `median` in https://github.com/apache/datafusion/issues/13550

I also made a change, with an example, to allow array reuse in functions, which adds to the toolbox 🧰
- https://github.com/apache/datafusion/pull/13637

Thanks @dhegberg for a CSV loading benchmark: https://github.com/apache/datafusion/pull/13544

Also thanks to @richox, @zhangli20, @tlm365, @jayzhan211, @Weijun-H, @comphead and @Dandandan for improving the speed of other functions
- https://github.com/apache/datafusion/pull/13691
- https://github.com/apache/datafusion/pull/13696
- https://github.com/apache/datafusion/pull/13688
- https://github.com/apache/datafusion/pull/13675

## Bug fixes and improvements

@onursatici has been on a tear along with @korowa and @haohuaijin
- https://github.com/apache/datafusion/pull/13709
- https://github.com/apache/datafusion/pull/13677
- https://github.com/apache/datafusion/pull/13560
- Thanks to @Eason0729 for https://github.com/apache/datafusion/pull/12939

@findepi, @jonahgao, @comphead and others have been cleaning up the code 🧹
- https://github.com/apache/datafusion/pull/13730
- https://github.com/apache/datafusion/pull/13628
- https://github.com/apache/datafusion/pull/13641
- https://github.com/apache/datafusion/pull/13685
- https://github.com/apache/datafusion/pull/13712
- https://github.com/apache/datafusion/pull/13728

## Unparser

We have been cranking away at filling out the plan --> SQL feature, thanks to @goldmedal
- https://github.com/apache/datafusion/pull/13660
- https://github.com/apache/datafusion/pull/13599

## Hashbrown

@crepererum has been working to migrate our use of hashbrown to higher level APIs
- https://github.com/apache/datafusion/issues/13433
- https://github.com/apache/datafusion/pull/13658

## Looking to get more involved? Try code review!

(can you see what I did there?)

DataFusion has a long history of community members [contributing in all aspects of the project](https://datafusion.apache.org/contributor-guide/index.html). Reviewing PRs is an especially great way to get introduced to the project, help the community and grow your own knowledge -- researching and understanding the code enough to review PRs also often inspires additional ideas for improvements.

We have [docs about reviews](https://datafusion.apache.org/contributor-guide/index.html#reviewing-pull-requests). The TLDR: check for test coverage, whether the change is understandable and well documented, and whether the code can be improved. When you think the PR looks good to merge, try `@` mentioning [one of the committers](https://projects.apache.org/committee.html?datafusion).
## Help wanted
- I would love to see the community offer additional help testing and triaging bugs, helping to make DataFusion a more stable foundation for building systems

Please feel free to leave your own comments on this ticket if you are looking for help

## Community
* [Weekly Call](https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit#heading=h.kpjkpncdmt1g)
* Slack/Discord: [info links](https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord)

## Upcoming meetups:
* [2024 Dec 18 Chicago](https://github.com/apache/datafusion/discussions/12894): https://lu.ma/eq5myc5i @adriangb @timsaucer
* https://github.com/apache/datafusion/discussions/12988
* [2025 Jan 15 Boston](https://github.com/apache/datafusion/discussions/13165)
* [2025 Jan 24 Amsterdam](https://www.meetup.com/data-drinks/events/304895184/)