alamb opened a new issue, #14491: URL: https://github.com/apache/datafusion/issues/14491
### Is your feature request related to a problem or challenge?

## Introduction

This ticket is my weekly-ish summary of interesting things happening in DataFusion. Note this is not a complete list (it is what I remember / can find). Please leave comments on this ticket about things that I may have missed or that you think should get wider attention from the community.

Follow-on to https://github.com/apache/datafusion/issues/13970

Reminder: find new content (and please post some!) on the [Concepts, Readings, Events](https://datafusion.apache.org/user-guide/concepts-readings-events.html#concepts-readings-events) page.

## Community Highlights

- The [recording](https://www.youtube.com/watch?v=GruBeVDoWq4) and slides from the [2025 Jan 24 Amsterdam meetup](https://www.meetup.com/data-drinks/events/304895184/) are available: https://github.com/apache/datafusion/discussions/12988
- We are victims of our own success. At the time of writing there are over 50 PRs in various states of review ([check out the list](https://github.com/apache/datafusion/pulls?page=2&q=is%3Apr+is%3Aopen+-review%3Aapproved+-is%3Adraft+-author%3Aalamb)). The more help reviewing, the better.
- @comphead is working on a new front page: https://github.com/apache/datafusion/issues/14389
- Lessons from CMU, courtesy of @lmwnshn: https://github.com/apache/datafusion/issues/14373
- [Papers We Love NYC is reading the DataFusion paper this week](https://www.meetup.com/papers-we-love/events/305729353/?eventOrigin=group_upcoming_events)
- @edmondop added a link to the [job board](https://datafusion.apache.org/contributor-guide/communication.html#job-board): https://github.com/apache/datafusion/pull/14191
- @jonbjo noted a new user, funner.io: https://github.com/apache/datafusion/pull/14316

# Releases!

- The [DataFusion 45](https://github.com/apache/datafusion/issues/14008) release candidate is available. I have a great feeling about this one thanks to all the help testing from @shehabgamin, @kevinjqliu, @Omega359 and others.
- Arrow minor release completed: https://github.com/apache/arrow-rs/issues/6929 (among other things, it has even faster Parquet reading)

# Performance

DataFusion's core value proposition is great performance without having to re-implement it yourself.

- @pmcgleenon ran the ClickBench numbers for DataFusion 44: https://github.com/apache/datafusion/issues/13983 (45 is even better)
- @XiangpengHao has a way to make Parquet reading faster. We are looking for help testing. See https://github.com/apache/arrow-rs/pull/6921
- @UBarney made `reverse` faster: https://github.com/apache/datafusion/pull/14195
- @jatin510 implemented https://github.com/apache/datafusion/pull/14119
- @buraksenn, @ozankabak and @berkaysynnada added some sweet sort-based optimizations: https://github.com/apache/datafusion/pull/14279
- @rluvaton made `array_agg` faster: https://github.com/apache/datafusion/pull/14299
- @Rachelint improved median a lot: https://github.com/apache/datafusion/pull/13681
- And so did @2010YOUY01: https://github.com/apache/datafusion/pull/14399
- @pepijnve added https://github.com/apache/datafusion/pull/14276

# Quality

## Testing

- @wiedld added logical and physical plan invariants: https://github.com/apache/datafusion/pull/13986
- @himadripal added https://github.com/apache/datafusion/pull/14284
- @logan-keede fixed `--complete`: https://github.com/apache/datafusion/pull/14254
- @buraksenn added https://github.com/apache/datafusion/pull/14307
- @duongcongtoai https://github.com/apache/datafusion/pull/14395

## Bug Fixes

DataFusion is in the "we are finding all the corner case bugs now" phase of its life, and people are bashing them down.

- @xudong963 fixed several limit pushdown bugs: https://github.com/apache/datafusion/pull/14192
- @jatin510 https://github.com/apache/datafusion/pull/14236
- @waynexia https://github.com/apache/datafusion/pull/14126
- @dhegberg https://github.com/apache/datafusion/pull/14312
- @zhuqi-lucas https://github.com/apache/datafusion/pull/14338
- @zhuqi-lucas https://github.com/apache/datafusion/pull/14245
- @findepi https://github.com/apache/datafusion/pull/14349
- @findepi https://github.com/apache/datafusion/pull/14356
- @jkosh44 https://github.com/apache/datafusion/pull/14289
- @cht42 fixed https://github.com/apache/datafusion/pull/14401
- @Omega359 and I fixed a bunch of coercion issues: https://github.com/apache/datafusion/pull/14385, https://github.com/apache/datafusion/pull/14449
- @zhuqi-lucas (again!) https://github.com/apache/datafusion/pull/14418

## Docs

- @appletreeisyellow https://github.com/apache/datafusion/pull/14278

## Build time

- We are starting to look more seriously at build time: https://github.com/apache/datafusion/issues/14256 (thanks @waynexia)

## Cleanups 🧹

Now that we have a large, useful codebase, it is also important to keep it neat and tidy, so we spend a non-trivial amount of time there too.

- physical-optimizer moved into its own crate (finally!), thanks to @logan-keede, @berkaysynnada and @buraksenn. See https://github.com/apache/datafusion/pull/14190, https://github.com/apache/datafusion/pull/14235, etc.
- @Chen-Yuan-Lai is on a tear using NullBufferBuilder instead of BooleanBufferBuilder: https://github.com/apache/datafusion/pull/14181 et al.
- @Kimahriman fixed null handling for `array_has`: https://github.com/apache/datafusion/pull/13683
- @logan-keede has been consolidating code: https://github.com/apache/datafusion/pull/14240
- @logan-keede also completed https://github.com/apache/datafusion/issues/10782

# Features

## We can have nice things! (Error messages)

- @eliaperantoni added support for source code locations in errors: https://github.com/apache/datafusion/pull/13664, and has organized a project to add more support: https://github.com/apache/datafusion/issues/14429
- We started publishing the `datafusion-sqllogictest` crate to help testing in `iceberg-rust`: https://github.com/apache/datafusion/discussions/14229 (thanks to @liurenjie1024 for the great idea)
- @jayzhan211 unified advanced UDF argument handling: https://github.com/apache/datafusion/pull/14094
- @gatesn added support for `SUM` statistics: https://github.com/apache/datafusion/pull/14074
- @erenavsarogullari added https://github.com/apache/datafusion/pull/14217
- @timsaucer made FFI (hopefully) more usable with async code: https://github.com/apache/datafusion/pull/13937
- @davisp made insert work in FFI: https://github.com/apache/datafusion/pull/14391

## Coming soon: Extension Types

## Misc

- @Spaarsh gave us `<=>` in https://github.com/apache/datafusion/pull/14187

# Looking to get more involved? Please help review code!

DataFusion has a long history of community members [contributing in all aspects of the project](https://datafusion.apache.org/contributor-guide/index.html). Reviewing PRs is an especially great way to get introduced to the project, help the community, and grow your own knowledge. Researching and understanding the code well enough to review PRs also often inspires additional ideas for improvements.

We have [docs about reviews](https://datafusion.apache.org/contributor-guide/index.html#reviewing-pull-requests). TL;DR: look for test coverage, whether the change is understandable and well documented, and whether the code can be improved. When you think the PR looks good to merge, try `@`-mentioning [one of the committers](https://projects.apache.org/committee.html?datafusion).
## Help wanted

- I would love to see the community offer additional help testing, triaging bugs, and helping to make DataFusion a more stable foundation for building systems.

Please feel free to leave your own comments on this ticket if you are looking for help.

## Community

* [Weekly Call](https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit#heading=h.kpjkpncdmt1g)
* Slack/Discord: [info links](https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord)

## Upcoming meetups:

* Help schedule some!

### Describe the solution you'd like

_No response_

### Describe alternatives you've considered

_No response_

### Additional context

_No response_