alamb opened a new issue, #13167: URL: https://github.com/apache/datafusion/issues/13167
## Introduction

This ticket is a weekly summary of interesting things happening in DataFusion. Note this is not a complete list (it is what I remember / can find). Please feel free to leave comments on this ticket about things that I may have missed or that you think should get wider attention from the community.

Loosely inspired by https://this-week-in-rust.org/

## DataFusion Related Blogs

- [Caching in DataFusion: Don't read twice](https://blog.haoxp.xyz/posts/caching-datafusion) from @XiangpengHao
- [Parquet pruning in DataFusion: Read no more than you need](https://blog.haoxp.xyz/posts/parquet-to-arrow/) from @XiangpengHao

## Upcoming Releases

- [ ] https://github.com/apache/datafusion/issues/13166 -- trying to unblock delta-rs upgrade
- [ ] https://github.com/apache/datafusion/issues/12470 (thanks @andygrove)
- [ ] https://github.com/apache/datafusion-sqlparser-rs/issues/1423 (huge kudos to @iffyio for all the reviews)

## Major Projects / Discussions under way

- https://github.com/apache/datafusion/issues/12821 -- show the world what you can do with focused engineering effort. Thanks to the epic work of @Rachelint, @goldmedal, @jayzhan211, @Dandandan, @XiangpengHao and others.
- https://github.com/apache/arrow-rs/issues/5523 - @XiangpengHao and @tustvold are working to make parquet *even better*
- https://github.com/apache/datafusion/issues/12357 - @timsaucer is working on https://github.com/apache/datafusion/pull/12920, bindings to DataFusion (stable C API).
  Making good progress with various [PRs](https://github.com/apache/datafusion/pull/13136)
- Helping make DataFusion more visible: https://github.com/apache/datafusion/discussions/13049 @SamSynnada
- @2010YOUY01 started working on https://github.com/apache/datafusion/issues/13123
- @eejbyfeldt has been bashing away at bugs / things that prevent a complete TPC-DS run, such as https://github.com/apache/datafusion/pull/13091

## Highlights from last week(s):

(I am sorry if I missed you -- please add a note to this ticket with anything you would like to highlight)

- 🎉 Rust ORC implementation is "graduating" from `datafusion-contrib`: https://github.com/datafusion-contrib/datafusion-orc/issues/120. Thanks @waynexia and @Xuanwo
- A PR is up to improve predicate pushdown into parquet: https://github.com/apache/datafusion/pull/12978 from @adriangb
- @Blizzara, @LatrecheYasser, @vbarua, @westonpace and @tokoko keep hardening the substrait implementation with PRs such as [this](https://github.com/apache/datafusion/pull/12112) and [this](https://github.com/apache/datafusion/pull/13112)
- @goldmedal and @sgrebnov are making the Plan --> SQL text unparser cover even more SQL: https://github.com/apache/datafusion/pull/13132
- @comphead is methodically making Sort-Merge-Join production ready (e.g. https://github.com/apache/datafusion/pull/13111)
- @Omega359 and @mnorfolk03 are getting our SQL planner benchmarks in good shape: https://github.com/apache/datafusion/pull/13103 https://github.com/apache/datafusion/pull/13085
- @2010YOUY01 began working on improving memory limited aggregation: https://github.com/apache/datafusion/pull/13090
- @LeslieKid improved the aggregation test coverage: https://github.com/apache/datafusion/pull/13041
- March towards all user defined window functions (UDWF): https://github.com/apache/datafusion/pull/13040 from @jatin510
- @buraksenn submitted several improvements ❤ : https://github.com/apache/datafusion/pull/13095 / https://github.com/apache/datafusion/pull/13076 / https://github.com/apache/datafusion/pull/13034

## Looking to get more involved? Try code review!

DataFusion has a long history of community members [contributing in all aspects of the project](https://datafusion.apache.org/contributor-guide/index.html). Reviewing PRs is an especially great way to get introduced to the project, help the community and grow your own knowledge -- researching and understanding the code enough to review PRs also often inspires additional ideas for improvements.

We have [docs about reviews](https://datafusion.apache.org/contributor-guide/index.html#reviewing-pull-requests). TLDR: check for test coverage, whether the change is understandable and well documented, and whether the code can be improved. When you think the PR looks good to merge, try `@` mentioning [one of the committers](https://projects.apache.org/committee.html?datafusion).
## Help wanted

Please feel free to leave your own comments on the ticket if you are looking for help.

## Community

* [Weekly Call](https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit#heading=h.kpjkpncdmt1g)
* Slack/Discord: [info links](https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord)

## Upcoming meetups:

* [Dec 18 Chicago](https://github.com/apache/datafusion/discussions/12894): https://lu.ma/eq5myc5i @adriangb @timsaucer
* TBD: https://github.com/apache/datafusion/discussions/12988
* [Jan 15 Boston](https://github.com/apache/datafusion/discussions/13165)

## Background:

Previous update: https://github.com/apache/datafusion/issues/13035

## Andrew's Focus Areas:

We are preparing for the [43.0.0 release](https://github.com/apache/datafusion/issues/12470) and I am personally pretty excited about the following (and thus actively helping with them / putting them at the top of my review list):

- https://github.com/apache/datafusion/issues/12821 (thanks to the epic work of @Rachelint, @goldmedal, @jayzhan211, @Dandandan, @XiangpengHao and others, we are quite close)
- https://github.com/apache/datafusion/issues/8709 (very close to finishing, thanks @jcsherin @jatin510)
- https://github.com/apache/datafusion/issues/12740 (also almost done, thanks to @Omega359 and @jonathanc-n)
- https://github.com/apache/datafusion/issues/12114 (thanks @Rachelint for all your help so far)
