alamb opened a new issue, #14491:
URL: https://github.com/apache/datafusion/issues/14491

   ### Is your feature request related to a problem or challenge?
   
   ## Introduction
   This ticket is my weekly-ish summary of interesting things happening in 
DataFusion. Note this is not a complete list (it is what I remember / can 
find). Please leave comments on this ticket about things that I may have 
missed or that you think should get wider attention from the community. This 
is a follow-on to https://github.com/apache/datafusion/issues/13970
   
   Reminder: you can find new content (and please post some!) on the 
[Concepts, Readings, 
Events](https://datafusion.apache.org/user-guide/concepts-readings-events.html#concepts-readings-events)
 page
   
   ## Community Highlights
   - The [recording](https://www.youtube.com/watch?v=GruBeVDoWq4) and slides 
are available from the [2025 Jan 24 Amsterdam 
meetup](https://www.meetup.com/data-drinks/events/304895184/): 
   https://github.com/apache/datafusion/discussions/12988
   - We are victims of our own success. At the time of writing there are over 
50 PRs in various states of review: [check out the 
list](https://github.com/apache/datafusion/pulls?page=2&q=is%3Apr+is%3Aopen+-review%3Aapproved+-is%3Adraft+-author%3Aalamb).
 The more help reviewing, the better 🙏 
   - @comphead is working on a new frontpage: 
https://github.com/apache/datafusion/issues/14389
   - Lessons from CMU, courtesy of @lmwnshn  
https://github.com/apache/datafusion/issues/14373
   - [Papers we love NYC is reading the DataFusion paper this 
week](https://www.meetup.com/papers-we-love/events/305729353/?eventOrigin=group_upcoming_events)
   - @edmondop added a link to the [job 
board](https://datafusion.apache.org/contributor-guide/communication.html#job-board):
 https://github.com/apache/datafusion/pull/14191
   - @jonbjo noted new user funner.io: 
https://github.com/apache/datafusion/pull/14316
   
   
![Image](https://github.com/user-attachments/assets/499d21a3-1924-4866-a28f-5b44629e32c4)
   
   
   # Releases!
   - [DataFusion 45](https://github.com/apache/datafusion/issues/14008) Release 
candidate is available. I have a great feeling about this one thanks to all the 
help testing from @shehabgamin @kevinjqliu @Omega359  and others
   - Arrow Minor release completed: 
https://github.com/apache/arrow-rs/issues/6929 (among other things has even 
faster parquet reading)
   
   
   # Performance 
   DataFusion's core value proposition is great performance without having to 
re-implement it yourself
   - @pmcgleenon ran the numbers, and DataFusion 44 is 🌶 on ClickBench: 
https://github.com/apache/datafusion/issues/13983 (45 is even better)
   - @XiangpengHao has a way to make parquet reading faster. We are looking for 
help testing. See https://github.com/apache/arrow-rs/pull/6921
   - @UBarney made reverse faster: 
https://github.com/apache/datafusion/pull/14195
   - @jatin510  implemented https://github.com/apache/datafusion/pull/14119 
   - @buraksenn, @ozankabak and @berkaysynnada added some sweet sort-based 
optimizations: https://github.com/apache/datafusion/pull/14279
   - @rluvaton made `array_agg` faster 🚀 
https://github.com/apache/datafusion/pull/14299
   - @Rachelint improved median a lot: 
https://github.com/apache/datafusion/pull/13681 🚀 
   - And so did @2010YOUY01 https://github.com/apache/datafusion/pull/14399
   - @pepijnve  added https://github.com/apache/datafusion/pull/14276
   
   # Quality
   
   ## Testing
   - @wiedld added Logical and Physical plan invariants: 
https://github.com/apache/datafusion/pull/13986
   - @himadripal added https://github.com/apache/datafusion/pull/14284
   - @logan-keede fixed `--complete`: 
https://github.com/apache/datafusion/pull/14254
   - @buraksenn added https://github.com/apache/datafusion/pull/14307
   - @duongcongtoai https://github.com/apache/datafusion/pull/14395
   
   ## Bug Fixes
   DataFusion is in the "we are finding all the corner case bugs now" phase of 
its life and people are now bashing them down
   
   - @xudong963 fixed several limit pushdown bugs: 
https://github.com/apache/datafusion/pull/14192
   - @jatin510 https://github.com/apache/datafusion/pull/14236
   - @waynexia https://github.com/apache/datafusion/pull/14126
   - @dhegberg  https://github.com/apache/datafusion/pull/14312
   - @zhuqi-lucas https://github.com/apache/datafusion/pull/14338
   - @zhuqi-lucas 🐛 🔨  https://github.com/apache/datafusion/pull/14245 
   - @findepi https://github.com/apache/datafusion/pull/14349
   - @findepi  https://github.com/apache/datafusion/pull/14356
   - @jkosh44 https://github.com/apache/datafusion/pull/14289
   - @cht42 fixed https://github.com/apache/datafusion/pull/14401
   - @Omega359 and I fixed a bunch of coercion stuff: 
https://github.com/apache/datafusion/pull/14385 and 
https://github.com/apache/datafusion/pull/14449
   - @zhuqi-lucas (again!) https://github.com/apache/datafusion/pull/14418
   
   ## Docs
   - @appletreeisyellow https://github.com/apache/datafusion/pull/14278
   
   
   ## Build time
   - We are starting to look more seriously at build time: 
https://github.com/apache/datafusion/issues/14256 (thanks @waynexia)
   
   ## Cleanups 🧹 
   
   Now that we have a large, useful codebase, it is also important to keep it 
neat and tidy, so we spend a non-trivial amount of time there too. 
   
   - `physical-optimizer` was moved into its own crate (finally!), thanks to 
@logan-keede, @berkaysynnada and @buraksenn. See 
https://github.com/apache/datafusion/pull/14190, 
https://github.com/apache/datafusion/pull/14235, etc.
   - @Chen-Yuan-Lai is on a tear using `NullBufferBuilder` instead of 
`BooleanBufferBuilder`: https://github.com/apache/datafusion/pull/14181 et al
   - @Kimahriman fixed null handling for `array_has`: 
https://github.com/apache/datafusion/pull/13683
   - @logan-keede  has been consolidating code: 
https://github.com/apache/datafusion/pull/14240
   - @logan-keede also completed 
https://github.com/apache/datafusion/issues/10782 🚧  
   
   # Features
   
   ## We can have nice things! (Error messages)
   - @eliaperantoni added support for source code locations in errors: 
https://github.com/apache/datafusion/pull/13664 and has organized a project to 
add more support: https://github.com/apache/datafusion/issues/14429
   
   - We started publishing the `datafusion-sqllogictest` crate to help testing 
in `iceberg-rust`: https://github.com/apache/datafusion/discussions/14229 
(thanks to @liurenjie1024 for the great idea)
   - @jayzhan211  unified advanced UDF argument handling: 
https://github.com/apache/datafusion/pull/14094
   - @gatesn  added support for `SUM` statistics: 
https://github.com/apache/datafusion/pull/14074
   - @erenavsarogullari  added https://github.com/apache/datafusion/pull/14217
   - @timsaucer made FFI hopefully more usable with async code: 
https://github.com/apache/datafusion/pull/13937 
   - @davisp made insert work in FFI: 
https://github.com/apache/datafusion/pull/14391
   
   ## Coming soon: Extension Types
   
   
   ## Misc
   - @Spaarsh gave us 👾 🚢 `<=>` in 
https://github.com/apache/datafusion/pull/14187
   
   
   # Looking to get more involved? Please help review code! 🎣
   
   DataFusion has a long history of community members [contributing in all 
aspects of the 
project](https://datafusion.apache.org/contributor-guide/index.html).  
Reviewing PRs is an especially great way to get introduced to the project, help 
the community and grow your own knowledge  -- researching and understanding the 
code enough to review PRs also often inspires additional ideas for improvements.
   
   We have [docs about 
reviews](https://datafusion.apache.org/contributor-guide/index.html#reviewing-pull-requests).
 The TLDR is: check for test coverage, whether the change is understandable 
and well documented, and whether the code can be improved. When you think the 
PR looks good to merge, try `@` mentioning [one of the 
committers](https://projects.apache.org/committee.html?datafusion). 
   
   ## Help wanted
   - I would love to see the community offer additional help testing and 
triaging bugs, to make DataFusion a more stable foundation for building systems
   
   Please feel free to leave your own comments on this ticket if you are 
looking for help
   
   ## Community 
   * [Weekly 
Call](https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit#heading=h.kpjkpncdmt1g)
   * Slack/Discord: [info 
links](https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord)
 
   
   ## Upcoming meetups:
   * Help schedule some!
   
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_

