alamb opened a new issue, #15005:
URL: https://github.com/apache/datafusion/issues/15005

   ### Is your feature request related to a problem or challenge?
   
   ## Introduction
   This ticket is my weekly-ish summary of interesting things happening in 
DataFusion. Note this is not a complete list (it is what I remember / can 
find). Please leave comments on this ticket about things that I may have 
missed or that you think should get wider attention from the community. 
   
   
   ## Community Highlights
   
   * DF 45 Blog post 
https://datafusion.apache.org/blog/2025/02/20/datafusion-45.0.0/
   * @oznur-synnada updated the events page 
https://github.com/apache/datafusion/pull/14629
   * We are hosting a [Google Summer of 
Code](https://github.com/apache/datafusion/issues/14577) -- thanks again 
@oznur-synnada  for driving this
   
   # Releases!
   - [DataFusion 46](https://github.com/apache/datafusion/issues/14123) release 
candidate is available. Huge thank you to @xudong963 for running this release. 
This one contains a [massive refactor of 
DataSource](https://github.com/apache/datafusion/pull/14224) from @ozankabak 
and @mertak-synnada 
   - Also a huge shout out to @blaginin for his help chasing down issues 
blocking the release: https://github.com/apache/datafusion/pull/14685
   - Another huge shout out to @shehabgamin for his help testing and 
identifying issues pre-release
   - Check out the [DataFusion 46 Upgrade 
Guide](https://github.com/apache/datafusion/pull/14891) for help upgrading
   
   
   # Performance 
   DataFusion's core value proposition is great performance without having to 
re-implement it yourself.
   - @Omega359 's improvement to https://github.com/apache/datafusion/pull/14653
   - @berkaysynnada  improved the sort tracking code more 
https://github.com/apache/datafusion/pull/14813
   - @zjregee made `repeat` 50% faster: 
https://github.com/apache/datafusion/pull/14697
   - @simonvandel made `to_hex` 2x faster: 
https://github.com/apache/datafusion/pull/14686
   - @simonvandel  also made `to_hex` 4x faster: 
https://github.com/apache/datafusion/pull/14675 (no string copies for the win!)
   - And @simonvandel  also updated `date_trunc` to be 2x faster: 
https://github.com/apache/datafusion/pull/14593
   - @Kev1n8  made `substr` faster: 
https://github.com/apache/datafusion/pull/14498
   
   # Quality
   
   ## Testing
   
   ## Bug Fixes
   DataFusion is in the "we are finding all the corner case bugs now" phase of 
its life and people are now bashing them down
   - @joroKr21 's fix for grouping exprs 
https://github.com/apache/datafusion/pull/14888
   - @anlinc helped fix https://github.com/apache/datafusion/pull/14860
   - https://github.com/apache/datafusion/pull/14852 @rluvaton 🙏 
   - @xudong963 https://github.com/apache/datafusion/pull/14569
   
   ## Docs
   
   
   ## Build time
   
   ## Cleanups 🧹 
   
   - physical-optimizer into its own crate (finally!): thanks to @logan-keede 
@berkaysynnada and @buraksenn. 
   - [breaking](https://github.com/apache/datafusion/pull/14873) the datafusion 
core [crate](https://github.com/apache/datafusion/pull/14951) apart (finally!): 
thanks to @logan-keede  and @AdamGS 
   - @onlyjackfrost @niebayes @irenjj @goldmedal and  others 
[have](https://github.com/apache/datafusion/pull/14727) 
[been](https://github.com/apache/datafusion/pull/14725) 
[migrating](https://github.com/apache/datafusion/pull/14856) 
[all](https://github.com/apache/datafusion/pull/14690) our functions to use 
`invoke_args` etc 
   - @jayzhan211 has been [fixing up wildcard 
handling](https://github.com/apache/datafusion/pull/14689)
   
   # Features
   Features under way
   - Statistics work: https://github.com/apache/datafusion/pull/14699
   
   
   ## Better Out of Core Support
   In general, DataFusion is getting better at handling datasets that are 
too large to fit in memory. 
   - @davidhewitt's improvement here 
https://github.com/apache/datafusion/pull/14868
   - @2010YOUY01 's work to improve spilling for StringView 
https://github.com/apache/datafusion/pull/14823
   - @zhuqi-lucas  improved datafusion-cli: 
https://github.com/apache/datafusion/pull/14766
   - @Kontinuation improved docs 
https://github.com/apache/datafusion/pull/14789 and implementation 
https://github.com/apache/datafusion/pull/14644 and testing 
https://github.com/apache/datafusion/pull/14642
   
   ## We can have nice things! (Explain plans)
   - @irenjj took the first step towards nicer explain plans in 
https://github.com/apache/datafusion/pull/14677. I'll give you a teaser below. 
Come help with the follow-on work in 
https://github.com/apache/datafusion/issues/14914
   
   
   ```
   > explain select * from t1 inner join t2 on t1.i=t2.i;
   +---------------+------------------------------------------------------------+
   | plan_type     | plan                                                       |
   +---------------+------------------------------------------------------------+
   | logical_plan  | Inner Join: t1.i = t2.i                                    |
   |               |   TableScan: t1 projection=[i]                             |
   |               |   TableScan: t2 projection=[i]                             |
   | physical_plan | ┌───────────────────────────┐                              |
   |               | │    CoalesceBatchesExec    │                              |
   |               | └─────────────┬─────────────┘                              |
   |               | ┌─────────────┴─────────────┐                              |
   |               | │        HashJoinExec       ├──────────────┐               |
   |               | └─────────────┬─────────────┘              │               |
   |               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
   |               | │       DataSourceExec      ││       DataSourceExec      │ |
   |               | │    --------------------   ││    --------------------   │ |
   |               | │    partition_sizes: [0]   ││       partitions: 1       │ |
   |               | │       partitions: 1       ││    partition_sizes: [0]   │ |
   |               | └───────────────────────────┘└───────────────────────────┘ |
   |               |                                                            |
   +---------------+------------------------------------------------------------+
   2 row(s) fetched.
   ```
   
   ## Better Error Messages
   @eliaperantoni is working with various contributors to make the error 
messages better. This work is tracked in:
   - https://github.com/apache/datafusion/issues/14429
   - https://github.com/apache/datafusion/pull/14439
   - @onlyjackfrost  https://github.com/apache/datafusion/pull/14849
   
   ## Misc
   - @simonvandel added https://github.com/apache/datafusion/pull/14830
   - @Lordworms made expression access nicer: 
https://github.com/apache/datafusion/pull/14712
   - @rkrishn7  did `UNION ALL BY NAME` 
https://github.com/apache/datafusion/pull/14538
   
   # Looking to get more involved? Please help review code! 🎣
   
   DataFusion has a long history of community members [contributing in all 
aspects of the 
project](https://datafusion.apache.org/contributor-guide/index.html).  
Reviewing PRs is an especially great way to get introduced to the project, help 
the community and grow your own knowledge  -- researching and understanding the 
code enough to review PRs also often inspires additional ideas for improvements.
   
   We have [docs about 
reviews](https://datafusion.apache.org/contributor-guide/index.html#reviewing-pull-requests).
 TLDR is: check for test coverage, whether the change is understandable and 
well documented, and whether the code can be improved. When you think the PR 
looks good to merge, try `@` mentioning [one of the 
committers](https://projects.apache.org/committee.html?datafusion). 
   
   ## Help wanted
   - I would love to see the community offer additional help with performance 
testing and triaging bugs, helping to make DataFusion a more stable foundation 
for building systems
   
   Please feel free to leave your own comments on this ticket if you are 
looking for help
   
   ## Community 
   * [Weekly 
Call](https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit#heading=h.kpjkpncdmt1g)
   * Slack/Discord: [info 
links](https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord)
 
   
   ## Upcoming meetups:
   * Help schedule some!
   
   

