2010YOUY01 commented on code in PR #115:
URL: https://github.com/apache/datafusion-site/pull/115#discussion_r2371596135


##########
content/blog/2025-09-24-datafusion-50.0.0.md:
##########
@@ -0,0 +1,390 @@
+---
+layout: post
+title: Apache DataFusion 50.0.0 Released
+date: 2025-09-24
+author: pmc
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<!-- see https://github.com/apache/datafusion/issues/16347 for details -->
+
+## Introduction
+
+We are proud to announce the release of [DataFusion 50.0.0]. This blog post
+highlights some of the major improvements since the release of [DataFusion
+49.0.0]. The complete list of changes is available in the [changelog].
+
+[DataFusion 50.0.0]: https://crates.io/crates/datafusion/50.0.0
+[DataFusion 49.0.0]: https://datafusion.apache.org/blog/2025/07/28/datafusion-49.0.0/
+[changelog]: https://github.com/apache/datafusion/blob/branch-50/dev/changelog/50.0.0.md
+
+
+## Performance Improvements 🚀
+
+> **📝TODO** *Update chart*
+
+DataFusion continues to focus on enhancing performance, as shown in the
+ClickBench and other results.
+
+<img src="/blog/images/datafusion-49.0.0/performance_over_time_clickbench.png"
+  width="100%" class="img-responsive" alt="ClickBench performance results over
+  time for DataFusion" />
+
+**Figure 1**: ClickBench performance improvements over time. Average and median
+normalized query execution times for ClickBench queries for each git revision.
+Query times are normalized using the ClickBench definition. Data and definitions
+on the [DataFusion Benchmarking
+Page](https://alamb.github.io/datafusion-benchmarking/).
+
+Here are some noteworthy optimizations added since DataFusion 49:
+
+**Dynamic Filter Pushdown Improvements**
+
+The dynamic filter pushdown optimization, which allows runtime filters to cut
+down on the amount of data read, has been extended to support **inner hash
+joins**. This optimization dramatically improves the performance of inner joins
+when one of the relations is relatively small or filtered by a highly selective
+selection. Consider the following example:
+
+```sql
+-- retrieve the orders of the customer with c_phone = '25-989-741-2988'
+SELECT *
+FROM customer
+JOIN orders on c_custkey = o_custkey
+WHERE c_phone = '25-989-741-2988';
+```
+
+While previously the entire `orders` relation would be scanned to join with the
+target customer, now the dynamic filter pushdown can filter it right at the
+source, keeping the data loaded at a minimum. The result is an order of
+magnitude faster execution time. This
+[article](https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/) goes
+into more detail about the dynamic filter pushdown optimization in DataFusion.
+
+The dynamic filter pushdown optimization in the TopK operator has also been
+improved in DataFusion 50.0.0, ensuring that the filters used are as selective
+as possible. You can read more about it in this
+[ticket](https://github.com/apache/datafusion/pull/16433).
+
+The next step will be to extend the dynamic filters to other types of joins,
+such as left and right ones.

Review Comment:
   Do we have an EPIC for this next step? If so, we can link it here.
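
For readers of this thread who want a concrete picture of the paragraph under review, here is a minimal Python sketch of the general idea: the build side of an inner hash join produces a runtime filter that the probe-side scan applies before rows reach the join. It is a toy model under stated assumptions, not DataFusion's implementation. The names `scan` and `hash_join`, the `emitted` counter, and the use of an exact key set as the filter are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def scan(rows, keep=None, stats=None):
    """Toy table scan: yields rows, skipping those rejected by `keep`."""
    for row in rows:
        if keep is not None and not keep(row):
            continue  # filtered at the source; never reaches the join
        if stats is not None:
            stats["emitted"] += 1  # count rows handed to the join
        yield row


def hash_join(build_rows, probe_rows, build_key, probe_key,
              dynamic_filter=True, stats=None):
    """Inner hash join with an optional runtime filter on the probe scan."""
    # Build phase: hash the (small) build side on its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Assumption for illustration: the runtime filter is the exact set of
    # build-side keys. A real engine may use a different filter form.
    keep = None
    if dynamic_filter:
        keys = set(table)

        def keep(row):
            return row[probe_key] in keys

    # Probe phase: only rows that survive the scan are looked up.
    out = []
    for probe in scan(probe_rows, keep, stats):
        for build in table.get(probe[probe_key], ()):
            out.append({**build, **probe})
    return out


# Hypothetical data loosely mirroring the customer/orders query above.
customer = [
    {"c_custkey": 1, "c_phone": "25-989-741-2988"},
    {"c_custkey": 2, "c_phone": "00-000-000-0000"},
]
orders = [{"o_orderkey": k, "o_custkey": k % 5} for k in range(100)]
selected = [c for c in customer if c["c_phone"] == "25-989-741-2988"]

stats_on, stats_off = {"emitted": 0}, {"emitted": 0}
with_filter = hash_join(selected, orders, "c_custkey", "o_custkey", True, stats_on)
without_filter = hash_join(selected, orders, "c_custkey", "o_custkey", False, stats_off)
```

Both calls return the same joined rows; the difference is how many `orders` rows the scan hands to the join, which is what the counters record.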



##########
content/blog/2025-09-24-datafusion-50.0.0.md:
##########
@@ -0,0 +1,390 @@
+**Nested Loop Optimization**
+
+The nested loop join has been rewritten to reduce the execution time and the
+amount of memory required by using a finer-grained approach. Briefly, we now
+join one right batch with one left row at a time instead of joining a right
+batch with the entire left side in one step. This prevents having to potentially
+materialize large amounts of data at once. This new implementation also avoids
+some `indices <-> batches` conversions that were required in the old approach,
+further reducing the execution time.

Review Comment:
   ```suggestion
   The nested loop join has been rewritten to reduce execution time and memory
   usage by adopting a finer-grained approach. Specifically, we now limit the 
   intermediate data size to around a single `RecordBatch` for better memory
   efficiency, and we have eliminated redundant conversions from the old 
   implementation to further improve execution speed.
   ```
   Thank you for the blog post. For this paragraph, I think we can omit some
   implementation details and keep only the high-level ideas.
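
To make the high-level idea concrete, here is a minimal Python sketch contrasting the two strategies the paragraph describes: pairing a right batch with the entire left side at once, versus pairing it with one left row at a time. It is a toy model, not the DataFusion code. The function names and the `peak` bookkeeping are hypothetical, plain lists stand in for Arrow `RecordBatch`es, and the peak counts candidate pairs rather than bytes.

```python
def join_whole_left(left, right_batches, predicate):
    """Older strategy (as described): one right batch x the entire left side."""
    peak, out = 0, []
    for batch in right_batches:
        # Materializes len(left) * len(batch) candidate pairs in one step.
        candidates = [(l, r) for l in left for r in batch]
        peak = max(peak, len(candidates))
        out.extend(pair for pair in candidates if predicate(*pair))
    return out, peak


def join_row_at_a_time(left, right_batches, predicate):
    """Finer-grained strategy: one right batch x one left row at a time."""
    peak, out = 0, []
    for batch in right_batches:
        for l in left:
            # Intermediate data is bounded by the size of a single batch.
            candidates = [(l, r) for r in batch]
            peak = max(peak, len(candidates))
            out.extend(pair for pair in candidates if predicate(*pair))
    return out, peak


# Hypothetical inputs: 50 left rows, 4 right batches of 8 rows each.
left = list(range(50))
right_batches = [list(range(start, start + 8)) for start in range(0, 32, 8)]


def less_than(l, r):
    return l < r


old_out, old_peak = join_whole_left(left, right_batches, less_than)
new_out, new_peak = join_row_at_a_time(left, right_batches, less_than)
```

Both functions produce identical output in the same order. Only the largest intermediate differs: it scales with the left side in the first function and stays at one batch in the second.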



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

