alamb commented on code in PR #83:
URL: https://github.com/apache/datafusion-site/pull/83#discussion_r2200412619


##########
content/blog/2025-07-11-datafusion-47.0.0.md:
##########
@@ -0,0 +1,272 @@
+---
+layout: post
+title: Apache DataFusion 47.0.0 Released
+date: 2025-07-11
+author: PMC
+categories: [ release ]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/16347 for details -->
+
+We’re excited to announce the release of **Apache DataFusion 47.0.0**! This new version represents a significant
+milestone for the project, packing in a wide range of improvements and fixes. You can find the complete details in the
+full [changelog](https://github.com/apache/datafusion/blob/branch-47/dev/changelog/47.0.0.md). We’ll highlight the most
+important changes below and guide you through upgrading.
+
+Note that DataFusion 47.0.0 was released in April 2025, but we are only now publishing the blog post due to
+limited bandwidth in the DataFusion community. We apologize for the delay and encourage you to come help us
+accelerate the next release and announcements
+by [joining the community](https://datafusion.apache.org/contributor-guide/communication.html) 🎣.
+
+## Breaking Changes
+
+DataFusion 47.0.0 brings a few **breaking changes** that may require adjustments to your code as described in
+the [Upgrade Guide](https://datafusion.apache.org/library-user-guide/upgrading.html#datafusion-47-0-0). Here are some notable ones:
+
+- [Upgrades to arrow-rs and arrow-parquet 55.0.0 and object_store 0.12.0](https://github.com/apache/datafusion/pull/15466):
+  Several APIs in the underlying `arrow`, `parquet` and `object_store` libraries now use `u64` instead of `usize` for
+  sizes and byte ranges, to better support WASM (where `usize` is only 32 bits). As a result, you may occasionally need
+  to convert between `usize` and `u64`, and custom `ObjectStore` implementations need updated signatures, for example:
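+```rust
+use async_trait::async_trait;
+use bytes::Bytes;
+use futures::stream::BoxStream;
+use object_store::{path::Path, ObjectMeta, ObjectStore, Result as ObjectStoreResult};
+use std::ops::Range;
+
+// `MyObjectStore` stands in for your own store type that wraps and delegates to `self.inner`
+#[async_trait]
+impl ObjectStore for MyObjectStore {
+    // ... other required methods ...
+
+    // `range` is now a `Range<u64>` instead of a `Range<usize>`
+    async fn get_range(&self, location: &Path, range: Range<u64>) -> ObjectStoreResult<Bytes> {
+        self.inner.get_range(location, range).await
+    }
+
+    // ... other required methods ...
+
+    // The returned stream is now `'static` instead of `'_`, so it can no longer borrow from
+    // `self` or `prefix` (this also applies to `list_with_offset`)
+    fn list(&self, prefix: Option<&Path>) -> BoxStream<'static, ObjectStoreResult<ObjectMeta>> {
+        self.inner.list(prefix)
+    }
+}
+```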
+- [DisplayFormatType::TreeRender](https://github.com/apache/datafusion/issues/14914):
+  Implementations of `ExecutionPlan` must now also provide a description for the `DisplayFormatType::TreeRender`
+  format, which is used by the new [tree style explains](https://datafusion.apache.org/user-guide/sql/explain.html#tree-format-default).
+  This description can be the same as the one for the existing `DisplayFormatType::Default`.
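
The `usize`/`u64` conversions mentioned in the object_store upgrade note are usually mechanical. The following is a rough, std-only sketch; it is not code from DataFusion, arrow-rs or `object_store`, and `to_u64_range` and `to_usize_range` are hypothetical helper names. It shows one way to widen a byte range for an API such as `get_range`, and to narrow it back when slicing an in-memory buffer:

```rust
use std::ops::Range;

/// Hypothetical helper: widen a `usize` byte range to the `Range<u64>`
/// now expected by APIs such as `ObjectStore::get_range`.
pub fn to_u64_range(r: Range<usize>) -> Range<u64> {
    // Lossless: `usize` is at most 64 bits on every target Rust supports.
    (r.start as u64)..(r.end as u64)
}

/// Hypothetical helper: narrow a `Range<u64>` back to `usize` so it can be used
/// to slice an in-memory buffer. Returns `None` if either bound does not fit,
/// which can happen on 32-bit targets such as wasm32.
pub fn to_usize_range(r: Range<u64>) -> Option<Range<usize>> {
    let start = usize::try_from(r.start).ok()?;
    let end = usize::try_from(r.end).ok()?;
    Some(start..end)
}
```

For example, `to_usize_range(to_u64_range(4..8))` returns `Some(4..8)`. On 64-bit hosts, the narrowing step always succeeds. The `None` case only matters on 32-bit targets, for offsets above `u32::MAX`. Using `try_from` instead of `as` for the narrowing direction avoids silently truncating large offsets.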
+
+## Performance Improvements
+
+DataFusion 47.0.0 comes with numerous performance enhancements across the board. Here are some of the noteworthy
+optimizations in this release:
+
+- **`FIRST_VALUE` and `LAST_VALUE`:** These functions now execute much faster on high-cardinality data, such as queries with many groups or partitions. For example, DataFusion 47.0.0 runs the following query on the h2o.ai dataset in **7 seconds**, compared to **36 seconds** in DataFusion 46.0.0: `select id2, id4, first_value(v1 order by id2, id4) as r2 from '~/h2o_100m.parquet' group by id2, id4`. (PRs [#15266](https://github.com/apache/datafusion/pull/15266)

Review Comment:
   FYI @UBarney



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

