alamb commented on code in PR #57:
URL: https://github.com/apache/datafusion-site/pull/57#discussion_r1966495105


##########
content/blog/2025-02-20-datafusion-45.0.0.md:
##########
@@ -0,0 +1,300 @@
+---
+layout: post
+title: Apache DataFusion 45.0.0 Released
+date: 2025-02-20

Review Comment:
   The site still renders an "author" even though there is no author listed 🤔 
   
   <img width="803" alt="Screenshot 2025-02-22 at 6 10 11 AM" src="https://github.com/user-attachments/assets/c5d264a3-d836-477e-8c38-c74d85a99ba7" />
   
   I think it would be good to mark yourself as the author if you are ok with it. This is a great piece of writing and I think you should get the credit for it. It also shows diversity in the writers of DataFusion content.
   
   ```suggestion
   date: 2025-02-20
   author: Bruce Ritchie
   ```
   
   If you would prefer, you could also list the author as `pmc`.



##########
content/blog/2025-02-20-datafusion-45.0.0.md:
##########
@@ -0,0 +1,300 @@
+---
+layout: post
+title: Apache DataFusion 45.0.0 Released
+date: 2025-02-20
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/9602 for details -->

Review Comment:
   ```suggestion
   <!-- see https://github.com/apache/datafusion/issues/11631 for details -->
   ```



##########
content/blog/2025-02-20-datafusion-45.0.0.md:
##########
@@ -0,0 +1,300 @@
+---
+layout: post
+title: Apache DataFusion 45.0.0 Released
+date: 2025-02-20
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/9602 for details -->
+
+## Introduction
+
+We are very proud to announce [DataFusion 45.0.0]. This blog highlights some 
of the
+many major improvements since we released [DataFusion 40.0.0] and a preview of
+what the community is thinking about in the next 6 months. It has been an 
exciting
+period of development for DataFusion!
+
+[DataFusion 40.0.0]: 
https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/
+[DataFusion 45.0.0]: https://crates.io/crates/datafusion/45.0.0
+
+[Apache DataFusion] is an extensible query engine, written in [Rust], that
+uses [Apache Arrow] as its in-memory format. DataFusion is used by developers 
to
+create new, fast data centric systems such as databases, dataframe libraries,
+machine learning and streaming applications. While [DataFusion’s primary design
+goal] is to accelerate the creation of other data centric systems, it has a
+reasonable experience directly out of the box as a [dataframe library],
+[python library] and [command line SQL tool].
+
+[apache datafusion]: https://datafusion.apache.org/
+[rust]: https://www.rust-lang.org/
+[apache arrow]: https://arrow.apache.org
+[DataFusion’s primary design goal]: 
https://datafusion.apache.org/user-guide/introduction.html#project-goals
+[dataframe library]: https://datafusion.apache.org/user-guide/dataframe.html
+[python library]: https://datafusion.apache.org/python/
+[command line SQL tool]: https://datafusion.apache.org/user-guide/cli/
+
+DataFusion's core thesis is that as a community, together we can build much 
more
+advanced technology than any of us as individuals or companies could do alone. 
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions. 
+With DataFusion, we can all build on top of a shared foundation, and focus on
+what makes our projects unique.
+
+
+## Community Growth  📈 
+
+In the last 6 months, between `40.0.0` and `45.0.0`, our community continues to
+grow in new and exciting ways.
+
+1. We added several PMC members and new committers: [@jayzhan211] and 
[@jonahgoa] joined the PMC,
+   [@2010YOUY01], [@rachelint], [@findpi], [@iffyio], [@goldmedal], 
[@Weijun-H], [@Michael-J-Ward] and [@korowa]
+   joined as committers. See the [mailing list] for more details.
+2. In the [core DataFusion repo] alone we reviewed and accepted almost 1600 
PRs from 206 different
+   committers, created over 1100 issues and closed 751 of them 🚀. All changes 
are listed in the detailed
+   [changelogs].
+3. DataFusion focused meetups happened in multiple cities around the world: 
[Hangzhou], [Belgrade], [New York], 
+   [Seattle], [Chicago], [Boston] and [Amsterdam] as well as a Rust NYC meetup 
in NYC focussed on DataFusion.
+
+[core DataFusion repo]: https://github.com/apache/arrow-datafusion
+[changelogs]: https://github.com/apache/datafusion/tree/main/dev/changelog
+[mailing list]: https://lists.apache.org/list.html?d...@datafusion.apache.org
+[Hangzhou]: 
https://github.com/apache/datafusion/discussions/10341#discussioncomment-10110273
+[Belgrade]: https://github.com/apache/datafusion/discussions/11431
+[New York]: https://github.com/apache/datafusion/discussions/11213
+[Seattle]: https://github.com/apache/datafusion/discussions/10348
+[Chicago]: https://github.com/apache/datafusion/discussions/12894
+[Boston]: https://github.com/apache/datafusion/discussions/13165 
+[Amsterdam]: https://github.com/apache/datafusion/discussions/12988
+
+<!--
+$ git log --pretty=oneline 40.0.0..45.0.0 . | wc -l
+     1532 (up from 1453)
+
+$ git shortlog -sn 40.0.0..45.0.0 . | wc -l
+     206 (up from 182)
+
+https://crates.io/crates/datafusion/45.0.0
+DataFusion 45 released Feb 7, 2025
+
+https://crates.io/crates/datafusion/40.0.0
+DataFusion 40 released July 12, 2024
+
+Issues created in this time: 375 open, 751 closed (from 321 open, 781 closed)
+https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2024-07-12..2025-02-07
+
+Issues closed: 956 (up from 911)
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2024-07-12..2025-02-07
+
+PRs merged in this time 1597 (up from 1490)
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2024-07-12..2025-02-07
+
+-->
+
+DataFusion has put in an application to be part of [Google Summer of Code] 
with a 
+[number of ideas] for projects with mentors already selected. Additionally, 
[some ideas] on
+how to make DataFusion an ideal selection for university database projects 
such as the 
+[CMU database classes] have been put forward.
+
+[Google Summer of Code]: https://summerofcode.withgoogle.com/
+[number of ideas]: https://github.com/apache/datafusion/issues/14478
+[some ideas]: https://github.com/apache/datafusion/issues/14373
+[CMU database classes]: https://15445.courses.cs.cmu.edu/spring2025/
+
+In addition, DataFusion has been appearing publicly more and more, both online 
and offline. Here are some highlights:
+
+1. A [demonstration of how uwheel] is integrated into DataFusion
+2. Integrating StringView into DataFusion - [part 1] and [part 2]
+3. [Building streams] with DataFusion
+4. [Caching in DataFusion]: Don't read twice
+5. [Parquet pruning in DataFusion]: Read no more than you need
+6. DataFusion is one of [The 10 coolest open source software tools]
+7. [Building databases over a weekend]
+
+[demonstration of how uwheel]: https://uwheel.rs/post/datafusion_uwheel/
+[part 1]: 
https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/
+[part 2]: 
https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/
+[Building streams]: https://techontherocks.show/3
+[Caching in DataFusion]: https://blog.haoxp.xyz/posts/caching-datafusion
+[Parquet pruning in DataFusion]: https://blog.haoxp.xyz/posts/parquet-to-arrow/
+[The 10 coolest open source software tools]: 
https://www.crn.com/news/software/2024/the-10-coolest-open-source-software-tools-of-2024?page=3
+[Building databases over a weekend]: 
https://www.denormalized.io/blog/building-databases
+
+## Improved Performance 🚀 
+
+DataFusion hit a milestone in its development by becoming [the fastest single 
node engine] 
+for querying Apache Parquet files in [clickbench] benchmark for the 43.0.0 
release. A lot 
+of work went into making this happen!
+
+<img
+src="/blog/images/datafusion-45.0.0/performance_over_time.png"
+width="100%"
+class="img-responsive"
+alt="ClickBench performance results over time for DataFusion"
+/>
+
+**Figure 1**: ClickBench performance improved over 33% between DataFusion 33
+(released Nov. 2023) and DataFusion 45 (released Feb. 2025). 
+
+The task of [integrating] the new [Arrow StringView] which significantly 
improves performance 
+for workloads that scan, filter and group by variable length string and binary 
data was completed 
+and enabled by default in the past 6 months. The improvement is especially 
pronounced for Parquet 
+files due to [upstream work in the parquet reader]. Kudos to [@XiangpengHong], 
[@AriesDevil], 
+[@PsiACE], [@Weijun-H], [@a10y], and [@RinChanNOWWW] for driving this project.
+
+[the fastest single node engine]: 
https://datafusion.apache.org/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/
+[clickbench]: https://benchmark.clickhouse.com/
+[integrating]: https://github.com/apache/datafusion/issues/10918
+[Arrow StringView]: 
https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html
+[multiple variable length columns in the `GROUP BY` clause]: 
https://github.com/apache/datafusion/issues/9403
+[upstream work in the parquet reader]: 
https://github.com/apache/arrow-rs/issues/5530
+
+## Improved Quality 📋
+
+DataFusion continues to improve overall in quality. In addition to ongoing bug
+fixes, one of the most exciting improvements in the last 6 months was the 
addition of the 
+[SQLite sqllogictest suite] thanks to [@Omega359]. These tests run over 5 
million 
+sql statements on every push to the main branch.
+
+Support for [explicitly checking logical plan invariants] was added by 
[@wiedld] which 
+can help catch implicit changes that might cause problems during upgrades.
+
+[SQLite sqllogictest suite]: https://github.com/apache/datafusion/pull/13936
+[explicitly checking logical plan invariants]: 
https://github.com/apache/datafusion/pull/13651
+
+## Improved Documentation 📚
+
+We continue to improve the documentation to make it easier to get started 
using DataFusion. 
+During the last 6 months two projects were initiated to migrate the function 
documentation
+from strictly static markdown files. First, [@Omega359] [created a framework] 
to allow function
+documentation to be generated from code and [@jonathanc-n] and others helped 
with the migration,
+then [@comphead] lead a project to [create a doc macro] to allow for an even 
easier way to write 
+function documentation. A special thanks to [@Chen-Yuan-Lai] for migrating 
many functions to 
+the new syntax.
+
+[created a framework]: https://github.com/apache/datafusion/pull/12668
+[create a doc macro]: https://github.com/apache/datafusion/pull/12822
+
+Additionally, the [examples] were [refactored] and [cleaned up] to improve 
their usefulness.
+
+[examples]: https://github.com/apache/datafusion/pull/13877
+[refactored]: https://github.com/apache/datafusion/pull/13905
+[cleaned up]: https://github.com/apache/datafusion/pull/13950
+
+## New Features ✨
+
+There are too many new features in the last 6 months to list them all, but here
+are some highlights:
+
+### Functions
+* Window functions were migrated from BuiltInWindowFunctions to UDFs 
([@jcsherin])
+* Aggregate functions were migrated from builtin functions to UDFs
+* As mentioned above function documentation was extracted from the markdown 
files
+* Some new functions and sql support were added including '[show functions]', 
'[to_local_time]',
+  '[regexp_count]', '[map_extract]', '[array_distance]', '[array_any_value]', 
'[greatest]',
+  '[least]', '[arrays_overlap]'

Review Comment:
   ```suggestion
   * Uniform Window Functions: `BuiltInWindowFunctions` was removed and all now use UDFs ([@jcsherin])
   * Uniform Aggregate Functions: `BuiltInAggregateFunctions` was removed and all now use UDFs
   * As mentioned above function documentation was extracted from the markdown 
files
   * Some new functions and sql support were added including '[show 
functions]', '[to_local_time]',
     '[regexp_count]', '[map_extract]', '[array_distance]', 
'[array_any_value]', '[greatest]',
     '[least]', '[arrays_overlap]'
   ```



##########
content/blog/2025-02-20-datafusion-45.0.0.md:
##########
@@ -0,0 +1,300 @@
+---
+layout: post
+title: Apache DataFusion 45.0.0 Released
+date: 2025-02-20
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/9602 for details -->
+
+## Introduction
+
+We are very proud to announce [DataFusion 45.0.0]. This blog highlights some 
of the
+many major improvements since we released [DataFusion 40.0.0] and a preview of
+what the community is thinking about in the next 6 months. It has been an 
exciting
+period of development for DataFusion!
+
+[DataFusion 40.0.0]: 
https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/
+[DataFusion 45.0.0]: https://crates.io/crates/datafusion/45.0.0
+
+[Apache DataFusion] is an extensible query engine, written in [Rust], that
+uses [Apache Arrow] as its in-memory format. DataFusion is used by developers 
to
+create new, fast data centric systems such as databases, dataframe libraries,
+machine learning and streaming applications. While [DataFusion’s primary design
+goal] is to accelerate the creation of other data centric systems, it has a
+reasonable experience directly out of the box as a [dataframe library],
+[python library] and [command line SQL tool].
+
+[apache datafusion]: https://datafusion.apache.org/
+[rust]: https://www.rust-lang.org/
+[apache arrow]: https://arrow.apache.org
+[DataFusion’s primary design goal]: 
https://datafusion.apache.org/user-guide/introduction.html#project-goals
+[dataframe library]: https://datafusion.apache.org/user-guide/dataframe.html
+[python library]: https://datafusion.apache.org/python/
+[command line SQL tool]: https://datafusion.apache.org/user-guide/cli/
+
+DataFusion's core thesis is that as a community, together we can build much 
more
+advanced technology than any of us as individuals or companies could do alone. 
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions. 
+With DataFusion, we can all build on top of a shared foundation, and focus on
+what makes our projects unique.
+
+
+## Community Growth  📈 
+
+In the last 6 months, between `40.0.0` and `45.0.0`, our community continues to
+grow in new and exciting ways.
+
+1. We added several PMC members and new committers: [@jayzhan211] and 
[@jonahgoa] joined the PMC,
+   [@2010YOUY01], [@rachelint], [@findpi], [@iffyio], [@goldmedal], 
[@Weijun-H], [@Michael-J-Ward] and [@korowa]
+   joined as committers. See the [mailing list] for more details.
+2. In the [core DataFusion repo] alone we reviewed and accepted almost 1600 
PRs from 206 different
+   committers, created over 1100 issues and closed 751 of them 🚀. All changes 
are listed in the detailed
+   [changelogs].
+3. DataFusion focused meetups happened in multiple cities around the world: 
[Hangzhou], [Belgrade], [New York], 
+   [Seattle], [Chicago], [Boston] and [Amsterdam] as well as a Rust NYC meetup 
in NYC focussed on DataFusion.
+
+[core DataFusion repo]: https://github.com/apache/arrow-datafusion
+[changelogs]: https://github.com/apache/datafusion/tree/main/dev/changelog
+[mailing list]: https://lists.apache.org/list.html?d...@datafusion.apache.org
+[Hangzhou]: 
https://github.com/apache/datafusion/discussions/10341#discussioncomment-10110273
+[Belgrade]: https://github.com/apache/datafusion/discussions/11431
+[New York]: https://github.com/apache/datafusion/discussions/11213
+[Seattle]: https://github.com/apache/datafusion/discussions/10348
+[Chicago]: https://github.com/apache/datafusion/discussions/12894
+[Boston]: https://github.com/apache/datafusion/discussions/13165 
+[Amsterdam]: https://github.com/apache/datafusion/discussions/12988
+
+<!--
+$ git log --pretty=oneline 40.0.0..45.0.0 . | wc -l
+     1532 (up from 1453)
+
+$ git shortlog -sn 40.0.0..45.0.0 . | wc -l
+     206 (up from 182)
+
+https://crates.io/crates/datafusion/45.0.0
+DataFusion 45 released Feb 7, 2025
+
+https://crates.io/crates/datafusion/40.0.0
+DataFusion 40 released July 12, 2024
+
+Issues created in this time: 375 open, 751 closed (from 321 open, 781 closed)
+https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2024-07-12..2025-02-07
+
+Issues closed: 956 (up from 911)
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2024-07-12..2025-02-07
+
+PRs merged in this time 1597 (up from 1490)
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2024-07-12..2025-02-07
+
+-->
+
+DataFusion has put in an application to be part of [Google Summer of Code] 
with a 
+[number of ideas] for projects with mentors already selected. Additionally, 
[some ideas] on
+how to make DataFusion an ideal selection for university database projects 
such as the 
+[CMU database classes] have been put forward.
+
+[Google Summer of Code]: https://summerofcode.withgoogle.com/
+[number of ideas]: https://github.com/apache/datafusion/issues/14478
+[some ideas]: https://github.com/apache/datafusion/issues/14373
+[CMU database classes]: https://15445.courses.cs.cmu.edu/spring2025/
+
+In addition, DataFusion has been appearing publicly more and more, both online 
and offline. Here are some highlights:
+
+1. A [demonstration of how uwheel] is integrated into DataFusion
+2. Integrating StringView into DataFusion - [part 1] and [part 2]
+3. [Building streams] with DataFusion
+4. [Caching in DataFusion]: Don't read twice
+5. [Parquet pruning in DataFusion]: Read no more than you need
+6. DataFusion is one of [The 10 coolest open source software tools]
+7. [Building databases over a weekend]
+
+[demonstration of how uwheel]: https://uwheel.rs/post/datafusion_uwheel/
+[part 1]: 
https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/
+[part 2]: 
https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/
+[Building streams]: https://techontherocks.show/3
+[Caching in DataFusion]: https://blog.haoxp.xyz/posts/caching-datafusion
+[Parquet pruning in DataFusion]: https://blog.haoxp.xyz/posts/parquet-to-arrow/
+[The 10 coolest open source software tools]: 
https://www.crn.com/news/software/2024/the-10-coolest-open-source-software-tools-of-2024?page=3
+[Building databases over a weekend]: 
https://www.denormalized.io/blog/building-databases
+
+## Improved Performance 🚀 
+
+DataFusion hit a milestone in its development by becoming [the fastest single 
node engine] 
+for querying Apache Parquet files in [clickbench] benchmark for the 43.0.0 
release. A lot 
+of work went into making this happen!
+
+<img
+src="/blog/images/datafusion-45.0.0/performance_over_time.png"

Review Comment:
   BTW thanks again to @pmcgleenon for creating these images ❤️ 
   <img width="1248" alt="Screenshot 2025-02-22 at 6 17 01 AM" src="https://github.com/user-attachments/assets/c416d8f7-386f-47d3-8928-0cc0b08d568b" />
   
   



##########
content/blog/2025-02-20-datafusion-45.0.0.md:
##########
@@ -0,0 +1,300 @@
+---
+layout: post
+title: Apache DataFusion 45.0.0 Released
+date: 2025-02-20
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/9602 for details -->
+
+## Introduction
+
+We are very proud to announce [DataFusion 45.0.0]. This blog highlights some 
of the
+many major improvements since we released [DataFusion 40.0.0] and a preview of
+what the community is thinking about in the next 6 months. It has been an 
exciting
+period of development for DataFusion!
+
+[DataFusion 40.0.0]: 
https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/
+[DataFusion 45.0.0]: https://crates.io/crates/datafusion/45.0.0
+
+[Apache DataFusion] is an extensible query engine, written in [Rust], that
+uses [Apache Arrow] as its in-memory format. DataFusion is used by developers 
to
+create new, fast data centric systems such as databases, dataframe libraries,
+machine learning and streaming applications. While [DataFusion’s primary design
+goal] is to accelerate the creation of other data centric systems, it has a
+reasonable experience directly out of the box as a [dataframe library],
+[python library] and [command line SQL tool].
+
+[apache datafusion]: https://datafusion.apache.org/
+[rust]: https://www.rust-lang.org/
+[apache arrow]: https://arrow.apache.org
+[DataFusion’s primary design goal]: 
https://datafusion.apache.org/user-guide/introduction.html#project-goals
+[dataframe library]: https://datafusion.apache.org/user-guide/dataframe.html
+[python library]: https://datafusion.apache.org/python/
+[command line SQL tool]: https://datafusion.apache.org/user-guide/cli/
+
+DataFusion's core thesis is that as a community, together we can build much 
more
+advanced technology than any of us as individuals or companies could do alone. 
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions. 
+With DataFusion, we can all build on top of a shared foundation, and focus on
+what makes our projects unique.
+
+
+## Community Growth  📈 
+
+In the last 6 months, between `40.0.0` and `45.0.0`, our community continues to
+grow in new and exciting ways.
+
+1. We added several PMC members and new committers: [@jayzhan211] and 
[@jonahgoa] joined the PMC,
+   [@2010YOUY01], [@rachelint], [@findpi], [@iffyio], [@goldmedal], 
[@Weijun-H], [@Michael-J-Ward] and [@korowa]
+   joined as committers. See the [mailing list] for more details.
+2. In the [core DataFusion repo] alone we reviewed and accepted almost 1600 
PRs from 206 different
+   committers, created over 1100 issues and closed 751 of them 🚀. All changes 
are listed in the detailed
+   [changelogs].
+3. DataFusion focused meetups happened in multiple cities around the world: 
[Hangzhou], [Belgrade], [New York], 
+   [Seattle], [Chicago], [Boston] and [Amsterdam] as well as a Rust NYC meetup 
in NYC focussed on DataFusion.
+
+[core DataFusion repo]: https://github.com/apache/arrow-datafusion
+[changelogs]: https://github.com/apache/datafusion/tree/main/dev/changelog
+[mailing list]: https://lists.apache.org/list.html?d...@datafusion.apache.org
+[Hangzhou]: 
https://github.com/apache/datafusion/discussions/10341#discussioncomment-10110273
+[Belgrade]: https://github.com/apache/datafusion/discussions/11431
+[New York]: https://github.com/apache/datafusion/discussions/11213
+[Seattle]: https://github.com/apache/datafusion/discussions/10348
+[Chicago]: https://github.com/apache/datafusion/discussions/12894
+[Boston]: https://github.com/apache/datafusion/discussions/13165 
+[Amsterdam]: https://github.com/apache/datafusion/discussions/12988
+
+<!--
+$ git log --pretty=oneline 40.0.0..45.0.0 . | wc -l
+     1532 (up from 1453)
+
+$ git shortlog -sn 40.0.0..45.0.0 . | wc -l
+     206 (up from 182)
+
+https://crates.io/crates/datafusion/45.0.0
+DataFusion 45 released Feb 7, 2025
+
+https://crates.io/crates/datafusion/40.0.0
+DataFusion 40 released July 12, 2024
+
+Issues created in this time: 375 open, 751 closed (from 321 open, 781 closed)
+https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2024-07-12..2025-02-07
+
+Issues closed: 956 (up from 911)
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2024-07-12..2025-02-07
+
+PRs merged in this time 1597 (up from 1490)
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2024-07-12..2025-02-07
+
+-->
+
+DataFusion has put in an application to be part of [Google Summer of Code] 
with a 
+[number of ideas] for projects with mentors already selected. Additionally, 
[some ideas] on
+how to make DataFusion an ideal selection for university database projects 
such as the 
+[CMU database classes] have been put forward.
+
+[Google Summer of Code]: https://summerofcode.withgoogle.com/
+[number of ideas]: https://github.com/apache/datafusion/issues/14478
+[some ideas]: https://github.com/apache/datafusion/issues/14373
+[CMU database classes]: https://15445.courses.cs.cmu.edu/spring2025/
+
+In addition, DataFusion has been appearing publicly more and more, both online 
and offline. Here are some highlights:
+
+1. A [demonstration of how uwheel] is integrated into DataFusion
+2. Integrating StringView into DataFusion - [part 1] and [part 2]
+3. [Building streams] with DataFusion
+4. [Caching in DataFusion]: Don't read twice
+5. [Parquet pruning in DataFusion]: Read no more than you need
+6. DataFusion is one of [The 10 coolest open source software tools]
+7. [Building databases over a weekend]
+
+[demonstration of how uwheel]: https://uwheel.rs/post/datafusion_uwheel/
+[part 1]: 
https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/
+[part 2]: 
https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/
+[Building streams]: https://techontherocks.show/3
+[Caching in DataFusion]: https://blog.haoxp.xyz/posts/caching-datafusion
+[Parquet pruning in DataFusion]: https://blog.haoxp.xyz/posts/parquet-to-arrow/
+[The 10 coolest open source software tools]: 
https://www.crn.com/news/software/2024/the-10-coolest-open-source-software-tools-of-2024?page=3
+[Building databases over a weekend]: 
https://www.denormalized.io/blog/building-databases
+
+## Improved Performance 🚀 
+
+DataFusion hit a milestone in its development by becoming [the fastest single 
node engine] 
+for querying Apache Parquet files in [clickbench] benchmark for the 43.0.0 
release. A lot 
+of work went into making this happen!
+
+<img
+src="/blog/images/datafusion-45.0.0/performance_over_time.png"
+width="100%"
+class="img-responsive"
+alt="ClickBench performance results over time for DataFusion"
+/>
+
+**Figure 1**: ClickBench performance improved over 33% between DataFusion 33
+(released Nov. 2023) and DataFusion 45 (released Feb. 2025). 
+
+The task of [integrating] the new [Arrow StringView] which significantly 
improves performance 
+for workloads that scan, filter and group by variable length string and binary 
data was completed 
+and enabled by default in the past 6 months. The improvement is especially 
pronounced for Parquet 
+files due to [upstream work in the parquet reader]. Kudos to [@XiangpengHong], 
[@AriesDevil], 
+[@PsiACE], [@Weijun-H], [@a10y], and [@RinChanNOWWW] for driving this project.
+
+[the fastest single node engine]: 
https://datafusion.apache.org/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/
+[clickbench]: https://benchmark.clickhouse.com/
+[integrating]: https://github.com/apache/datafusion/issues/10918
+[Arrow StringView]: 
https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html
+[multiple variable length columns in the `GROUP BY` clause]: 
https://github.com/apache/datafusion/issues/9403
+[upstream work in the parquet reader]: 
https://github.com/apache/arrow-rs/issues/5530
+
+## Improved Quality 📋
+
+DataFusion continues to improve overall in quality. In addition to ongoing bug
+fixes, one of the most exciting improvements in the last 6 months was the 
addition of the 
+[SQLite sqllogictest suite] thanks to [@Omega359]. These tests run over 5 
million 
+sql statements on every push to the main branch.
+
+Support for [explicitly checking logical plan invariants] was added by 
[@wiedld] which 
+can help catch implicit changes that might cause problems during upgrades.

Review Comment:
   I suggest we could also mention various other quality initiatives such as:
   
   1. Learning from GlareDB's experience: https://github.com/apache/datafusion/issues/13525
   2. More extensive prerelease testing: https://github.com/apache/datafusion/issues/13661



##########
content/blog/2025-02-20-datafusion-45.0.0.md:
##########
@@ -0,0 +1,300 @@
+---
+layout: post
+title: Apache DataFusion 45.0.0 Released
+date: 2025-02-20
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/9602 for details -->
+
+## Introduction
+
+We are very proud to announce [DataFusion 45.0.0]. This blog highlights some 
of the
+many major improvements since we released [DataFusion 40.0.0] and a preview of
+what the community is thinking about in the next 6 months. It has been an 
exciting
+period of development for DataFusion!
+
+[DataFusion 40.0.0]: 
https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/
+[DataFusion 45.0.0]: https://crates.io/crates/datafusion/45.0.0
+
+[Apache DataFusion] is an extensible query engine, written in [Rust], that
+uses [Apache Arrow] as its in-memory format. DataFusion is used by developers 
to
+create new, fast data centric systems such as databases, dataframe libraries,
+machine learning and streaming applications. While [DataFusion’s primary design
+goal] is to accelerate the creation of other data centric systems, it has a
+reasonable experience directly out of the box as a [dataframe library],
+[python library] and [command line SQL tool].
+
+[apache datafusion]: https://datafusion.apache.org/
+[rust]: https://www.rust-lang.org/
+[apache arrow]: https://arrow.apache.org
+[DataFusion’s primary design goal]: 
https://datafusion.apache.org/user-guide/introduction.html#project-goals
+[dataframe library]: https://datafusion.apache.org/user-guide/dataframe.html
+[python library]: https://datafusion.apache.org/python/
+[command line SQL tool]: https://datafusion.apache.org/user-guide/cli/
+
+DataFusion's core thesis is that as a community, together we can build much 
more
+advanced technology than any of us as individuals or companies could do alone. 
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions. 
+With DataFusion, we can all build on top of a shared foundation, and focus on
+what makes our projects unique.
+
+
+## Community Growth  📈 
+
+In the last 6 months, between `40.0.0` and `45.0.0`, our community continues to
+grow in new and exciting ways.
+
+1. We added several PMC members and new committers: [@jayzhan211] and 
[@jonahgoa] joined the PMC,
+   [@2010YOUY01], [@rachelint], [@findpi], [@iffyio], [@goldmedal], 
[@Weijun-H], [@Michael-J-Ward] and [@korowa]
+   joined as committers. See the [mailing list] for more details.
+2. In the [core DataFusion repo] alone we reviewed and accepted almost 1600 
PRs from 206 different
+   committers, created over 1100 issues and closed 751 of them 🚀. All changes 
are listed in the detailed
+   [changelogs].
+3. DataFusion focused meetups happened in multiple cities around the world: 
[Hangzhou], [Belgrade], [New York], 
+   [Seattle], [Chicago], [Boston] and [Amsterdam] as well as a Rust NYC meetup 
in NYC focussed on DataFusion.
+
+[core DataFusion repo]: https://github.com/apache/arrow-datafusion
+[changelogs]: https://github.com/apache/datafusion/tree/main/dev/changelog
+[mailing list]: https://lists.apache.org/list.html?d...@datafusion.apache.org
+[Hangzhou]: 
https://github.com/apache/datafusion/discussions/10341#discussioncomment-10110273
+[Belgrade]: https://github.com/apache/datafusion/discussions/11431
+[New York]: https://github.com/apache/datafusion/discussions/11213
+[Seattle]: https://github.com/apache/datafusion/discussions/10348
+[Chicago]: https://github.com/apache/datafusion/discussions/12894
+[Boston]: https://github.com/apache/datafusion/discussions/13165 
+[Amsterdam]: https://github.com/apache/datafusion/discussions/12988
+
+<!--
+$ git log --pretty=oneline 40.0.0..45.0.0 . | wc -l
+     1532 (up from 1453)
+
+$ git shortlog -sn 40.0.0..45.0.0 . | wc -l
+     206 (up from 182)
+
+https://crates.io/crates/datafusion/45.0.0
+DataFusion 45 released Feb 7, 2025
+
+https://crates.io/crates/datafusion/40.0.0
+DataFusion 40 released July 12, 2024
+
+Issues created in this time: 375 open, 751 closed (from 321 open, 781 closed)
+https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2024-07-12..2025-02-07
+
+Issues closed: 956 (up from 911)
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2024-07-12..2025-02-07
+
+PRs merged in this time 1597 (up from 1490)
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2024-07-12..2025-02-07
+
+-->
+
+DataFusion has put in an application to be part of [Google Summer of Code] 
with a 
+[number of ideas] for projects with mentors already selected. Additionally, 
[some ideas] on
+how to make DataFusion an ideal selection for university database projects 
such as the 
+[CMU database classes] have been put forward.
+
+[Google Summer of Code]: https://summerofcode.withgoogle.com/
+[number of ideas]: https://github.com/apache/datafusion/issues/14478
+[some ideas]: https://github.com/apache/datafusion/issues/14373
+[CMU database classes]: https://15445.courses.cs.cmu.edu/spring2025/
+
+In addition, DataFusion has been appearing publicly more and more, both online 
and offline. Here are some highlights:
+
+1. A [demonstration of how uwheel] is integrated into DataFusion
+2. Integrating StringView into DataFusion - [part 1] and [part 2]
+3. [Building streams] with DataFusion
+4. [Caching in DataFusion]: Don't read twice
+5. [Parquet pruning in DataFusion]: Read no more than you need
+6. DataFusion is one of [The 10 coolest open source software tools]
+7. [Building databases over a weekend]
+
+[demonstration of how uwheel]: https://uwheel.rs/post/datafusion_uwheel/
+[part 1]: 
https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/
+[part 2]: 
https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/
+[Building streams]: https://techontherocks.show/3
+[Caching in DataFusion]: https://blog.haoxp.xyz/posts/caching-datafusion
+[Parquet pruning in DataFusion]: https://blog.haoxp.xyz/posts/parquet-to-arrow/
+[The 10 coolest open source software tools]: 
https://www.crn.com/news/software/2024/the-10-coolest-open-source-software-tools-of-2024?page=3
+[Building databases over a weekend]: 
https://www.denormalized.io/blog/building-databases
+
+## Improved Performance 🚀 
+
+DataFusion hit a milestone in its development by becoming [the fastest single 
node engine] 
+for querying Apache Parquet files in [clickbench] benchmark for the 43.0.0 
release. A lot 
+of work went into making this happen!
+
+<img
+src="/blog/images/datafusion-45.0.0/performance_over_time.png"
+width="100%"
+class="img-responsive"
+alt="ClickBench performance results over time for DataFusion"
+/>
+
+**Figure 1**: ClickBench performance improved over 33% between DataFusion 33
+(released Nov. 2023) and DataFusion 45 (released Feb. 2025). 
+
+The task of [integrating] the new [Arrow StringView] which significantly 
improves performance 
+for workloads that scan, filter and group by variable length string and binary 
data was completed 
+and enabled by default in the past 6 months. The improvement is especially 
pronounced for Parquet 
+files due to [upstream work in the parquet reader]. Kudos to [@XiangpengHong], 
[@AriesDevil], 
+[@PsiACE], [@Weijun-H], [@a10y], and [@RinChanNOWWW] for driving this project.
+
+[the fastest single node engine]: 
https://datafusion.apache.org/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/
+[clickbench]: https://benchmark.clickhouse.com/
+[integrating]: https://github.com/apache/datafusion/issues/10918
+[Arrow StringView]: 
https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html
+[multiple variable length columns in the `GROUP BY` clause]: 
https://github.com/apache/datafusion/issues/9403
+[upstream work in the parquet reader]: 
https://github.com/apache/arrow-rs/issues/5530
+
+## Improved Quality 📋
+
+DataFusion continues to improve overall in quality. In addition to ongoing bug
+fixes, one of the most exciting improvements in the last 6 months was the 
addition of the 
+[SQLite sqllogictest suite] thanks to [@Omega359]. These tests run over 5 
million 

Review Comment:
   100%



##########
content/blog/2025-02-20-datafusion-45.0.0.md:
##########
@@ -0,0 +1,300 @@
+---
+layout: post
+title: Apache DataFusion 45.0.0 Released
+date: 2025-02-20
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/9602 for details -->
+
+## Introduction
+
+We are very proud to announce [DataFusion 45.0.0]. This blog highlights some 
of the
+many major improvements since we released [DataFusion 40.0.0] and a preview of
+what the community is thinking about in the next 6 months. It has been an 
exciting
+period of development for DataFusion!
+
+[DataFusion 40.0.0]: 
https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/
+[DataFusion 45.0.0]: https://crates.io/crates/datafusion/45.0.0
+
+[Apache DataFusion] is an extensible query engine, written in [Rust], that
+uses [Apache Arrow] as its in-memory format. DataFusion is used by developers 
to
+create new, fast data centric systems such as databases, dataframe libraries,
+machine learning and streaming applications. While [DataFusion’s primary design
+goal] is to accelerate the creation of other data centric systems, it has a
+reasonable experience directly out of the box as a [dataframe library],
+[python library] and [command line SQL tool].
+
+[apache datafusion]: https://datafusion.apache.org/
+[rust]: https://www.rust-lang.org/
+[apache arrow]: https://arrow.apache.org
+[DataFusion’s primary design goal]: 
https://datafusion.apache.org/user-guide/introduction.html#project-goals
+[dataframe library]: https://datafusion.apache.org/user-guide/dataframe.html
+[python library]: https://datafusion.apache.org/python/
+[command line SQL tool]: https://datafusion.apache.org/user-guide/cli/
+
+DataFusion's core thesis is that as a community, together we can build much 
more
+advanced technology than any of us as individuals or companies could do alone. 
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions. 
+With DataFusion, we can all build on top of a shared foundation, and focus on
+what makes our projects unique.
+
+
+## Community Growth  📈 
+
+In the last 6 months, between `40.0.0` and `45.0.0`, our community continues to
+grow in new and exciting ways.
+
+1. We added several PMC members and new committers: [@jayzhan211] and 
[@jonahgoa] joined the PMC,
+   [@2010YOUY01], [@rachelint], [@findpi], [@iffyio], [@goldmedal], 
[@Weijun-H], [@Michael-J-Ward] and [@korowa]
+   joined as committers. See the [mailing list] for more details.
+2. In the [core DataFusion repo] alone we reviewed and accepted almost 1600 
PRs from 206 different
+   committers, created over 1100 issues and closed 751 of them 🚀. All changes 
are listed in the detailed
+   [changelogs].
+3. DataFusion focused meetups happened in multiple cities around the world: 
[Hangzhou], [Belgrade], [New York], 
+   [Seattle], [Chicago], [Boston] and [Amsterdam] as well as a Rust NYC meetup 
in NYC focussed on DataFusion.
+
+[core DataFusion repo]: https://github.com/apache/arrow-datafusion
+[changelogs]: https://github.com/apache/datafusion/tree/main/dev/changelog
+[mailing list]: https://lists.apache.org/list.html?d...@datafusion.apache.org
+[Hangzhou]: 
https://github.com/apache/datafusion/discussions/10341#discussioncomment-10110273
+[Belgrade]: https://github.com/apache/datafusion/discussions/11431
+[New York]: https://github.com/apache/datafusion/discussions/11213
+[Seattle]: https://github.com/apache/datafusion/discussions/10348
+[Chicago]: https://github.com/apache/datafusion/discussions/12894
+[Boston]: https://github.com/apache/datafusion/discussions/13165 
+[Amsterdam]: https://github.com/apache/datafusion/discussions/12988
+
+<!--
+$ git log --pretty=oneline 40.0.0..45.0.0 . | wc -l
+     1532 (up from 1453)
+
+$ git shortlog -sn 40.0.0..45.0.0 . | wc -l
+     206 (up from 182)
+
+https://crates.io/crates/datafusion/45.0.0
+DataFusion 45 released Feb 7, 2025
+
+https://crates.io/crates/datafusion/40.0.0
+DataFusion 40 released July 12, 2024
+
+Issues created in this time: 375 open, 751 closed (from 321 open, 781 closed)
+https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2024-07-12..2025-02-07
+
+Issues closed: 956 (up from 911)
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2024-07-12..2025-02-07
+
+PRs merged in this time 1597 (up from 1490)
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2024-07-12..2025-02-07
+
+-->
+
+DataFusion has put in an application to be part of [Google Summer of Code] 
with a 
+[number of ideas] for projects with mentors already selected. Additionally, 
[some ideas] on
+how to make DataFusion an ideal selection for university database projects 
such as the 
+[CMU database classes] have been put forward.
+
+[Google Summer of Code]: https://summerofcode.withgoogle.com/
+[number of ideas]: https://github.com/apache/datafusion/issues/14478
+[some ideas]: https://github.com/apache/datafusion/issues/14373
+[CMU database classes]: https://15445.courses.cs.cmu.edu/spring2025/
+
+In addition, DataFusion has been appearing publicly more and more, both online 
and offline. Here are some highlights:
+
+1. A [demonstration of how uwheel] is integrated into DataFusion
+2. Integrating StringView into DataFusion - [part 1] and [part 2]
+3. [Building streams] with DataFusion
+4. [Caching in DataFusion]: Don't read twice
+5. [Parquet pruning in DataFusion]: Read no more than you need
+6. DataFusion is one of [The 10 coolest open source software tools]
+7. [Building databases over a weekend]
+
+[demonstration of how uwheel]: https://uwheel.rs/post/datafusion_uwheel/
+[part 1]: 
https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/
+[part 2]: 
https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/
+[Building streams]: https://techontherocks.show/3
+[Caching in DataFusion]: https://blog.haoxp.xyz/posts/caching-datafusion
+[Parquet pruning in DataFusion]: https://blog.haoxp.xyz/posts/parquet-to-arrow/
+[The 10 coolest open source software tools]: 
https://www.crn.com/news/software/2024/the-10-coolest-open-source-software-tools-of-2024?page=3
+[Building databases over a weekend]: 
https://www.denormalized.io/blog/building-databases
+
+## Improved Performance 🚀 
+
+DataFusion hit a milestone in its development by becoming [the fastest single 
node engine] 

Review Comment:
   BTW it would be amazing if you could add a link to this ticket for "A lot 
   of work went into making this happen!":
   
   https://github.com/apache/datafusion/issues/12821
   
   (I couldn't figure out how to make a suggestion with that link)



##########
content/blog/2025-02-20-datafusion-45.0.0.md:
##########
@@ -0,0 +1,300 @@
+---
+layout: post
+title: Apache DataFusion 45.0.0 Released
+date: 2025-02-20
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/9602 for details -->
+
+## Introduction
+
+We are very proud to announce [DataFusion 45.0.0]. This blog highlights some 
of the
+many major improvements since we released [DataFusion 40.0.0] and a preview of
+what the community is thinking about in the next 6 months. It has been an 
exciting
+period of development for DataFusion!
+
+[DataFusion 40.0.0]: 
https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/
+[DataFusion 45.0.0]: https://crates.io/crates/datafusion/45.0.0
+
+[Apache DataFusion] is an extensible query engine, written in [Rust], that
+uses [Apache Arrow] as its in-memory format. DataFusion is used by developers 
to
+create new, fast data centric systems such as databases, dataframe libraries,
+machine learning and streaming applications. While [DataFusion’s primary design
+goal] is to accelerate the creation of other data centric systems, it has a
+reasonable experience directly out of the box as a [dataframe library],
+[python library] and [command line SQL tool].
+
+[apache datafusion]: https://datafusion.apache.org/
+[rust]: https://www.rust-lang.org/
+[apache arrow]: https://arrow.apache.org
+[DataFusion’s primary design goal]: 
https://datafusion.apache.org/user-guide/introduction.html#project-goals
+[dataframe library]: https://datafusion.apache.org/user-guide/dataframe.html
+[python library]: https://datafusion.apache.org/python/
+[command line SQL tool]: https://datafusion.apache.org/user-guide/cli/
+
+DataFusion's core thesis is that as a community, together we can build much 
more
+advanced technology than any of us as individuals or companies could do alone. 
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions. 
+With DataFusion, we can all build on top of a shared foundation, and focus on
+what makes our projects unique.
+
+
+## Community Growth  📈 
+
+In the last 6 months, between `40.0.0` and `45.0.0`, our community continues to
+grow in new and exciting ways.
+
+1. We added several PMC members and new committers: [@jayzhan211] and 
[@jonahgoa] joined the PMC,
+   [@2010YOUY01], [@rachelint], [@findpi], [@iffyio], [@goldmedal], 
[@Weijun-H], [@Michael-J-Ward] and [@korowa]
+   joined as committers. See the [mailing list] for more details.
+2. In the [core DataFusion repo] alone we reviewed and accepted almost 1600 
PRs from 206 different
+   committers, created over 1100 issues and closed 751 of them 🚀. All changes 
are listed in the detailed
+   [changelogs].
+3. DataFusion focused meetups happened in multiple cities around the world: 
[Hangzhou], [Belgrade], [New York], 
+   [Seattle], [Chicago], [Boston] and [Amsterdam] as well as a Rust NYC meetup 
in NYC focussed on DataFusion.
+
+[core DataFusion repo]: https://github.com/apache/arrow-datafusion
+[changelogs]: https://github.com/apache/datafusion/tree/main/dev/changelog
+[mailing list]: https://lists.apache.org/list.html?d...@datafusion.apache.org
+[Hangzhou]: 
https://github.com/apache/datafusion/discussions/10341#discussioncomment-10110273
+[Belgrade]: https://github.com/apache/datafusion/discussions/11431
+[New York]: https://github.com/apache/datafusion/discussions/11213
+[Seattle]: https://github.com/apache/datafusion/discussions/10348
+[Chicago]: https://github.com/apache/datafusion/discussions/12894
+[Boston]: https://github.com/apache/datafusion/discussions/13165 
+[Amsterdam]: https://github.com/apache/datafusion/discussions/12988
+
+<!--
+$ git log --pretty=oneline 40.0.0..45.0.0 . | wc -l
+     1532 (up from 1453)
+
+$ git shortlog -sn 40.0.0..45.0.0 . | wc -l
+     206 (up from 182)
+
+https://crates.io/crates/datafusion/45.0.0
+DataFusion 45 released Feb 7, 2025
+
+https://crates.io/crates/datafusion/40.0.0
+DataFusion 40 released July 12, 2024
+
+Issues created in this time: 375 open, 751 closed (from 321 open, 781 closed)
+https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2024-07-12..2025-02-07
+
+Issues closed: 956 (up from 911)
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2024-07-12..2025-02-07
+
+PRs merged in this time 1597 (up from 1490)
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2024-07-12..2025-02-07
+
+-->
+
+DataFusion has put in an application to be part of [Google Summer of Code] 
with a 
+[number of ideas] for projects with mentors already selected. Additionally, 
[some ideas] on
+how to make DataFusion an ideal selection for university database projects 
such as the 
+[CMU database classes] have been put forward.
+
+[Google Summer of Code]: https://summerofcode.withgoogle.com/
+[number of ideas]: https://github.com/apache/datafusion/issues/14478
+[some ideas]: https://github.com/apache/datafusion/issues/14373
+[CMU database classes]: https://15445.courses.cs.cmu.edu/spring2025/
+
+In addition, DataFusion has been appearing publicly more and more, both online 
and offline. Here are some highlights:
+
+1. A [demonstration of how uwheel] is integrated into DataFusion
+2. Integrating StringView into DataFusion - [part 1] and [part 2]
+3. [Building streams] with DataFusion
+4. [Caching in DataFusion]: Don't read twice
+5. [Parquet pruning in DataFusion]: Read no more than you need
+6. DataFusion is one of [The 10 coolest open source software tools]
+7. [Building databases over a weekend]
+
+[demonstration of how uwheel]: https://uwheel.rs/post/datafusion_uwheel/
+[part 1]: 
https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/
+[part 2]: 
https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/
+[Building streams]: https://techontherocks.show/3
+[Caching in DataFusion]: https://blog.haoxp.xyz/posts/caching-datafusion
+[Parquet pruning in DataFusion]: https://blog.haoxp.xyz/posts/parquet-to-arrow/
+[The 10 coolest open source software tools]: 
https://www.crn.com/news/software/2024/the-10-coolest-open-source-software-tools-of-2024?page=3
+[Building databases over a weekend]: 
https://www.denormalized.io/blog/building-databases
+
+## Improved Performance 🚀 
+
+DataFusion hit a milestone in its development by becoming [the fastest single 
node engine] 
+for querying Apache Parquet files in [clickbench] benchmark for the 43.0.0 
release. A lot 
+of work went into making this happen!
+
+<img
+src="/blog/images/datafusion-45.0.0/performance_over_time.png"
+width="100%"
+class="img-responsive"
+alt="ClickBench performance results over time for DataFusion"
+/>
+
+**Figure 1**: ClickBench performance improved over 33% between DataFusion 33
+(released Nov. 2023) and DataFusion 45 (released Feb. 2025). 
+
+The task of [integrating] the new [Arrow StringView] which significantly 
improves performance 
+for workloads that scan, filter and group by variable length string and binary 
data was completed 
+and enabled by default in the past 6 months. The improvement is especially 
pronounced for Parquet 
+files due to [upstream work in the parquet reader]. Kudos to [@XiangpengHong], 
[@AriesDevil], 
+[@PsiACE], [@Weijun-H], [@a10y], and [@RinChanNOWWW] for driving this project.
+
+[the fastest single node engine]: 
https://datafusion.apache.org/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/
+[clickbench]: https://benchmark.clickhouse.com/
+[integrating]: https://github.com/apache/datafusion/issues/10918
+[Arrow StringView]: 
https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html
+[multiple variable length columns in the `GROUP BY` clause]: 
https://github.com/apache/datafusion/issues/9403
+[upstream work in the parquet reader]: 
https://github.com/apache/arrow-rs/issues/5530
+
+## Improved Quality 📋
+
+DataFusion continues to improve overall in quality. In addition to ongoing bug
+fixes, one of the most exciting improvements in the last 6 months was the 
addition of the 
+[SQLite sqllogictest suite] thanks to [@Omega359]. These tests run over 5 
million 
+sql statements on every push to the main branch.
+
+Support for [explicitly checking logical plan invariants] was added by 
[@wiedld] which 
+can help catch implicit changes that might cause problems during upgrades.
+
+[SQLite sqllogictest suite]: https://github.com/apache/datafusion/pull/13936
+[explicitly checking logical plan invariants]: 
https://github.com/apache/datafusion/pull/13651
+
+## Improved Documentation 📚
+
+We continue to improve the documentation to make it easier to get started 
using DataFusion. 
+During the last 6 months two projects were initiated to migrate the function 
documentation
+from strictly static markdown files. First, [@Omega359] [created a framework] 
to allow function
+documentation to be generated from code and [@jonathanc-n] and others helped 
with the migration,
+then [@comphead] lead a project to [create a doc macro] to allow for an even 
easier way to write 
+function documentation. A special thanks to [@Chen-Yuan-Lai] for migrating 
many functions to 
+the new syntax.
+
+[created a framework]: https://github.com/apache/datafusion/pull/12668
+[create a doc macro]: https://github.com/apache/datafusion/pull/12822
+
+Additionally, the [examples] were [refactored] and [cleaned up] to improve 
their usefulness.
+
+[examples]: https://github.com/apache/datafusion/pull/13877
+[refactored]: https://github.com/apache/datafusion/pull/13905
+[cleaned up]: https://github.com/apache/datafusion/pull/13950
+
+## New Features ✨
+
+There are too many new features in the last 6 months to list them all, but here
+are some highlights:
+
+### Functions
+* Window functions were migrated from BuiltInWindowFunctions to UDFs 
([@jcsherin])
+* Aggregate functions were migrated from builtin functions to UDFs
+* As mentioned above function documentation was extracted from the markdown 
files
+* Some new functions and sql support were added including '[show functions]', 
'[to_local_time]',
+  '[regexp_count]', '[map_extract]', '[array_distance]', '[array_any_value]', 
'[greatest]',
+  '[least]', '[arrays_overlap]'
+
+### FFI
+* Foreign Function Interface work has started. This should allow for 
+  [using table providers] across languages and versions of DataFusion. This 
+  is especially pertinent for integration with [delta-rs] and other table 
formats.
+
+[delta-rs]: https://delta-io.github.io/delta-rs/
+
+### Materialized Views
+* [@suremarc] has added a [materialized view implementation] in 
datafusion-contrib 🚀
+
+### Substrait
+* A lot of work was put into improving and enhancing substrait support 
([@Blizzara], [@westonpace], [@tokoko], [@vbarua], [@LatrecheYasser], 
[@notfilippo] and others)
+
+[show functions]: https://github.com/apache/datafusion/pull/13799
+[to_local_time]: https://github.com/apache/datafusion/pull/11347
+[regexp_count]: https://github.com/apache/datafusion/pull/12970
+[map_extract]: https://github.com/apache/datafusion/pull/11969
+[array_distance]: https://github.com/apache/datafusion/pull/12211
+[array_any_value]: https://github.com/apache/datafusion/pull/12329
+[greatest]: https://github.com/apache/datafusion/pull/12474
+[least]: https://github.com/apache/datafusion/pull/13786
+[arrays_overlap]: https://github.com/apache/datafusion/pull/14217
+[using table providers]: https://github.com/apache/datafusion/pull/12920
+[materialized view implementation]: 
https://github.com/datafusion-contrib/datafusion-materialized-views
+
+## Looking Ahead: The Next Six Months 🔭 
+
+One of the long term goals of [@alamb], DataFusion's PMC chair, has been to 
have 
+[1000 DataFusion based projects]. This may be the year that happens!
+
+The community has been [discussing what we will work on in the next six 
months].
+Some major initiatives are likely to be:
+
+1. *Performance*: A [number of items have been identified] as areas that could 
use additional work
+2. *Memory usage*: Tracking and improving memory usage, statistics and 
spilling to disk 
+3. *GSOC*: DataFusion is hopefully selected as a project and we start 
accepting and supporting student projects 
+4. *FFI*: Extending the FFI implementation to support to all types of UDF's 
and SessionContext
+

Review Comment:
   I suggest also mentioning the Spark-compatible function library, as I think 
that may draw in quite a few more users / use cases:
   - https://github.com/apache/datafusion/issues/5600



##########
content/blog/2025-02-20-datafusion-45.0.0.md:
##########
@@ -0,0 +1,300 @@
+---
+layout: post
+title: Apache DataFusion 45.0.0 Released
+date: 2025-02-20
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/9602 for details -->
+
+## Introduction
+
+We are very proud to announce [DataFusion 45.0.0]. This blog highlights some 
of the
+many major improvements since we released [DataFusion 40.0.0] and a preview of
+what the community is thinking about in the next 6 months. It has been an 
exciting
+period of development for DataFusion!
+
+[DataFusion 40.0.0]: 
https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/
+[DataFusion 45.0.0]: https://crates.io/crates/datafusion/45.0.0
+
+[Apache DataFusion] is an extensible query engine, written in [Rust], that
+uses [Apache Arrow] as its in-memory format. DataFusion is used by developers 
to
+create new, fast data centric systems such as databases, dataframe libraries,
+machine learning and streaming applications. While [DataFusion’s primary design
+goal] is to accelerate the creation of other data centric systems, it has a
+reasonable experience directly out of the box as a [dataframe library],
+[python library] and [command line SQL tool].
+
+[apache datafusion]: https://datafusion.apache.org/
+[rust]: https://www.rust-lang.org/
+[apache arrow]: https://arrow.apache.org
+[DataFusion’s primary design goal]: 
https://datafusion.apache.org/user-guide/introduction.html#project-goals
+[dataframe library]: https://datafusion.apache.org/user-guide/dataframe.html
+[python library]: https://datafusion.apache.org/python/
+[command line SQL tool]: https://datafusion.apache.org/user-guide/cli/
+
+DataFusion's core thesis is that as a community, together we can build much 
more
+advanced technology than any of us as individuals or companies could do alone. 
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions. 
+With DataFusion, we can all build on top of a shared foundation, and focus on
+what makes our projects unique.
+
+
+## Community Growth  📈 
+
+In the last 6 months, between `40.0.0` and `45.0.0`, our community has continued to
+grow in new and exciting ways.
+
+1. We added several PMC members and new committers: [@jayzhan211] and [@jonahgoa] joined the PMC,
+   [@2010YOUY01], [@rachelint], [@findpi], [@iffyio], [@goldmedal], [@Weijun-H], [@Michael-J-Ward] and [@korowa]
+   joined as committers. See the [mailing list] for more details.
+2. In the [core DataFusion repo] alone we reviewed and accepted almost 1600 PRs from 206 different
+   contributors, created over 1100 issues and closed 751 of them 🚀. All changes are listed in the detailed
+   [changelogs].
+3. DataFusion-focused meetups happened in multiple cities around the world: [Hangzhou], [Belgrade], [New York],
+   [Seattle], [Chicago], [Boston] and [Amsterdam], as well as a Rust NYC meetup focused on DataFusion.
+
+[core DataFusion repo]: https://github.com/apache/arrow-datafusion
+[changelogs]: https://github.com/apache/datafusion/tree/main/dev/changelog
+[mailing list]: https://lists.apache.org/list.html?d...@datafusion.apache.org
+[Hangzhou]: https://github.com/apache/datafusion/discussions/10341#discussioncomment-10110273
+[Belgrade]: https://github.com/apache/datafusion/discussions/11431
+[New York]: https://github.com/apache/datafusion/discussions/11213
+[Seattle]: https://github.com/apache/datafusion/discussions/10348
+[Chicago]: https://github.com/apache/datafusion/discussions/12894
+[Boston]: https://github.com/apache/datafusion/discussions/13165 
+[Amsterdam]: https://github.com/apache/datafusion/discussions/12988
+
+<!--
+$ git log --pretty=oneline 40.0.0..45.0.0 . | wc -l
+     1532 (up from 1453)
+
+$ git shortlog -sn 40.0.0..45.0.0 . | wc -l
+     206 (up from 182)
+
+https://crates.io/crates/datafusion/45.0.0
+DataFusion 45 released Feb 7, 2025
+
+https://crates.io/crates/datafusion/40.0.0
+DataFusion 40 released July 12, 2024
+
+Issues created in this time: 375 open, 751 closed (from 321 open, 781 closed)
+https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2024-07-12..2025-02-07
+
+Issues closed: 956 (up from 911)
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2024-07-12..2025-02-07
+
+PRs merged in this time 1597 (up from 1490)
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2024-07-12..2025-02-07
+
+-->
+
+DataFusion has applied to be part of [Google Summer of Code] with a
+[number of ideas] for projects with mentors already selected. Additionally, [some ideas] on
+how to make DataFusion an ideal selection for university database projects such as the
+[CMU database classes] have been put forward.
+
+[Google Summer of Code]: https://summerofcode.withgoogle.com/
+[number of ideas]: https://github.com/apache/datafusion/issues/14478
+[some ideas]: https://github.com/apache/datafusion/issues/14373
+[CMU database classes]: https://15445.courses.cs.cmu.edu/spring2025/
+
+In addition, DataFusion has been appearing publicly more and more, both online and offline. Here are some highlights:
+
+1. A [demonstration of how uwheel] is integrated into DataFusion
+2. Integrating StringView into DataFusion - [part 1] and [part 2]
+3. [Building streams] with DataFusion
+4. [Caching in DataFusion]: Don't read twice
+5. [Parquet pruning in DataFusion]: Read no more than you need
+6. DataFusion is one of [The 10 coolest open source software tools]
+7. [Building databases over a weekend]
+
+[demonstration of how uwheel]: https://uwheel.rs/post/datafusion_uwheel/
+[part 1]: https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/
+[part 2]: https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/
+[Building streams]: https://techontherocks.show/3
+[Caching in DataFusion]: https://blog.haoxp.xyz/posts/caching-datafusion
+[Parquet pruning in DataFusion]: https://blog.haoxp.xyz/posts/parquet-to-arrow/
+[The 10 coolest open source software tools]: https://www.crn.com/news/software/2024/the-10-coolest-open-source-software-tools-of-2024?page=3
+[Building databases over a weekend]: https://www.denormalized.io/blog/building-databases
+
+## Improved Performance 🚀 
+
+DataFusion hit a milestone in its development by becoming [the fastest single node engine]
+for querying Apache Parquet files in the [clickbench] benchmark for the 43.0.0 release. A lot
+of work went into making this happen!

Review Comment:
   We have subsequently been supplanted again by DuckDB / Hyper, which I think we should acknowledge here too
   
   ```suggestion
   for querying Apache Parquet files in the [clickbench] benchmark for the 43.0.0 release. A lot
   of work went into making this happen! While other engines have subsequently gotten faster,
   displacing DataFusion from the top spot, DataFusion remains near the top and we [are planning
   more improvements].
   
   [are planning more improvements]: https://github.com/apache/datafusion/issues/14586
   ```
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

