paleolimbot commented on code in PR #2591: URL: https://github.com/apache/sedona/pull/2591#discussion_r2684102427
########## docs/blog/posts/sedona-2025-year-in-review.md: ########## @@ -0,0 +1,150 @@ +--- +date: + created: 2026-01-11 +links: + - Release notes: https://sedona.apache.org/latest/setup/release-notes/ + - SedonaDB: https://sedona.apache.org/sedonadb/ + - SpatialBench: https://sedona.apache.org/spatialbench/ + - Apache Parquet and Iceberg native geo type: https://wherobots.com/blog/apache-iceberg-and-parquet-now-support-geo/ +authors: + - jia +title: "Apache Sedona 2025 Year in Review" +--- + +<!-- +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +--> + +2025 was a milestone year for **Apache Sedona**. We made major progress in distributed spatial analytics on Spark, Flink, and Snowflake, launched a new single-node engine called SedonaDB, and pushed forward benchmarking and open geospatial data standards. + +This post summarizes the most important highlights from the Apache Sedona ecosystem in 2025. + +<!-- more --> + +## Apache Sedona Ecosystem Releases in 2025 + +Apache Sedona shipped four releases from January 2025 to January 2026: 1.7.1, 1.7.2, 1.8.0, and 1.8.1. 
In the same year, the Sedona ecosystem expanded in two major ways: we introduced SedonaDB for fast single-machine analytics and SpatialBench to make spatial performance comparisons reproducible. + +- Apache Sedona releases: Ongoing improvements across distributed engines and integrations (Spark, Flink, Snowflake). See the release notes for details. +- SedonaDB: A new single-node spatial engine built for interactive analytics and developer workflows. +- SpatialBench: A benchmark suite designed to standardize how we evaluate spatial SQL performance across engines. + +Release notes: [https://sedona.apache.org/latest/setup/release-notes/](https://sedona.apache.org/latest/setup/release-notes/) + +## Distributed Engines Highlights + +Across SedonaSpark, SedonaFlink, and SedonaSnow, 2025 brought major usability improvements, broader SQL coverage, and better support for modern open geospatial data formats: + +* GeoPandas API on SedonaSpark: Write GeoPandas-style code, but run it on Spark through Sedona, so familiar workflows like spatial joins (`sjoin`), buffering, distance, and coordinate system transforms can scale beyond a single machine. Learn more: [GeoPandas API for Apache Sedona](../../tutorial/geopandas-api.md). +* GeoStats for clustering, outliers, and hot spots: Built-in tools for common spatial statistics workflows on DataFrames, including DBSCAN clustering, Local Outlier Factor (LOF), and Getis-Ord Gi/Gi* hot spot analysis. Learn more: [Stats module](../../api/stats/sql.md). +* Faster SedonaSpark to GeoPandas conversion with GeoArrow: Convert query results to GeoPandas more efficiently using Arrow/GeoArrow, such as `geopandas.GeoDataFrame.from_arrow(dataframe_to_arrow(df))`. Learn more: [GeoPandas + Shapely interoperability](../../tutorial/geopandas-shapely.md). +* STAC catalog reader: Load STAC collections from local files, S3, or HTTPS endpoints using `sedona.read.format("stac")`, and apply time/area filters early so you read less data. 
Supports authenticated STAC APIs too. Learn more: [STAC catalog with Apache Sedona and Spark](../../tutorial/files/stac-sedona-spark.md). +* More built-in data sources: Easier ingestion from formats people use in practice, including GeoPackage and OSM PBF (OpenStreetMap). Learn more: [SedonaSQL / DataFrame I/O tutorial](../../tutorial/sql.md). +* Vectorized UDFs (Python): A faster way to run Python UDFs by processing data in batches using Apache Arrow, including geometry-aware UDFs with Shapely or GeoPandas GeoSeries. Learn more: [Spatial vectorized UDFs (Python only)](../../tutorial/sql.md). +* More functions across engines: Function coverage kept expanding across Spark, Flink, and Snowflake. For example: ST_ApproximateMedialAxis, ST_StraightSkeleton, ST_Collect_Agg, and ST_OrientedEnvelope. See the function catalogs for [SedonaSpark SQL](../../api/sql/Overview.md), [SedonaFlink SQL](../../api/flink/Overview.md), and [SedonaSnow SQL](../../api/snowflake/vector-data/Overview.md). + +## SedonaDB: A New Single-Node Spatial Engine + +One of the biggest developments in 2025 was the introduction of SedonaDB, a new analytics engine designed for geospatial data on a single machine. Review Comment: 🥳
