Copilot commented on code in PR #2418: URL: https://github.com/apache/sedona/pull/2418#discussion_r2449947458
########## docs/blog/posts/spatial-tables-lakehouse.md: ##########
@@ -0,0 +1,528 @@
+---
+date:
+  created: 2025-10-21
+links:
+  - SedonaDB: https://sedona.apache.org/sedonadb/
+  - SpatialBench: https://sedona.apache.org/spatialbench/
+authors:
+  - matt_powers
+title: "Managing spatial tables in Data Lakehouses with Iceberg"
+---
+
+<!--
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+-->
+
+This post explains the benefits of the Lakehouse Architecture for spatial tables and how Lakehouses differ from data warehouses and data lakes.
+
+<!-- more -->
+
+Many of the benefits that Lakehouses (e.g., Iceberg, Delta Lake, Hudi) provide for tabular data also apply to spatial data, for example:
+
+* Reliable transactions
+* Versioned data
+* Time travel
+* Schema enforcement
+* Optimizations
+* And many more
+
+The spatial data community can use Lakehouses as of Iceberg v3, which adds geometry/geography types.
+
+Spatial data requires different types of metadata and optimizations, but doesn’t always require entirely different file formats. Iceberg can now store the metadata needed for geometry and geography columns, so you can use Iceberg for both tabular and spatial data.
+
+This post also explains why the Lakehouse architecture is often better than data warehouses and data lakes. Let’s start with a detailed description of the Lakehouse architecture.
+
+## Data Lakehouse Architecture Overview
+
+A Lakehouse stores data in a Lakehouse Storage System like Iceberg, Delta Lake, or Hudi. Iceberg v3 is currently the best option for spatial data because it natively supports geometry and geography columns.
+
+Tables in Lakehouses are governed by a catalog like Unity Catalog or Apache Polaris. The catalog allows role-based access control (RBAC) and features like multi-table transactions.
+
+You can query tables in the Lakehouse Architecture for BI, reporting, data science, machine learning, and other complex analyses.
+
+The following diagram illustrates the Lakehouse Architecture:
+
+{ align=center width="80%" }
+
+The Lakehouse Architecture offers several advantages:
+
+* Data is stored in open formats so any engine can query it, and there is no vendor lock-in.
+* Lakehouses support all the features familiar to data warehouses, like reliable transactions, DML operations, and role-based access control.
+* Lakehouses are often performant enough for low-latency applications like BI dashboards.
+* Lakehouses are interoperable with proprietary tools like BigQuery, Redshift, or Esri.
+* You can store Lakehouses in cloud-based storage systems without any charges beyond the storage itself.
+* Lakehouses are compatible with any engine. You can use one engine for ingestion, another for ETL, and a third for ML. The architecture encourages using the best engine for the job.
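The versioned-data and time-travel benefits in the list above are easy to demonstrate. Here is a hedged sketch (not part of the PR diff), assuming a Spark session with the Iceberg catalog `local` used later in the post; the timestamp is hypothetical:

```sql
-- List the table's snapshots via Iceberg's metadata table.
SELECT committed_at, snapshot_id, operation
FROM local.db.customers.snapshots;

-- Read the table as of an earlier point in time.
SELECT * FROM local.db.customers TIMESTAMP AS OF '2025-10-21 00:00:00';
```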
+
+Let’s see how Lakehouses differ from data lakes.
+
+## Spatial tables in Lakehouses vs. Data Lakes
+
+Data Lakes store data in files without a metadata layer, so they don’t guarantee reliable transactions.
+
+Here are examples of spatial data lakes:
+
+* GeoParquet files stored in AWS S3
+* GeoJSON files stored in Azure Blob Storage
+* CSV files with WKT geometry data stored in GCP
+
+Since data lakes do not support reliable transactions, they cannot offer developer-friendly features like delete and merge, they require downtime when datasets are mutated, and they lack advanced performance features like deletion vectors and small file compaction.
+
+The Lakehouse metadata layer enables these convenience features and much better performance.
+
+The Lakehouse metadata layer is relatively small, so the storage costs for a Lakehouse and a data lake are about the same. Lakehouses allow for better performance, so compute expenses are generally lower than those of a data lake.
+
+## Spatial tables in Lakehouses vs. Data Warehouses
+
+A Data Warehouse is an analytics system powered by a proprietary engine and proprietary file formats. However, due to market pressures, this definition has shifted, and some Data Warehouses began supporting Lakehouse Storage Systems in addition to proprietary file formats. Many modern customers don’t want vendor lock-in via a proprietary file format.
+
+Data Warehouses have the following limitations:
+
+* They usually bundle storage with compute in pricing packages, so you must pay for more compute even if you just want more storage.
+* They store data in proprietary file formats, which aren’t compatible with other engines.
+* Queries on open file formats can run slower than queries on the warehouse’s native format.
+* Sharing compute with other users can cause performance degradation when one user runs a large query.
+
+Because engines that once supported only proprietary file formats now also support open formats, you can think of a Data Warehouse as a system built on either a proprietary engine or a proprietary file format.
+
+Many modern enterprises prefer the Lakehouse architecture because it’s open, compatible with any engine that builds a connector, vendor-neutral, and low-cost.
+
+Let’s now see how to create some Iceberg tables.
+
+## Creating tabular tables with Iceberg
+
+Here’s how to create a `customers` table with `id` and `first_name` columns:
+
+```sql
+CREATE TABLE local.db.customers (id string, first_name string)
+USING iceberg
+TBLPROPERTIES('format-version'='3');
+```
+
+Let’s append some data to the table:
+
+```python
+df = sedona.createDataFrame(
+    [
+        ("a", "Bob"),
+        ("b", "Mary"),
+        ("c", "Sue"),
+    ],
+    ["id", "first_name"],
+)
+
+df.write.format("iceberg").mode("append").saveAsTable("local.db.customers")
+```
+
+Let’s see how to run a query on the table:
+
+```
+sedona.table("local.db.customers").show()
+
++---+----------+
+| id|first_name|
++---+----------+
+|  a|       Bob|
+|  b|      Mary|
+|  c|       Sue|
++---+----------+
+```
+
+Creating a table with tabular data is straightforward. Now, let’s see how to make a table with spatial data in Iceberg.
+
+## Creating spatial tables with Iceberg v3
+
+Let’s now create a `customer_purchases` table with a `geometry` column that stores each purchase location.
+
+Here’s how to create this spatial table with Iceberg:
+
+```sql
+CREATE TABLE local.db.customer_purchases (id string, price double, geometry geometry)
+USING iceberg
+TBLPROPERTIES('format-version'='3');
+```
+
+Now append some spatial data with longitude/latitude coordinates to the table:
+
+```python
+from shapely.geometry import Point, Polygon
+
+coords = [
+    (-88.110352, 24.006326),
+    (-77.080078, 24.006326),
+    (-77.080078, 31.503629),
+    (-88.110352, 31.503629),
+    (-88.110352, 24.006326),
+]
+df = sedona.createDataFrame(
+    [
+        ("a", 10.99, Polygon(coords)),
+        ("b", 3.5, Point(1, 2)),
+        ("c", 1.95, Point(3, 4)),
+    ],
+    ["id", "price", "geometry"],
+)
+
+df.write.format("iceberg").mode("append").saveAsTable("local.db.customer_purchases")
+```
+
+The spatial table contains points and polygons. Some purchases have an exact location, and others occur in a region.
+
+Let’s see how to join the tabular table with the spatial table.
+
+## Joining an Iceberg tabular table with a spatial table
+
+Here’s how to join the `customers` and `customer_purchases` tables:
+
+```
+customers = sedona.table("local.db.customers")
+purchases = sedona.table("local.db.customer_purchases")
+
+joined = customers.join(purchases, "id")
+joined.show()
+
++---+----------+-----+--------------------+
+| id|first_name|price|            geometry|
++---+----------+-----+--------------------+
+|  a|       Bob|10.99|POLYGON ((-88.110...|
+|  b|      Mary|  3.5|         POINT (1 2)|
+|  c|       Sue| 1.95|         POINT (3 4)|
++---+----------+-----+--------------------+
+```
+
+Now we can see the customer information and the location of their purchases all in one table.
+
+Keep reading for an example with a spatial join, which is a join based on the geometry columns of two tables.
+
+It’s easy to join any tables with Sedona, regardless of the underlying file format, because Sedona has many built-in file readers (e.g., you can easily join one table stored in Shapefiles and another stored in GeoParquet files). But it’s even easier when Iceberg stores the tabular and spatial tables in the same catalog.
+
+## Optimizing spatial tables in Lakehouses
+
+This section shows how queries can run faster on Iceberg tables.
+
+Let’s query the Overture Maps Foundation buildings dataset stored in GeoParquet files:
+
+```python
+from pyspark.sql.functions import col
+from sedona.sql.st_functions import ST_GeomFromWKB
+
+(
+    sedona.table("open_data.overture_2025_03_19_1.buildings_building")
+    .withColumn("geometry", ST_GeomFromWKB(col("geometry")))
+    .select("id", "geometry", "num_floors", "roof_color")
+    .createOrReplaceTempView("my_fun_view")
+)
+```
+
+Using this GeoParquet dataset, let’s run a filtering query to get the number of buildings in a small region near Gainesville, Florida:
+
+```python
+spot = "POLYGON((-82.258759 29.129371, -82.180481 29.136569, -82.202454 29.173747, -82.258759 29.129371))"
+sql = f"""
+select * from my_fun_view
+where ST_Contains(ST_GeomFromWKT('{spot}'), geometry)
+"""
+sedona.sql(sql).count()
+```
+
+This query runs in 45 seconds.
+
+Let’s convert this dataset to Iceberg:
+
+```python
+df = sedona.table("open_data.overture_2025_03_19_1.buildings_building")
+
+sql = """
+CREATE TABLE local.db.overture_2025_03_19_1_buildings_building (id string, geometry geometry, num_floors integer, roof_color string)
+USING iceberg
+TBLPROPERTIES('format-version'='3');
+"""
+sedona.sql(sql)
+
+(
+    df.select("id", "geometry", "num_floors", "roof_color")
+    .write.format("iceberg")
+    .mode("overwrite")
+    .saveAsTable("local.db.overture_2025_03_19_1_buildings_building")
+)
+```
+
+Now let's rerun the same query on the Iceberg table:
+
+```python
+spot = "POLYGON((-82.258759 29.129371, -82.180481 29.136569, -82.202454 29.173747, -82.258759 29.129371))"
+sql = f"""
+select * from local.db.overture_2025_03_19_1_buildings_building
+where ST_Contains(ST_GeomFromWKT('{spot}'), geometry)
+"""
+sedona.sql(sql).count()
+```
+
+This query executes in 4 seconds.
+
+To make this Iceberg query run even faster, we could apply further optimizations that colocate spatially nearby data in the same files (see the sketch below).
+
+## More geospatial examples in Iceberg
+
+This section uses a contrived example to demonstrate some Iceberg features that help you manage your spatial data. Start by creating two tables: one for the blue geometric objects and another for the orange polygons in the following graph:
+
+{ align=center width="80%" }
+
+Start with the Iceberg table for the geometric objects:
+
+```sql
+CREATE TABLE some_catalog.matt.icegeometries (id string, geometry geometry)
+USING iceberg
+TBLPROPERTIES('format-version'='3');
+```
+
+Append objects `a`, `b, c, d, `and` e` to the table:

Review Comment:
   Corrected spacing and formatting in the list of objects from '`b, c, d, `and` e`' to '`b`, `c`, `d`, and `e`'.

```suggestion
Append objects `a`, `b`, `c`, `d`, and `e` to the table:
```
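To make the colocation idea mentioned above concrete, here is a hedged sketch (not part of the PR diff): sort rows by a geohash of the geometry before writing, so spatially nearby features land in the same data files. It assumes Sedona exposes `ST_GeoHash` in `sedona.sql.st_functions`, and the `..._sorted` table name is hypothetical:

```python
from pyspark.sql.functions import col
from sedona.sql.st_functions import ST_GeoHash

buildings = sedona.table("local.db.overture_2025_03_19_1_buildings_building")

# Sort by a geohash derived from each geometry so that spatially
# nearby rows are written into the same Iceberg data files.
(
    buildings.withColumn("geohash", ST_GeoHash(col("geometry"), 8))
    .sort("geohash")
    .drop("geohash")
    .write.format("iceberg")
    .mode("overwrite")
    .saveAsTable("local.db.overture_2025_03_19_1_buildings_building_sorted")
)
```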
########## docs/blog/posts/spatial-tables-lakehouse.md: ##########
@@ -0,0 +1,528 @@
+
+```python
+from pyspark.sql.functions import col
+from sedona.sql.st_functions import ST_GeomFromText
+
+df = sedona.createDataFrame(
+    [
+        ("a", "LINESTRING(1.0 3.0,3.0 1.0)"),
+        ("b", "LINESTRING(2.0 5.0,6.0 1.0)"),
+        ("c", "POLYGON((7.0 4.0,9.0 4.0,9.0 3.0,7.0 3.0,7.0 4.0))"),
+        ("d", "LINESTRING(2.0 7.0,4.0 9.0,7.0 8.0)"),
+        ("e", "LINESTRING(10.0 9.0,10.0 6.0)"),
+    ],
+    ["id", "geometry"],
+)
+df = df.withColumn("geometry", ST_GeomFromText(col("geometry")))
+
+df.write.format("iceberg").mode("append").saveAsTable("some_catalog.matt.icegeometries")
+```
+
+Check the content of the table:
+
+```
+sedona.sql("SELECT * FROM some_catalog.matt.icegeometries;").show(truncate=False)
+
++---+-----------------------------------+
+|id |geometry                           |
++---+-----------------------------------+
+|a  |LINESTRING (1 3, 3 1)              |
+|b  |LINESTRING (2 5, 6 1)              |
+|c  |POLYGON ((7 4, 9 4, 9 3, 7 3, 7 4))|
+|d  |LINESTRING (2 7, 4 9, 7 8)         |
+|e  |LINESTRING (10 9, 10 6)            |
++---+-----------------------------------+
+```
+
+Iceberg makes it easy to delete rows of data in a table based on a predicate, as you’ll see below.
+
+Now create a table with the polygons. Start by creating the Iceberg table:
+
+```sql
+CREATE TABLE some_catalog.matt.icepolygons (id string, geometry geometry)
+USING iceberg
+TBLPROPERTIES('format-version'='3');
+```
+
+Append objects `polygon_x `and `polygon_y` to the table:

Review Comment:
   Removed extra space after 'polygon_x' - should be '`polygon_x` and `polygon_y`'.

```suggestion
Append objects `polygon_x` and `polygon_y` to the table:
```
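The predicate deletes just mentioned have a companion: on Iceberg tables, merge is also plain SQL. A hedged upsert sketch (not part of the PR diff), assuming a temp view `purchase_updates` with the same schema as the `customer_purchases` table from earlier:

```sql
MERGE INTO local.db.customer_purchases t
USING purchase_updates u
ON t.id = u.id
WHEN MATCHED THEN UPDATE SET t.price = u.price
WHEN NOT MATCHED THEN INSERT *;
```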
########## docs/blog/posts/spatial-tables-lakehouse.md: ##########
@@ -0,0 +1,528 @@
+
+```python
+df = sedona.createDataFrame(
+    [
+        ("polygon_x", "POLYGON((3.0 5.0,8.0 5.0,8.0 2.0,3.0 2.0,3.0 5.0))"),
+        ("polygon_y", "POLYGON((5.0 9.0,8.0 9.0,8.0 7.0,5.0 7.0,5.0 9.0))"),
+    ],
+    ["id", "geometry"],
+)
+df = df.withColumn("geometry", ST_GeomFromText(col("geometry")))
+
+df.write.format("iceberg").mode("append").saveAsTable("some_catalog.matt.icegeometries")

Review Comment:
   The code is appending polygon data to the 'icegeometries' table instead of the 'icepolygons' table. This should use 'some_catalog.matt.icepolygons' as the target table.

```suggestion
df.write.format("iceberg").mode("append").saveAsTable("some_catalog.matt.icepolygons")
```
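The spatial join promised earlier in the post can be sketched against these two tables. A hedged example (not part of the PR diff), assuming Sedona exposes `ST_Intersects` in `sedona.sql.st_predicates` and that the polygons were written to `icepolygons` per the review comment above:

```python
from pyspark.sql.functions import col
from sedona.sql.st_predicates import ST_Intersects

geoms = sedona.table("some_catalog.matt.icegeometries").alias("g")
polys = sedona.table("some_catalog.matt.icepolygons").alias("p")

# Join on the geometry columns: keep each (geometry, polygon) pair
# whose shapes intersect. Sedona plans this as a spatial join.
joined = geoms.join(polys, ST_Intersects(col("g.geometry"), col("p.geometry")))
joined.select(
    col("g.id").alias("geometry_id"),
    col("p.id").alias("polygon_id"),
).show()
```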
########## docs/blog/posts/spatial-tables-lakehouse.md: ##########
@@ -0,0 +1,528 @@
+
+Here’s how you can delete all the geometries that intersect a polygon:
+
+```python
+# Assumes `polygon` holds the WKT of the query polygon.
+sql = f"""
+DELETE FROM some_catalog.matt.icegeometries
+WHERE ST_Intersects(geometry, ST_GeomFromWKT('{polygon}'))
+"""
+sedona.sql(sql)
+```
+
+Check the table to see that geometries `b`, `c,` and `d` are deleted from the table.

Review Comment:
   Inconsistent spacing in backticks - '`c,`' should be '`c`,' with the comma outside the backticks.

```suggestion
Check the table to see that geometries `b`, `c`, and `d` are deleted from the table.
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
