This is an automated email from the ASF dual-hosted git repository.
jiayu pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/sedona-db.git
The following commit(s) were added to refs/heads/main by this push:
new 169cb1e chore(docs): Render notebooks to markdown on build (#101)
169cb1e is described below
commit 169cb1e0d3253cfe727a3bc1e7b473a22a654bd1
Author: Dewey Dunnington <[email protected]>
AuthorDate: Wed Sep 17 11:24:18 2025 -0500
chore(docs): Render notebooks to markdown on build (#101)
---
ci/scripts/build-docs.sh | 6 +
docs/geopandas-interop.md | 148 ++++++++++++++++
docs/overture-examples.md | 288 +++++++++++++++++++++++++++++++
docs/programming-guide.md | 227 ++++++++++++++++++++++++
docs/reference/python.md | 4 +
docs/requirements.txt | 4 +-
mkdocs.yml | 25 ++-
python/sedonadb/python/sedonadb/dbapi.py | 5 +-
8 files changed, 699 insertions(+), 8 deletions(-)
diff --git a/ci/scripts/build-docs.sh b/ci/scripts/build-docs.sh
index 4868303..cb12707 100755
--- a/ci/scripts/build-docs.sh
+++ b/ci/scripts/build-docs.sh
@@ -27,6 +27,12 @@ SEDONADB_DIR="$(cd "${SOURCE_DIR}/../.." && pwd)"
# Avoid a deprecation warning when building the docs
export JUPYTER_PLATFORM_DIRS=1
+# Convert all Jupyter notebooks in docs/ directory to markdown
+for notebook in $(find "${SEDONADB_DIR}/docs" -name "*.ipynb"); do
+ echo "Rendering ${notebook}"
+ jupyter nbconvert --to markdown "${notebook}"
+done
+
pushd "${SEDONADB_DIR}"
if mkdocs build --strict ; then
echo "Success!"
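Reviewer note (not part of the commit): the loop added above relies on shell word splitting, so a notebook path containing spaces would be split into several arguments. Below is a minimal Python sketch of the same rendering step, assuming it lived in a hypothetical helper script. The names `find_notebooks`, `nbconvert_command`, and `render_all` are illustrative only; the `jupyter nbconvert --to markdown` invocation is the one piece taken from the commit.

```python
import subprocess
from pathlib import Path


def find_notebooks(docs_dir):
    """Return every .ipynb file under docs_dir, sorted for stable output."""
    return sorted(Path(docs_dir).rglob("*.ipynb"))


def nbconvert_command(notebook):
    """Build the command the shell loop runs, as an argument list (no word splitting)."""
    return ["jupyter", "nbconvert", "--to", "markdown", str(notebook)]


def render_all(docs_dir, run=subprocess.run):
    """Render each notebook; `run` is injectable so the loop can be dry-run."""
    rendered = []
    for notebook in find_notebooks(docs_dir):
        print(f"Rendering {notebook}")
        run(nbconvert_command(notebook), check=True)
        rendered.append(notebook)
    return rendered
```

Passing the command as a list keeps each path as a single argument regardless of spaces, and `check=True` stops the build on the first notebook that fails to render.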
diff --git a/docs/geopandas-interop.md b/docs/geopandas-interop.md
new file mode 100644
index 0000000..fd7bb58
--- /dev/null
+++ b/docs/geopandas-interop.md
@@ -0,0 +1,148 @@
+# GeoPandas interoperability
+
+This example shows how to read a GeoJSON file with GeoPandas and then convert the GeoPandas DataFrame to a SedonaDB DataFrame.
+
+Any file type that can be read by GeoPandas can also be read into a SedonaDB DataFrame!
+
+
+```python
+import sedona.db
+import geopandas as gpd
+
+sd = sedona.db.connect()
+```
+
+### Read a GeoJSON file with GeoPandas
+
+
+```python
+gdf = gpd.read_file("some_data.json")
+```
+
+
+```python
+gdf
+```
+
+
+
+
+<div>
+<style scoped>
+ .dataframe tbody tr th:only-of-type {
+ vertical-align: middle;
+ }
+
+ .dataframe tbody tr th {
+ vertical-align: top;
+ }
+
+ .dataframe thead th {
+ text-align: right;
+ }
+</style>
+<table border="1" class="dataframe">
+ <thead>
+ <tr style="text-align: right;">
+ <th></th>
+ <th>prop0</th>
+ <th>prop1</th>
+ <th>geometry</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <th>0</th>
+ <td>value0</td>
+ <td>None</td>
+ <td>POINT (102 0.5)</td>
+ </tr>
+ <tr>
+ <th>1</th>
+ <td>value1</td>
+ <td>0.0</td>
+ <td>LINESTRING (102 0, 103 1, 104 0, 105 1)</td>
+ </tr>
+ <tr>
+ <th>2</th>
+ <td>value2</td>
+ <td>{ "this": "that" }</td>
+ <td>POLYGON ((100 0, 101 0, 101 1, 100 1, 100 0))</td>
+ </tr>
+ </tbody>
+</table>
+</div>
+
+
+
+
+```python
+gdf.info()
+```
+
+ <class 'geopandas.geodataframe.GeoDataFrame'>
+ RangeIndex: 3 entries, 0 to 2
+ Data columns (total 3 columns):
+ # Column Non-Null Count Dtype
+ --- ------ -------------- -----
+ 0 prop0 3 non-null object
+ 1 prop1 2 non-null object
+ 2 geometry 3 non-null geometry
+ dtypes: geometry(1), object(2)
+ memory usage: 204.0+ bytes
+
+
+### Convert the GeoPandas DataFrame to a SedonaDB DataFrame
+
+
+```python
+df = sd.create_data_frame(gdf)
+```
+
+
+```python
+df.show()
+```
+
+ ┌────────┬────────────────────┬──────────────────────────────────────────┐
+ │ prop0 ┆ prop1 ┆ geometry │
+ │ utf8 ┆ utf8 ┆ geometry │
+ ╞════════╪════════════════════╪══════════════════════════════════════════╡
+ │ value0 ┆ ┆ POINT(102 0.5) │
+ ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ value1 ┆ 0.0 ┆ LINESTRING(102 0,103 1,104 0,105 1) │
+ ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ value2 ┆ { "this": "that" } ┆ POLYGON((100 0,101 0,101 1,100 1,100 0)) │
+ └────────┴────────────────────┴──────────────────────────────────────────┘
+
+
+## Read a FlatGeobuf file
+
+This code demonstrates how to read a FlatGeobuf file with GeoPandas and then convert it to a SedonaDB DataFrame.
+
+
+```python
+path = "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities.fgb"
+gdf = gpd.read_file(path)
+```
+
+
+```python
+df = sd.create_data_frame(gdf)
+```
+
+
+```python
+df.show(3)
+```
+
+ ┌──────────────┬──────────────────────────────┐
+ │ name ┆ geometry │
+ │ utf8 ┆ geometry │
+ ╞══════════════╪══════════════════════════════╡
+ │ Vatican City ┆ POINT(12.4533865 41.9032822) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ San Marino ┆ POINT(12.4417702 43.9360958) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Vaduz ┆ POINT(9.5166695 47.1337238) │
+ └──────────────┴──────────────────────────────┘
diff --git a/docs/overture-examples.md b/docs/overture-examples.md
new file mode 100644
index 0000000..a06b618
--- /dev/null
+++ b/docs/overture-examples.md
@@ -0,0 +1,288 @@
+# SedonaDB Overture Examples
+
+This notebook shows how to query the Overture data with SedonaDB!
+
+
+```python
+import sedona.db
+import os
+
+os.environ["AWS_SKIP_SIGNATURE"] = "true"
+os.environ["AWS_DEFAULT_REGION"] = "us-west-2"
+
+sd = sedona.db.connect()
+```
+
+## Overture buildings table
+
+
+```python
+df = sd.read_parquet(
+    "s3://overturemaps-us-west-2/release/2025-08-20.0/theme=buildings/type=building/"
+)
+```
+
+
+```python
+df.limit(10).show()
+```
+
+    ┌──────────────────────────────────────┬─────────────────────────────────────────┬───┬─────────────┐
+    │ id                                   ┆ geometry                                ┆ … ┆ roof_height │
+    │ utf8view                             ┆ wkb_view <ogc:crs84>                    ┆   ┆ float64     │
+    ╞══════════════════════════════════════╪═════════════════════════════════════════╪═══╪═════════════╡
+    │ 06533301-f2ec-42e0-8138-732ac25a7497 ┆ POLYGON((-58.4757066 -34.7389169,-58.4… ┆ … ┆             │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ cc0c048c-088d-4cb3-9982-3961edfdf416 ┆ POLYGON((-58.4755777 -34.7389131,-58.4… ┆ … ┆             │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ e52a0dbc-fb93-40e2-b1df-03626855299c ┆ POLYGON((-58.4754112 -34.7394253,-58.4… ┆ … ┆             │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ 28526977-9920-4cec-9840-5dd409a7cded ┆ POLYGON((-58.4752088 -34.7394754,-58.4… ┆ … ┆             │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ 0bc4c042-52ea-4ae7-9200-56221805fa2f ┆ POLYGON((-58.475273 -34.7394421,-58.47… ┆ … ┆             │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ c21dfee1-f5d9-4e0a-91cf-796f117518d4 ┆ POLYGON((-58.4750977 -34.7394357,-58.4… ┆ … ┆             │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ 3fe5efdd-1739-4088-8c8e-6f7f1b7cfcfe ┆ POLYGON((-58.4751684 -34.7394288,-58.4… ┆ … ┆             │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ c144becc-fc8a-4bbc-aeef-359ac56a925a ┆ POLYGON((-58.4751787 -34.739396,-58.47… ┆ … ┆             │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ 79d6c10a-2ff2-429d-a0e7-6eefc8939d14 ┆ POLYGON((-58.4753719 -34.7393189,-58.4… ┆ … ┆             │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ c1664c2d-2c0f-44c4-af08-58176e360613 ┆ POLYGON((-58.4753269 -34.7391919,-58.4… ┆ … ┆             │
+    └──────────────────────────────────────┴─────────────────────────────────────────┴───┴─────────────┘
+
+
+
+```python
+df.to_view("buildings")
+```
+
+
+```python
+# the buildings table is large and contains billions of rows
+sd.sql("""
+SELECT
+ COUNT(*)
+FROM
+ buildings
+""").show()
+```
+
+ ┌────────────┐
+ │ count(*) │
+ │ int64 │
+ ╞════════════╡
+ │ 2539170484 │
+ └────────────┘
+
+
+
+```python
+# check out the schema of the buildings table to see what it contains
+df.schema
+```
+
+
+
+
+ SedonaSchema with 24 fields:
+ id: Utf8View
+ geometry: wkb_view <ogc:crs84>
+ bbox: Struct(xmin Float32, xmax Float32, ymin Float32, ymax Float32)
+ version: Int32
+ sources: List(Field { name: "element", data_type: Struct([Field { name: "property", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "dataset", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "record_id", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "update_time", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: fals [...]
+ level: Int32
+ subtype: Utf8View
+ class: Utf8View
+ height: Float64
+ names: Struct(primary Utf8, common Map(Field { name: "key_value", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false), rules List(Field { name: "element", data_type: Struct([Field { name: "variant", data_type: Utf8, nullable: [...]
+ has_parts: Boolean
+ is_underground: Boolean
+ num_floors: Int32
+ num_floors_underground: Int32
+ min_height: Float64
+ min_floor: Int32
+ facade_color: Utf8View
+ facade_material: Utf8View
+ roof_material: Utf8View
+ roof_shape: Utf8View
+ roof_direction: Float64
+ roof_orientation: Utf8View
+ roof_color: Utf8View
+ roof_height: Float64
+
+
+
+
+```python
+# find all the buildings in New York City that are taller than 20 meters
+nyc_bbox_wkt = "POLYGON((-74.2591 40.4774, -74.2591 40.9176, -73.7004 40.9176, -73.7004 40.4774, -74.2591 40.4774))"
+sd.sql(f"""
+SELECT
+ id,
+ height,
+ num_floors,
+ roof_shape,
+ ST_Centroid(geometry) as centroid
+FROM
+ buildings
+WHERE
+ is_underground = FALSE
+ AND height IS NOT NULL
+ AND height > 20
+ AND ST_Intersects(geometry, ST_SetSRID(ST_GeomFromText('{nyc_bbox_wkt}'), 4326))
+LIMIT 5;
+""").show()
+```
+
+    ┌─────────────────────────┬────────────────────┬────────────┬────────────┬─────────────────────────┐
+    │ id                      ┆ height             ┆ num_floors ┆ roof_shape ┆ centroid                │
+    │ utf8view                ┆ float64            ┆ int32      ┆ utf8view   ┆ wkb <ogc:crs84>         │
+    ╞═════════════════════════╪════════════════════╪════════════╪════════════╪═════════════════════════╡
+    │ 1b9040c2-2e79-4f56-aba… ┆ 22.4               ┆            ┆            ┆ POINT(-74.230407502993… │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ 1b5e1cd2-d697-489e-892… ┆ 21.5               ┆            ┆            ┆ POINT(-74.231451103592… │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ c1afdf78-bf84-4b8f-ae1… ┆ 20.9               ┆            ┆            ┆ POINT(-74.232593032240… │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ 88f36399-b09f-491b-bb6… ┆ 24.5               ┆            ┆            ┆ POINT(-74.231878209597… │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ df37a283-f5bd-4822-a05… ┆ 24.154542922973633 ┆            ┆            ┆ POINT(-74.241910239840… │
+    └─────────────────────────┴────────────────────┴────────────┴────────────┴─────────────────────────┘
+
+
+## Overture divisions table
+
+
+```python
+df = sd.read_parquet(
+    "s3://overturemaps-us-west-2/release/2025-08-20.0/theme=divisions/type=division_area/"
+)
+```
+
+
+```python
+# take a look at a few rows of data
+df.show(10)
+```
+
+    ┌────────────────┬────────────────┬────────────────┬───┬────────────────┬──────────┬───────────────┐
+    │ id             ┆ geometry       ┆ bbox           ┆ … ┆ is_territorial ┆ region   ┆ division_id   │
+    │ utf8view       ┆ wkb_view <ogc… ┆ struct(xmin f… ┆   ┆ boolean        ┆ utf8view ┆ utf8view      │
+    ╞════════════════╪════════════════╪════════════════╪═══╪════════════════╪══════════╪═══════════════╡
+    │ 61912ffd-060b… ┆ POLYGON((23.3… ┆ {xmin: 22.735… ┆ … ┆ true           ┆ ZA-EC    ┆ 2711d6ca-ac1… │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ 7647b992-e0d6… ┆ POLYGON((26.5… ┆ {xmin: 26.521… ┆ … ┆ true           ┆ ZA-EC    ┆ 0e8a08eb-6f2… │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ 4058785b-82c9… ┆ MULTIPOLYGON(… ┆ {xmin: 22.735… ┆ … ┆ false          ┆ ZA-EC    ┆ 2711d6ca-ac1… │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ cd9389b7-3451… ┆ POLYGON((26.5… ┆ {xmin: 26.373… ┆ … ┆ true           ┆ ZA-EC    ┆ 9d59ea5e-408… │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ a60ae908-e6fa… ┆ POLYGON((26.6… ┆ {xmin: 26.541… ┆ … ┆ true           ┆ ZA-EC    ┆ f49ef082-3c2… │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ fd070cbb-4aaa… ┆ POLYGON((26.1… ┆ {xmin: 26.084… ┆ … ┆ true           ┆ ZA-EC    ┆ 513b5a9c-c29… │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ 70479601-dc12… ┆ POLYGON((26.4… ┆ {xmin: 26.222… ┆ … ┆ true           ┆ ZA-EC    ┆ 2ade34e5-955… │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ 3a294b23-f674… ┆ POLYGON((24.5… ┆ {xmin: 24.503… ┆ … ┆ true           ┆ ZA-EC    ┆ 4f63d19f-2ca… │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ 48b3e344-10a4… ┆ POLYGON((26.6… ┆ {xmin: 26.557… ┆ … ┆ true           ┆ ZA-EC    ┆ 20d890bb-1a4… │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ 92e71cf6-fa94… ┆ POLYGON((25.8… ┆ {xmin: 25.799… ┆ … ┆ true           ┆ ZA-EC    ┆ 4202ec06-188… │
+    └────────────────┴────────────────┴────────────────┴───┴────────────────┴──────────┴───────────────┘
+
+
+
+```python
+df.to_view("division_area")
+```
+
+
+```python
+sd.sql("""
+SELECT
+ COUNT(*)
+FROM division_area
+""").show()
+```
+
+ ┌──────────┐
+ │ count(*) │
+ │ int64 │
+ ╞══════════╡
+ │ 1035749 │
+ └──────────┘
+
+
+
+```python
+df.schema
+```
+
+
+
+
+ SedonaSchema with 13 fields:
+ id: Utf8View
+ geometry: wkb_view <ogc:crs84>
+ bbox: Struct(xmin Float32, xmax Float32, ymin Float32, ymax Float32)
+ country: Utf8View
+ version: Int32
+ sources: List(Field { name: "element", data_type: Struct([Field { name: "property", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "dataset", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "record_id", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "update_time", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: fals [...]
+ subtype: Utf8View
+ class: Utf8View
+ names: Struct(primary Utf8, common Map(Field { name: "key_value", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false), rules List(Field { name: "element", data_type: Struct([Field { name: "variant", data_type: Utf8, nullable: [...]
+ is_land: Boolean
+ is_territorial: Boolean
+ region: Utf8View
+ division_id: Utf8View
+
+
+
+
+```python
+# get all the divisions in Nova Scotia and save them in memory with to_memtable()
+nova_scotia_bbox_wkt = (
+ "POLYGON((-66.5 43.4, -66.5 47.1, -59.8 47.1, -59.8 43.4, -66.5 43.4))"
+)
+ns = sd.sql(f"""
+SELECT
+ country, region, names, geometry
+FROM division_area
+WHERE
+ ST_Intersects(geometry, ST_SetSRID(ST_GeomFromText('{nova_scotia_bbox_wkt}'), 4326))
+""").to_memtable()
+```
+
+
+```python
+ns.to_view("ns_divisions")
+```
+
+
+```python
+df = sd.sql("""
+SELECT UNNEST(names), geometry
+FROM ns_divisions
+WHERE region = 'CA-NS'
+""")
+```
+
+
+```python
+%%time
+# this executes quickly because the Nova Scotia data was persisted in memory with to_memtable()
+df.show(2)
+```
+
+    ┌────────────────────────┬────────────────────────┬────────────────────────┬───────────────────────┐
+    │ __unnest_placeholder(n ┆ __unnest_placeholder(n ┆ __unnest_placeholder(n ┆ geometry              │
+    │ s_divisions.names).pr… ┆ s_divisions.names).co… ┆ s_divisions.names).ru… ┆ wkb_view <ogc:crs84>  │
+    ╞════════════════════════╪════════════════════════╪════════════════════════╪═══════════════════════╡
+    │ Seal Island            ┆                        ┆                        ┆ POLYGON((-66.0528452… │
+    ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ Mud Island             ┆                        ┆                        ┆ POLYGON((-66.0222822… │
+    └────────────────────────┴────────────────────────┴────────────────────────┴───────────────────────┘
+ CPU times: user 8.75 ms, sys: 2.41 ms, total: 11.2 ms
+ Wall time: 8.47 ms
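Reviewer note (not part of the commit): both spatial filters in this notebook hand-write their bounding box as a WKT polygon. Below is a minimal sketch of a helper that builds the same string from corner coordinates. `bbox_to_wkt` is a hypothetical name, not a SedonaDB function, and the corner order simply mirrors the literals used in the notebook.

```python
def bbox_to_wkt(xmin, ymin, xmax, ymax):
    """Return a closed WKT polygon ring for an axis-aligned bounding box."""
    if xmin > xmax or ymin > ymax:
        raise ValueError("expected xmin <= xmax and ymin <= ymax")
    corners = [
        (xmin, ymin),
        (xmin, ymax),
        (xmax, ymax),
        (xmax, ymin),
        (xmin, ymin),  # repeat the first corner to close the ring
    ]
    # repr() of a float keeps the shortest string that round-trips the value
    ring = ", ".join(f"{float(x)!r} {float(y)!r}" for x, y in corners)
    return f"POLYGON(({ring}))"


nova_scotia_bbox_wkt = bbox_to_wkt(-66.5, 43.4, -59.8, 47.1)
nyc_bbox_wkt = bbox_to_wkt(-74.2591, 40.4774, -73.7004, 40.9176)
print(nova_scotia_bbox_wkt)
```

The resulting strings can be interpolated into the `ST_GeomFromText('{...}')` calls exactly as the hand-written literals are.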
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
new file mode 100644
index 0000000..9ab922d
--- /dev/null
+++ b/docs/programming-guide.md
@@ -0,0 +1,227 @@
+# SedonaDB Guide
+
+This page explains how to process vector data with SedonaDB.
+
+You will learn how to create SedonaDB DataFrames, run spatial queries, and perform I/O operations with various types of files.
+
+Let's start by establishing a SedonaDB connection.
+
+## Establish SedonaDB connection
+
+Here's how to create the SedonaDB connection:
+
+
+```python
+import sedona.db
+
+sd = sedona.db.connect()
+```
+
+Now, let's see how to create SedonaDB dataframes.
+
+## Create SedonaDB DataFrame
+
+**Manually creating SedonaDB DataFrame**
+
+Here's how to manually create a SedonaDB DataFrame:
+
+
+```python
+df = sd.sql("""
+SELECT * FROM (VALUES
+ ('one', ST_GeomFromWkt('POINT(1 2)')),
+ ('two', ST_GeomFromWkt('POLYGON((-74.0 40.7, -74.0 40.8, -73.9 40.8, -73.9 40.7, -74.0 40.7))')),
+ ('three', ST_GeomFromWkt('LINESTRING(-74.0060 40.7128, -73.9352 40.7306, -73.8561 40.8484)')))
+AS t(val, point)""")
+```
+
+Check the type of the DataFrame.
+
+
+```python
+type(df)
+```
+
+
+
+
+ sedonadb.dataframe.DataFrame
+
+
+
+**Create SedonaDB DataFrame from files in S3**
+
+For most production applications, you will create SedonaDB DataFrames by reading data from a file. Let's see how to read GeoParquet files in AWS S3 into a SedonaDB DataFrame.
+
+
+```python
+sd.read_parquet(
+    "s3://overturemaps-us-west-2/release/2025-08-20.0/theme=divisions/type=division_area/",
+ options={"aws.skip_signature": True, "aws.region": "us-west-2"},
+).to_view("division_area")
+```
+
+Next, let's create a SedonaDB DataFrame from a GeoPandas DataFrame.
+
+**Read from GeoPandas DataFrame**
+
+This section shows how to convert a GeoPandas DataFrame into a SedonaDB DataFrame.
+
+Start by reading a FlatGeobuf file into a GeoPandas DataFrame:
+
+
+```python
+import geopandas as gpd
+
+path = "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities.fgb"
+gdf = gpd.read_file(path)
+```
+
+Now convert the GeoPandas DataFrame to a SedonaDB DataFrame and view three rows of content:
+
+
+```python
+df = sd.create_data_frame(gdf)
+df.show(3)
+```
+
+ ┌──────────────┬──────────────────────────────┐
+ │ name ┆ geometry │
+ │ utf8 ┆ geometry │
+ ╞══════════════╪══════════════════════════════╡
+ │ Vatican City ┆ POINT(12.4533865 41.9032822) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ San Marino ┆ POINT(12.4417702 43.9360958) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Vaduz ┆ POINT(9.5166695 47.1337238) │
+ └──────────────┴──────────────────────────────┘
+
+
+## Spatial queries
+
+Let's see how to run spatial operations like filtering and K-nearest neighbors (KNN) joins.
+
+**Spatial filtering**
+
+Let's run a spatial filtering operation to fetch all the objects in the following polygon:
+
+
+```python
+nova_scotia_bbox_wkt = (
+ "POLYGON((-66.5 43.4, -66.5 47.1, -59.8 47.1, -59.8 43.4, -66.5 43.4))"
+)
+
+ns = sd.sql(f"""
+SELECT country, region, geometry
+FROM division_area
+WHERE ST_Intersects(geometry, ST_SetSRID(ST_GeomFromText('{nova_scotia_bbox_wkt}'), 4326))
+""")
+
+ns.show(3)
+```
+
+    ┌──────────┬──────────┬────────────────────────────────────────────────────────────────────────────┐
+    │ country  ┆ region   ┆ geometry                                                                   │
+    │ utf8view ┆ utf8view ┆ geometry                                                                   │
+    ╞══════════╪══════════╪════════════════════════════════════════════════════════════════════════════╡
+    │ CA       ┆ CA-NS    ┆ POLYGON((-66.0528452 43.4531336,-66.0883401 43.3978188,-65.9647654 43.361… │
+    ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ CA       ┆ CA-NS    ┆ POLYGON((-66.0222822 43.5166842,-66.0252286 43.5100071,-66.0528452 43.453… │
+    ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ CA       ┆ CA-NS    ┆ POLYGON((-65.7451389 43.5336263,-65.7450818 43.5347004,-65.7449545 43.535… │
+    └──────────┴──────────┴────────────────────────────────────────────────────────────────────────────┘
+
+
+You can see it only includes the divisions in the Nova Scotia area. Skip to the visualization section to see how this data can be graphed on a map.
+
+**K-nearest neighbors (KNN) joins**
+
+Create `restaurants` and `customers` tables so we can demonstrate the KNN join functionality.
+
+
+```python
+df = sd.sql("""
+SELECT name, ST_Point(lng, lat) AS location
+FROM (VALUES
+ (101, -74.0, 40.7, 'Pizza Palace'),
+ (102, -73.99, 40.69, 'Burger Barn'),
+ (103, -74.02, 40.72, 'Taco Town'),
+ (104, -73.98, 40.75, 'Sushi Spot'),
+ (105, -74.05, 40.68, 'Deli Direct')
+) AS t(id, lng, lat, name)
+""")
+sd.sql("drop view if exists restaurants")
+df.to_view("restaurants")
+
+df = sd.sql("""
+SELECT name, ST_Point(lng, lat) AS location
+FROM (VALUES
+ (1, -74.0, 40.7, 'Alice'),
+ (2, -73.9, 40.8, 'Bob'),
+ (3, -74.1, 40.6, 'Carol')
+) AS t(id, lng, lat, name)
+""")
+sd.sql("drop view if exists customers")
+df.to_view("customers")
+```
+
+
+```python
+df.show()
+```
+
+ ┌───────┬───────────────────┐
+ │ name ┆ location │
+ │ utf8 ┆ geometry │
+ ╞═══════╪═══════════════════╡
+ │ Alice ┆ POINT(-74 40.7) │
+ ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Bob ┆ POINT(-73.9 40.8) │
+ ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Carol ┆ POINT(-74.1 40.6) │
+ └───────┴───────────────────┘
+
+
+Perform a KNN join to identify the two restaurants that are nearest to each customer:
+
+
+```python
+sd.sql("""
+SELECT
+ c.name AS customer,
+ r.name AS restaurant
+FROM customers c, restaurants r
+WHERE ST_KNN(c.location, r.location, 2, false)
+ORDER BY c.name, r.name;
+""").show()
+```
+
+ ┌──────────┬──────────────┐
+ │ customer ┆ restaurant │
+ │ utf8 ┆ utf8 │
+ ╞══════════╪══════════════╡
+ │ Alice ┆ Burger Barn │
+ ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Alice ┆ Pizza Palace │
+ ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Bob ┆ Pizza Palace │
+ ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Bob ┆ Sushi Spot │
+ ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Carol ┆ Deli Direct │
+ ├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Carol ┆ Pizza Palace │
+ └──────────┴──────────────┘
+
+
+Notice how each customer has two rows - one for each of the two closest restaurants.
+
+## GeoParquet support
+
+You can also read GeoParquet files into SedonaDB with `read_parquet()`:
+
+```python
+df = sd.read_parquet("DATA_FILE.parquet")
+```
+
+Once you read the file, you can easily expose it as a view and query it with spatial SQL, as we demonstrated in the example above.
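Reviewer note (not part of the commit): to make the KNN join result in programming-guide.md easier to check by hand, here is a brute-force sketch in plain Python over the same five restaurants and three customers. It assumes planar Euclidean distance on the raw lng/lat values, which is my reading of the `false` argument passed to `ST_KNN`, and it does not use SedonaDB at all. `knn_pairs` is an illustrative name.

```python
import heapq
import math

# Same (lng, lat) values as the VALUES clauses in the guide
restaurants = {
    "Pizza Palace": (-74.0, 40.7),
    "Burger Barn": (-73.99, 40.69),
    "Taco Town": (-74.02, 40.72),
    "Sushi Spot": (-73.98, 40.75),
    "Deli Direct": (-74.05, 40.68),
}
customers = {
    "Alice": (-74.0, 40.7),
    "Bob": (-73.9, 40.8),
    "Carol": (-74.1, 40.6),
}


def knn_pairs(queries, objects, k):
    """For each query point, pair it with its k nearest objects (brute force)."""
    pairs = []
    for q_name, q_xy in queries.items():
        nearest = heapq.nsmallest(
            k, objects.items(), key=lambda item: math.dist(q_xy, item[1])
        )
        pairs.extend((q_name, o_name) for o_name, _ in nearest)
    # Sort like the ORDER BY c.name, r.name clause in the SQL query
    return sorted(pairs)


for customer, restaurant in knn_pairs(customers, restaurants, k=2):
    print(customer, restaurant)
```

This compares every customer with every restaurant, which is fine for a handful of points but is only meant to illustrate what the join computes, not how an engine would evaluate it.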
diff --git a/docs/reference/python.md b/docs/reference/python.md
index 0ff1552..b1b6cc4 100644
--- a/docs/reference/python.md
+++ b/docs/reference/python.md
@@ -21,3 +21,7 @@
::: sedonadb.context
::: sedonadb.dataframe
+
+::: sedonadb.testing
+
+::: sedonadb.dbapi
diff --git a/docs/requirements.txt b/docs/requirements.txt
index f6a1590..ccd5da6 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -1,11 +1,9 @@
-jupyter
mike
mkdocs-git-revision-date-localized-plugin
mkdocs-glightbox
-mkdocs-jupyter
mkdocs-macros-plugin
mkdocs-material
mkdocstrings[python]
nbconvert
-ruff
pyproj
+ruff
diff --git a/mkdocs.yml b/mkdocs.yml
index 12607f4..621f1bc 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -1,3 +1,20 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
site_name: SedonaDB
site_description: "Documentation for Apache SedonaDB"
site_url: https://sedona.apache.org/sedonadb/
@@ -5,7 +22,9 @@ nav:
- SedonaDB: index.md
- SedonaDB Guides:
- Python Quickstart: quickstart-python.md
- - SedonaDB Guide: programming-guide.ipynb
+ - SedonaDB Guide: programming-guide.md
+ - Working with GeoPandas: geopandas-interop.md
+ - Working with Overture: overture-examples.md
- Development: development.md
- SedonaDB Reference:
- Python:
@@ -20,6 +39,7 @@ nav:
- Sedona Homepage: "https://sedona.apache.org/latest/"
repo_url: https://github.com/apache/sedona-db
+edit_uri: https://github.com/apache/sedona-db/blob/main/docs/
repo_name: apache/sedona-db
theme:
name: material
@@ -96,7 +116,6 @@ plugins:
- macros
- git-revision-date-localized:
type: datetime
- - mkdocs-jupyter
- mike:
version_selector: true
canonical_version: 'latest'
@@ -121,12 +140,10 @@ plugins:
- mkdocstrings:
handlers:
python:
- # 'inventories' is a direct setting for the Python handler
inventories:
- https://docs.python.org/3/objects.inv
- https://geopandas.org/en/stable/objects.inv
- https://pandas.pydata.org/docs/objects.inv
- # All display and path options go under a SINGLE 'options' block
options:
docstring_section_style: list
docstring_style: google
diff --git a/python/sedonadb/python/sedonadb/dbapi.py b/python/sedonadb/python/sedonadb/dbapi.py
index fd43171..968b1b0 100644
--- a/python/sedonadb/python/sedonadb/dbapi.py
+++ b/python/sedonadb/python/sedonadb/dbapi.py
@@ -14,13 +14,16 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
+
+from typing import Mapping, Any
+
import adbc_driver_manager.dbapi
import sedonadb.adbc
from sedonadb.utility import sedona # noqa: F401
-def connect(**kwargs) -> "Connection":
+def connect(**kwargs: Mapping[str, Any]) -> "Connection":
"""Connect to Sedona via Python DBAPI
Creates a DBAPI-compatible connection as a thin wrapper around the