Re: [PR] docs: add sedonadb programming guide [sedona-db]

via GitHub Thu, 11 Sep 2025 20:00:15 -0700


paleolimbot commented on code in PR #64:
URL: https://github.com/apache/sedona-db/pull/64#discussion_r2342817213



##########
docs/programming-guide.ipynb:
##########
@@ -0,0 +1,387 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "1932983e-1cd2-41d0-a5eb-0537b3ac3feb",
+   "metadata": {},
+   "source": [
+    "# SedonaDB Guide\n",
+    "\n",
+    "This page explains how to process vector data with SedonaDB.\n",
+    "\n",
+    "You will learn how to create SedonaDB DataFrames, run spatial queries, 
and perform I/O operations with various types of files.\n",
+    "\n",
+    "Let’s start by establishing a SedonaDB connection.\n",
+    "\n",
+    "## Establish SedonaDB connection\n",
+    "\n",
+    "Here’s how to create the SedonaDB connection:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "53c3b7a8-c42a-407a-a454-6ee1e943fbcc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sedonadb\n",
+    "\n",
+    "sd = sedonadb.connect()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7aeaa60f-2325-418c-8e72-4344bd4a75fe",
+   "metadata": {},
+   "source": [
+    "Now let’s see how to create SedonaDB DataFrames.\n",
+    "\n",
+    "## Create SedonaDB DataFrame\n",
+    "\n",
+    "**Manually creating SedonaDB DataFrame**\n",
+    "\n",
+    "Here’s how to manually create a SedonaDB DataFrame:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "b3377767-d747-407c-92c0-8786c1998131",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = sd.sql(\"\"\"\n",
+    "SELECT * FROM (VALUES\n",
+    "    ('one', ST_GeomFromWkt('POINT(1 2)')),\n",
+    "    ('two', ST_GeomFromWkt('POLYGON((-74.0 40.7, -74.0 40.8, -73.9 40.8, 
-73.9 40.7, -74.0 40.7))')),\n",
+    "    ('three', ST_GeomFromWkt('LINESTRING(-74.0060 40.7128, -73.9352 
40.7306, -73.8561 40.8484)')))\n",
+    "AS t(val, point)\"\"\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0f9e1319-2e7a-4d98-9df0-47a9a73cfff3",
+   "metadata": {},
+   "source": [
+    "Check the type of the DataFrame."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "e8be30ab-4818-4db8-bae2-83e973ad1b77",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "sedonadb.dataframe.DataFrame"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "type(df)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8225ed1f-45a4-4915-a582-8ae191ec53ed",
+   "metadata": {},
+   "source": [
+    "**Create SedonaDB DataFrame from files in S3**\n",
+    "\n",
+    "For most production applications, you will create SedonaDB DataFrames by 
reading data from a file.  Let’s see how to read GeoParquet files in AWS S3 
into a SedonaDB DataFrame."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "151df287-4b2d-433e-9769-c3378df03b1b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sd.read_parquet(\n",
+    "    
\"s3://overturemaps-us-west-2/release/2025-08-20.0/theme=divisions/type=division_area/\",\n",
+    "    options={\"aws.skip_signature\": True, \"aws.region\": 
\"us-west-2\"},\n",
+    ").to_view(\"division_area\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "858fcc66-816d-4c71-8875-82b74169eccd",
+   "metadata": {},
+   "source": [
+    "Let’s now run some spatial queries.\n",
+    "\n",
+    "**Read from GeoPandas DataFrame**\n",
+    "\n",
+    "This section shows how to convert a GeoPandas DataFrame into a SedonaDB 
DataFrame.\n",
+    "\n",
+    "Start by reading a FlatGeoBuf file into a GeoPandas DataFrame:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "b81549f2-0f58-49e4-9011-8de6578c2b0e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import geopandas as gpd\n",
+    "\n",
+    "path = 
\"https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities.fgb\"\n";,
+    "gdf = gpd.read_file(path)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2265f94b-ccb3-4634-8c52-a8799c68c76a",
+   "metadata": {},
+   "source": [
+    "Now convert the GeoPandas DataFrame to a SedonaDB DataFrame and view 
three rows of content:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "0e4819db-bf58-42d7-8b5b-f272d0f19266",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "┌──────────────┬──────────────────────────────┐\n",
+      "│     name     ┆           geometry           │\n",
+      "│     utf8     ┆           geometry           │\n",
+      "╞══════════════╪══════════════════════════════╡\n",
+      "│ Vatican City ┆ POINT(12.4533865 41.9032822) │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ San Marino   ┆ POINT(12.4417702 43.9360958) │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Vaduz        ┆ POINT(9.5166695 47.1337238)  │\n",
+      "└──────────────┴──────────────────────────────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "df = sd.create_data_frame(gdf)\n",
+    "df.show(3)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6890bcc3-f3bd-4c47-bf86-2607bed5e480",
+   "metadata": {},
+   "source": [
+    "## Spatial queries\n",
+    "\n",
+    "Let’s see how to run spatial operations like filtering, joins, and 
clustering algorithms.\n",
+    "\n",
+    "***Spatial filtering***\n",
+    "\n",
+    "Let’s run a spatial filtering operation to fetch all the objects in the 
following polygon:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "8c8a4b48-8c4e-412e-900f-8c0f6f4ccc1d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      
"┌──────────┬──────────┬────────────────────────────────────────────────────────────────────────────┐\n",
+      "│  country ┆  region  ┆                                  geometry       
                           │\n",
+      "│ utf8view ┆ utf8view ┆                                  geometry       
                           │\n",
+      
"╞══════════╪══════════╪════════════════════════════════════════════════════════════════════════════╡\n",
+      "│ CA       ┆ CA-NS    ┆ POLYGON((-66.0528452 43.4531336,-66.0883401 
43.3978188,-65.9647654 43.361… │\n",
+      
"├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ CA       ┆ CA-NS    ┆ POLYGON((-66.0222822 43.5166842,-66.0252286 
43.5100071,-66.0528452 43.453… │\n",
+      
"├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ CA       ┆ CA-NS    ┆ POLYGON((-65.7451389 43.5336263,-65.7450818 
43.5347004,-65.7449545 43.535… │\n",
+      
"└──────────┴──────────┴────────────────────────────────────────────────────────────────────────────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "nova_scotia_bbox_wkt = (\n",
+    "    \"POLYGON((-66.5 43.4, -66.5 47.1, -59.8 47.1, -59.8 43.4, -66.5 
43.4))\"\n",
+    ")\n",
+    "\n",
+    "ns = sd.sql(f\"\"\"\n",
+    "SELECT country, region, geometry\n",
+    "FROM division_area\n",
+    "WHERE ST_Intersects(geometry, 
ST_SetSRID(ST_GeomFromText('{nova_scotia_bbox_wkt}'), 4326))\n",
+    "\"\"\")\n",
+    "\n",
+    "ns.show(3)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "32076e01-d807-40ed-8457-9d8c4244e89f",
+   "metadata": {},
+   "source": [
+    "You can see it only includes the divisions in the Nova Scotia area.  Skip 
to the visualization section to see how this data can be graphed on a map.\n",
+    "\n",
+    "***K-nearest neighbors (KNN) joins***\n",
+    "\n",
+    "Create `restaurants` and `customers` tables so we can demonstrate the KNN 
join functionality."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "deaa36db-2fee-4ba2-ab79-1dc756cb1655",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = sd.sql(\"\"\"\n",
+    "SELECT name, ST_Point(lng, lat) AS location\n",
+    "FROM (VALUES \n",
+    "    (101, -74.0, 40.7, 'Pizza Palace'),\n",
+    "    (102, -73.99, 40.69, 'Burger Barn'),\n",
+    "    (103, -74.02, 40.72, 'Taco Town'),\n",
+    "    (104, -73.98, 40.75, 'Sushi Spot'),\n",
+    "    (105, -74.05, 40.68, 'Deli Direct')\n",
+    ") AS t(id, lng, lat, name)\n",
+    "\"\"\")\n",
+    "sd.sql(\"drop view if exists restaurants\")\n",
+    "df.to_view(\"restaurants\")\n",
+    "\n",
+    "df = sd.sql(\"\"\"\n",
+    "SELECT name, ST_Point(lng, lat) AS location\n",
+    "FROM (VALUES \n",
+    "    (1, -74.0, 40.7, 'Alice'),\n",
+    "    (2, -73.9, 40.8, 'Bob'),\n",
+    "    (3, -74.1, 40.6, 'Carol')\n",
+    ") AS t(id, lng, lat, name)\n",
+    "\"\"\")\n",
+    "sd.sql(\"drop view if exists customers\")\n",
+    "df.to_view(\"customers\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "e3bc4976-4245-432f-b265-7f6aa13f35b9",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "┌───────┬───────────────────┐\n",
+      "│  name ┆      location     │\n",
+      "│  utf8 ┆      geometry     │\n",
+      "╞═══════╪═══════════════════╡\n",
+      "│ Alice ┆ POINT(-74 40.7)   │\n",
+      "├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Bob   ┆ POINT(-73.9 40.8) │\n",
+      "├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Carol ┆ POINT(-74.1 40.6) │\n",
+      "└───────┴───────────────────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "df.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9df227d6-0972-457a-87e3-5a89802c460f",
+   "metadata": {},
+   "source": [
+    "Perform a KNN join to identify the two restaurants that are nearest to 
each customer:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "05565e15-ee18-431c-8fd2-673291d8d0ee",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "┌──────────┬──────────────┐\n",
+      "│ customer ┆  restaurant  │\n",
+      "│   utf8   ┆     utf8     │\n",
+      "╞══════════╪══════════════╡\n",
+      "│ Alice    ┆ Burger Barn  │\n",
+      "├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Alice    ┆ Pizza Palace │\n",
+      "├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Bob      ┆ Pizza Palace │\n",
+      "├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Bob      ┆ Sushi Spot   │\n",
+      "├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Carol    ┆ Deli Direct  │\n",
+      "├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ Carol    ┆ Pizza Palace │\n",
+      "└──────────┴──────────────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "sd.sql(\"\"\"\n",
+    "SELECT\n",
+    "    c.name AS customer,\n",
+    "    r.name AS restaurant\n",
+    "FROM customers c, restaurants r\n",
+    "WHERE ST_KNN(c.location, r.location, 2, false)\n",
+    "ORDER BY c.name, r.name;\n",
+    "\"\"\").show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2e93fe6a-b0a7-4ec0-952c-dde9edcacdc4",
+   "metadata": {},
+   "source": [
+    "Notice how each customer has two rows - one for each of the two closest 
restaurants.\n",
+    "\n",
+    "## Files\n",
+    "\n",
+    "You can read GeoParquet files with SedonaDB, see the following 
example:\n",
+    "\n",
+    "```python\n",
+    "df = sd.read_parquet(\"some_file.parquet\")\n",
+    "```\n",
+    "\n",
+    "Once you read the file, you can easily expose it as a view and query it 
with spatial SQL."

Review Comment:
   This part seems to end a little abruptly? You cover this workflow 
above...maybe remove?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] docs: add sedonadb programming guide [sedona-db]

Reply via email to