paleolimbot commented on code in PR #238:
URL: https://github.com/apache/sedona-db/pull/238#discussion_r2458604663
##########
docs/delta-lake.ipynb:
##########
@@ -65,13 +65,11 @@
"outputs": [],
"source": [
"countries.to_view(\"countries\")\n",
- "df = sd.sql(\"select name, continent, ST_AsText(geometry) as geometry_wkt
from countries\")\n",
+ "df = sd.sql(\n",
+ " \"select name, continent, ST_AsText(geometry) as geometry_wkt from
countries\"\n",
+ ")\n",
"table_path = \"/tmp/delta_with_wkt\"\n",
- "write_deltalake(\n",
- " table_path,\n",
- " df.to_pandas(),\n",
- " mode=\"overwrite\"\n",
- ")"
+ "write_deltalake(table_path, df.to_pandas(), mode=\"overwrite\")"
Review Comment:
I think that `write_deltalake()` supports ArrowArrayStreamExportable input
(which our `df` is!), so you can just do:
```python
write_deltalake(table_path, df, ...)
```
https://delta-io.github.io/delta-rs/api/delta_writer/#write-to-delta-tables
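A minimal sketch of that suggestion, assuming the installed `deltalake` version accepts objects that implement the Arrow PyCapsule stream protocol (`__arrow_c_stream__`). `write_arrow_or_pandas` is a hypothetical helper, not part of `deltalake` or SedonaDB; it passes the frame through unchanged when it exports an Arrow stream and otherwise falls back to the pandas conversion the notebook uses today:

```python
def write_arrow_or_pandas(writer, table_path, frame, **kwargs):
    """Hypothetical helper: skip the pandas round trip when possible."""
    # Objects implementing the Arrow PyCapsule stream protocol expose
    # a __arrow_c_stream__ method.
    if hasattr(frame, "__arrow_c_stream__"):
        # Hand the frame over as-is so the writer can consume Arrow data directly.
        return writer(table_path, frame, **kwargs)
    # Fall back to the pandas conversion currently used in the notebook.
    return writer(table_path, frame.to_pandas(), **kwargs)


# Intended use in the notebook (assumes write_deltalake accepts Arrow streams):
# write_arrow_or_pandas(write_deltalake, table_path, df, mode="overwrite")
```

If the stream path works for `df`, the helper is unnecessary and the direct call above is enough.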
##########
docs/delta-lake.ipynb:
##########
@@ -0,0 +1,242 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "3425268d-6430-4e52-9019-969d61ef5458",
+ "metadata": {},
+ "source": [
+ "# SedonaDB + Delta Lake\n",
+ "\n",
+ "This page shows how to read and write Delta Lake tables with SedonaDB.\n",
+ "\n",
+ "Make sure you run `pip install deltalake` to run the cells in this
notebook."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "83f5ca25-8059-4624-bd00-e44cd172d9c2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from deltalake import write_deltalake, DeltaTable\n",
+ "import sedona.db\n",
+ "\n",
+ "sd = sedona.db.connect()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1a90b1bf-99f9-4f50-987b-f1f30bf9988a",
+ "metadata": {},
+ "source": [
+ "Read in a GeoParquet dataset into a SedonaDB DataFrame."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "a37f3a30-3267-4bfc-8e5b-e234053927af",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "countries = sd.read_parquet(\n",
+ "
\"https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_countries_geo.parquet\"\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1f82b047-8182-4120-82d7-945ff38ecbca",
+ "metadata": {},
+ "source": [
+ "## Create a Delta Lake table\n",
+ "\n",
+ "Now write the DataFrame to a Delta Lake table. Notice that the geometry
column must be converted to Well-Known Text (WKT) before writing to the Delta
table.\n",
+ "\n",
+ "Delta Lake does not support geometry columns."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "35bdc296-d9ef-4ea2-9ba5-dac1cfd115ef",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "countries.to_view(\"countries\")\n",
+ "df = sd.sql(\n",
+ " \"select name, continent, ST_AsText(geometry) as geometry_wkt from
countries\"\n",
+ ")\n",
+ "table_path = \"/tmp/delta_with_wkt\"\n",
+ "write_deltalake(table_path, df.to_pandas(), mode=\"overwrite\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "974a5558-fb18-49b7-a304-fa93bdd363e8",
+ "metadata": {},
+ "source": [
+ "## Read Delta table into SedonaDB\n",
+ "\n",
+ "Now read the Delta table back into a SedonaDB DataFrame."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "c15d4605-483a-4041-973a-547098bdeef4",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+
"┌─────────────────────────────┬───────────────┬────────────────────────────────────────────────────┐\n",
+ "│ name ┆ continent ┆
geometry_wkt │\n",
+ "│ utf8 ┆ utf8 ┆
utf8 │\n",
+
"╞═════════════════════════════╪═══════════════╪════════════════════════════════════════════════════╡\n",
+ "│ Fiji ┆ Oceania ┆ MULTIPOLYGON(((180
-16.067132663642447,180 -16.55… │\n",
+
"├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ United Republic of Tanzania ┆ Africa ┆
POLYGON((33.90371119710453 -0.9500000000000001,34… │\n",
+
"├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Western Sahara ┆ Africa ┆
POLYGON((-8.665589565454809 27.656425889592356,-8… │\n",
+
"├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Canada ┆ North America ┆
MULTIPOLYGON(((-122.84000000000003 49.00000000000… │\n",
+
"├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ United States of America ┆ North America ┆
MULTIPOLYGON(((-122.84000000000003 49.00000000000… │\n",
+
"├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Kazakhstan ┆ Asia ┆
POLYGON((87.35997033076265 49.21498078062912,86.5… │\n",
+
"├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Uzbekistan ┆ Asia ┆
POLYGON((55.96819135928291 41.30864166926936,55.9… │\n",
+
"├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Papua New Guinea ┆ Oceania ┆
MULTIPOLYGON(((141.00021040259185 -2.600151055515… │\n",
+
"├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Indonesia ┆ Asia ┆
MULTIPOLYGON(((141.00021040259185 -2.600151055515… │\n",
+
"├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Argentina ┆ South America ┆
MULTIPOLYGON(((-68.63401022758323 -52.63637045887… │\n",
+
"└─────────────────────────────┴───────────────┴────────────────────────────────────────────────────┘\n"
+ ]
+ }
+ ],
+ "source": [
+ "dt = DeltaTable(table_path)\n",
+ "arrow_table = dt.to_pyarrow_table()\n",
Review Comment:
A user will probably want to select columns or filter with an expression
here. (If we integrated a Delta Lake table provider, one of the cool things we
could do is insert this query automatically based on the information
DataFusion gives us.)
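As a rough illustration of selecting columns and filtering at read time, here is a sketch that forwards a projection and a filter to `DeltaTable.to_pyarrow_table()` through its `columns` and `filters` arguments. `read_subset` is a hypothetical helper, and the column names and filter value are only examples taken from the notebook's query:

```python
def read_subset(delta_table, columns=None, filters=None):
    """Hypothetical helper: read only the requested columns and rows."""
    # delta_table is expected to be a deltalake.DeltaTable.
    # columns: list of column names to keep.
    # filters: DNF-style tuples such as [("continent", "=", "Africa")].
    # Both are forwarded unchanged to to_pyarrow_table().
    return delta_table.to_pyarrow_table(columns=columns, filters=filters)


# Intended use in the notebook:
# dt = DeltaTable(table_path)
# arrow_table = read_subset(
#     dt,
#     columns=["name", "geometry_wkt"],
#     filters=[("continent", "=", "Africa")],
# )
```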