This is an automated email from the ASF dual-hosted git repository.
jiayu pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/sedona-db.git
The following commit(s) were added to refs/heads/main by this push:
new ab9fcbb [DOCS] Fit and finish fixes (#110)
ab9fcbb is described below
commit ab9fcbba50370bf829e7ef47af50e1a19745bbba
Author: Kelly-Ann Dolor <[email protected]>
AuthorDate: Fri Sep 19 00:13:30 2025 -0700
[DOCS] Fit and finish fixes (#110)
Co-authored-by: Jia Yu <[email protected]>
---
README.md | 6 +-
docs/{development.md => contributors-guide.md} | 138 ++++++++++++++++----
docs/index.md | 47 ++++---
docs/programming-guide.ipynb | 44 +++----
docs/programming-guide.md | 26 ++--
docs/quickstart-python.ipynb | 2 +-
docs/reference/read-parquet-files.md | 71 -----------
docs/stylesheets/extra.css | 11 ++
docs/working-with-parquet-files.ipynb | 166 +++++++++++++++++++++++++
docs/working-with-parquet-files.md | 116 +++++++++++++++++
mkdocs.yml | 11 +-
11 files changed, 463 insertions(+), 175 deletions(-)
diff --git a/README.md b/README.md
index dfb7e78..dde5ab0 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,11 @@ SedonaDB only runs on a single machine, so it’s perfect for
processing smaller
## Install
-You can install Python SedonaDB with `pip install apache-sedona[db]`.
+You can install Python SedonaDB with PyPI:
+
+```sh
+pip install "apache-sedona[db]"
+```
## Overture buildings example
diff --git a/docs/development.md b/docs/contributors-guide.md
similarity index 51%
rename from docs/development.md
rename to docs/contributors-guide.md
index 58f7178..2183c65 100644
--- a/docs/development.md
+++ b/docs/contributors-guide.md
@@ -17,14 +17,66 @@
under the License.
-->
-# Development
+# Contributors Guide
+
+This guide details how to set up your development environment as a SedonaDB
Contributor.
+
+## Fork and clone the repository
+
+Your first step is to create a personal copy of the repository and connect it
to the main project.
+
+1. Fork the repository
+
+ * Navigate to the official [Apache SedonaDB GitHub
repository](https://github.com/apache/sedona-db).
+ * Click the **Fork** button in the top-right corner. This creates a
complete copy of the project in your own GitHub account.
+
+1. Clone your fork
+
+ * Next, clone your newly created fork to your local machine. This
command downloads the repository into a new folder named `sedona-db`.
+ * Replace `YourUsername` with your actual GitHub username.
+
+ ```shell
+ git clone https://github.com/YourUsername/sedona-db.git
+ cd sedona-db
+ ```
+
+1. Configure the remotes
+
+ * Your local repository needs to know where the original project is so
you can pull in updates. You'll add a remote link, traditionally named
**`upstream`**, to the main Apache SedonaDB repository.
+ * Your fork is automatically configured as the **`origin`** remote.
+
+ ```shell
+ # Add the main repository as the "upstream" remote
+ git remote add upstream https://github.com/apache/sedona-db.git
+ ```
+
+1. Verify the configuration
+
+ * Run the following command to verify that you have two remotes
configured correctly: `origin` (your fork) and `upstream` (the main repository).
+
+ ```shell
+ git remote -v
+ ```
+
+ * The output should look like this:
+
+ ```shell
+ origin https://github.com/YourUsername/sedona-db.git (fetch)
+ origin https://github.com/YourUsername/sedona-db.git (push)
+ upstream https://github.com/apache/sedona-db.git (fetch)
+ upstream https://github.com/apache/sedona-db.git (push)
+ ```
## Rust
-SedonaDB is written and Rust and is a standard `cargo` workspace. You can
-install a recent version of the Rust compiler and cargo from
-[rustup.rs](https://rustup.rs/) and run tests using `cargo test`. A local
-development version of the CLI can be run with `cargo run --bin sedona-cli`.
+SedonaDB is written in Rust and is a standard `cargo` workspace.
+
+You can install a recent version of the Rust compiler and cargo from
+[rustup.rs](https://rustup.rs/) and run tests using `cargo test`.
+
+A local development version of the CLI can be run with `cargo run --bin
sedona-cli`.
+
+### Test data setup
Some tests require submodules that contain test data or pinned versions of
external dependencies. These submodules can be initialized with:
@@ -40,16 +92,26 @@ Additionally, some of the data required in the tests can be
downloaded by runnin
python submodules/download-assets.py
```
+### System dependencies
+
Some crates wrap external native libraries and require system dependencies
-to build. At this time the only crate that requires this is the
sedona-s2geography
-crate, which requires [CMake](https://cmake.org),
-[Abseil](https://github.com/abseil/abseil-cpp) and OpenSSL. These can be
installed
-on MacOS with [Homebrew](https://brew.sh):
+to build.
+
+!!!note "`sedona-s2geography`"
+ At this time, the only crate that requires this is the `sedona-s2geography`
+ crate, which requires [CMake](https://cmake.org),
+ [Abseil](https://github.com/abseil/abseil-cpp) and OpenSSL.
+
+#### macOS
+
+These can be installed on macOS with [Homebrew](https://brew.sh):
```shell
brew install abseil openssl cmake geos
```
+#### Linux and Windows
+
On Linux and Windows, it is recommended to use
[vcpkg](https://github.com/microsoft/vcpkg)
to provide external dependencies. This can be done by setting the
`CMAKE_TOOLCHAIN_FILE`
environment variable:
@@ -58,7 +120,9 @@ environment variable:
export CMAKE_TOOLCHAIN_FILE=/path/to/vcpkg/scripts/buildsystems/vcpkg.cmake
```
-When using VSCode, it may be necessary to set this environment variable in
settings.json
+#### Visual Studio Code (VSCode) Configuration
+
+When using VSCode, it may be necessary to set this environment variable in
`settings.json`
such that it can be found by rust-analyzer when running build/run tasks:
```json
@@ -75,8 +139,9 @@ such that it can be found by rust-analyzer when running
build/run tasks:
## Python
Python bindings to SedonaDB are built with the
[Maturin](https://www.maturin.rs) build
-backend. Installing a development version of the main Python bindings the
first time
-can be done with:
+backend.
+
+To install a development version of the main Python bindings for the first
time, run the following commands:
```shell
cd python/sedonadb
@@ -92,12 +157,16 @@ maturin develop
## Debugging
+### Rust
+
Debugging Rust code is most easily done by writing or finding a test that
triggers
the desired behavior and running it using the *Debug* selection in
[VSCode](https://code.visualstudio.com/) with the
[rust-analyzer](https://marketplace.visualstudio.com/items?itemName=rust-lang.rust-analyzer)
-extension. Rust code can also debugged using the CLI by finding the `main()`
function in
-sedona-cli and choosing the *Debug* run option.
+extension. Rust code can also be debugged using the CLI by finding the
`main()` function in
+`sedona-cli` and choosing the *Debug* run option.
+
+### Python, C, and C++
Installation of Python bindings with `maturin develop` ensures a
debug-friendly build for
debugging Rust, Python, or C/C++ code. Python code can be debugged using
breakpoints in
@@ -114,7 +183,9 @@ In general, there is at least one benchmark for every
implementation of a functi
and a few other benchmarks for low-level iteration where work was done to
optimize
specific cases.
-Briefly, benchmarks for a specific crate can be run with `cargo bench`:
+### Running benchmarks
+
+Benchmarks for a specific crate can be run with `cargo bench`:
```shell
cd rust/sedona-geo
@@ -129,17 +200,22 @@ to read for a specific crate).
cargo bench -- st_area
```
+### Managing results
+
By default, criterion saves the last run and will report the difference
between the
current benchmark and the last time it was run (although there are options to
-save and load various baselines). A report containing the last run for any
-benchmark that was ever run can be opened with:
+save and load various baselines).
-```shell
-# MacOS
-open target/criterion/report/index.html
-# Ubuntu
-xdg-open target/criterion/report/index.html
-```
+A report of the latest results for all benchmarks can be opened with the
following command:
+
+=== "macOS"
+ ```shell
+ open target/criterion/report/index.html
+ ```
+=== "Ubuntu"
+ ```shell
+ xdg-open target/criterion/report/index.html
+ ```
All previous saved benchmark runs can be cleared with:
@@ -149,6 +225,16 @@ rm -rf target/criterion
## Documentation
-* `mkdocs serve` - Start the live-reloading docs server.
-* `mkdocs build` - Build the documentation site.
-* `mkdocs -h` - Print help message and exit.
+To contribute to the SedonaDB documentation:
+
+1. Fork the repository and clone your fork.
+1. Install the documentation dependencies:
+ ```sh
+ pip install -r docs/requirements.txt
+ ```
+1. Make your changes to the documentation files.
+1. Preview your changes locally using these commands:
+ * `mkdocs serve` - Start the live-reloading docs server.
+ * `mkdocs build` - Build the documentation site.
+ * `mkdocs -h` - Print help message and exit.
+1. Push your changes and open a pull request.
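
Reviewer sketch (not part of the commit): the "Verify the configuration" step in the contributors guide asks the reader to compare `git remote -v` output by eye. The check could be scripted roughly as below. The helper names `parse_remotes`, `check_remotes`, and `current_remotes` are hypothetical; only `git remote -v` and the two remote names come from the guide.

```python
import subprocess

# URL of the main repository, as given in the contributors guide.
UPSTREAM_URL = "https://github.com/apache/sedona-db.git"


def parse_remotes(output: str) -> dict[str, dict[str, str]]:
    """Parse `git remote -v` output into {name: {"fetch": url, "push": url}}."""
    remotes: dict[str, dict[str, str]] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, url, kind = parts
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


def check_remotes(output: str, upstream_url: str = UPSTREAM_URL) -> list[str]:
    """Return a list of problems; an empty list means both remotes look right."""
    remotes = parse_remotes(output)
    problems = []
    if "origin" not in remotes:
        problems.append("missing 'origin' remote")
    upstream = remotes.get("upstream")
    if upstream is None:
        problems.append("missing 'upstream' remote")
    elif upstream.get("fetch") != upstream_url:
        problems.append("'upstream' does not point at the main repository")
    return problems


def current_remotes() -> str:
    """Run `git remote -v` in the current directory and return its output."""
    result = subprocess.run(
        ["git", "remote", "-v"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Inside a clone, `check_remotes(current_remotes())` would return an empty list when the setup matches the guide.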
diff --git a/docs/index.md b/docs/index.md
index 45b2119..62a8bfc 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,6 +1,5 @@
---
hide:
- - navigation
title: Introducing SedonaDB
---
@@ -24,30 +23,45 @@ title: Introducing SedonaDB
under the License.
-->
-SedonaDB is a high-performance, dependency-free geospatial compute engine
designed for single-node processing, making it ideal for smaller datasets on
local machines or cloud instances.
+SedonaDB is a single-node analytical database engine with geospatial as a
first-class citizen.
+
+Fast and dependency-free, SedonaDB is ideal for working with smaller datasets
located on local machines or cloud instances.
The initial `0.1` release supports a core set of vector operations, with
comprehensive vector and raster computation capabilities planned for the near
future.
+For distributed workloads, you can still leverage the power of SedonaSpark,
SedonaFlink, or SedonaSnow.
+
## Key features
SedonaDB has several advantages:
* **Exceptional Performance:** Built in Rust to process massive geospatial
datasets with exceptional speed.
* **Unified Geospatial Toolkit:** Access a comprehensive suite of functions
for both vector and raster data in a single, powerful library.
-* **Seamless Ecosystem Integration:** Built on Apache Arrow for smooth
interoperability with popular data science libraries like GeoPandas, DuckDB,
and Polars.
+* **Extensive Ecosystem Integration:** Built on Apache Arrow for smooth
interoperability with popular data science libraries like GeoPandas, DuckDB,
and Polars.
* **Flexible APIs:** Effortlessly switch between Python and SQL interfaces to
match your preferred workflow and skill set.
* **Guaranteed CRS Propagation:** Automatically manages coordinate reference
systems (CRS) to ensure spatial accuracy and prevent common errors.
* **Broad File Format Support:** Work with a wide range of both modern and
legacy geospatial file formats like GeoParquet.
* **Highly Extensible:** Easily customize and extend the library's
functionality to meet your project's unique requirements.
-## Run a query in SQL, Python, or Rust
+## Install SedonaDB
+
+Here's how to install SedonaDB with various build tools:
+
+=== "pip"
+
+ ```bash
+ pip install "apache-sedona[db]"
+ ```
+
+=== "R"
-SedonaDB offers a flexible query interface in SQL, Python, or Rust.
+ ```r
+ install.packages("sedonadb", repos =
"https://community.r-multiverse.org")
+ ```
-Engineered for speed, SedonaDB provides performant geospatial processing on a
single machine. This makes it perfect for the rapid analysis of smaller
datasets, whether you're working locally or on a cloud server. While the
initial release focuses on core vector operations, a full suite of vector and
raster computations is on the roadmap.
+## Run a query in SQL, Python, Rust, or R
-For massive, distributed workloads, you can leverage the power of SedonaSpark,
-SedonaFlink, or SedonaSnow.
+SedonaDB offers a flexible query interface.
=== "SQL"
@@ -58,7 +72,7 @@ SedonaFlink, or SedonaSnow.
=== "Python"
```python
- import seonda.db
+ import sedona.db
sd = sedona.db.connect()
sd.sql("SELECT ST_Point(0, 1) as geom")
@@ -86,21 +100,6 @@ SedonaFlink, or SedonaSnow.
sd_sql("SELECT ST_Point(0, 1) as geom")
```
-## Install SedonaDB
-
-Here's how to install SedonaDB with various build tools:
-
-=== "pip"
-
- ```bash
- pip install "apache-sedona[db]"
- ```
-
-=== "R"
-
- ```bash
- install.packages("sedonadb", repos =
"https://community.r-multiverse.org")
- ```
## Have questions?
diff --git a/docs/programming-guide.ipynb b/docs/programming-guide.ipynb
index 0c3867d..13e36d1 100644
--- a/docs/programming-guide.ipynb
+++ b/docs/programming-guide.ipynb
@@ -24,14 +24,18 @@
" under the License.\n",
"-->\n",
"\n",
- "# SedonaDB Guide\n",
+ "# Working with Vector Data\n",
"\n",
- "This page explains how to process vector data with SedonaDB.\n",
- "\n",
- "You will learn how to create SedonaDB DataFrames, run spatial queries,
and perform I/O operations with various types of files.\n",
- "\n",
- "Let's start by establishing a SedonaDB connection.\n",
+ "Process vector data using SedonaDB. You will learn to create DataFrames,
run spatial queries, and manage file I/O.\n",
"\n",
+ "Let's start by establishing a SedonaDB connection."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "119fcbae",
+ "metadata": {},
+ "source": [
"## Establish SedonaDB connection\n",
"\n",
"Here's how to create the SedonaDB connection:"
@@ -137,7 +141,7 @@
"source": [
"Now, let's run some spatial queries.\n",
"\n",
- "**Read from GeoPandas DataFrame**\n",
+ "### Read from GeoPandas DataFrame\n",
"\n",
"This section shows how to convert a GeoPandas DataFrame into a SedonaDB
DataFrame.\n",
"\n",
@@ -146,7 +150,7 @@
},
{
"cell_type": "code",
- "execution_count": 12,
+ "execution_count": null,
"id": "b81549f2-0f58-49e4-9011-8de6578c2b0e",
"metadata": {},
"outputs": [],
@@ -202,7 +206,7 @@
"\n",
"Let's see how to run spatial operations like filtering, joins, and
clustering algorithms.\n",
"\n",
- "**Spatial filtering**\n",
+ "### Spatial filtering\n",
"\n",
"Let's run a spatial filtering operation to fetch all the objects in the
following polygon:"
]
@@ -249,11 +253,11 @@
"id": "32076e01-d807-40ed-8457-9d8c4244e89f",
"metadata": {},
"source": [
- "You can see it only includes the divisions in the Nova Scotia area. Skip
to the visualization section to see how this data can be graphed on a map.\n",
+ "You can see it only includes the divisions in the Nova Scotia area.\n",
"\n",
- "**K-nearest neighbors (KNN) joins**\n",
+ "### K-nearest neighbors (KNN) joins\n",
"\n",
- "Create `restaurants` and `customers` tables so we can demonstrate the KNN
join functionality."
+ "Create `restaurants` and `customers` views so we can demonstrate the KNN
join functionality."
]
},
{
@@ -370,22 +374,6 @@
"source": [
"Notice how each customer has two rows - one for each of the two closest
restaurants."
]
- },
- {
- "cell_type": "markdown",
- "id": "3cb1e53b",
- "metadata": {},
- "source": [
- "## GeoParquet support\n",
- "\n",
- "You can also read GeoParquet files with SedonaDB with `read_parquet()`\n",
- "\n",
- "```python\n",
- "df = sd.read_parquet(\"DATA_FILE.parquet\")\n",
- "```\n",
- "\n",
- "Once you read the file, you can easily expose it as a view and query it
with spatial SQL, as we demonstrated in the example above.\n"
- ]
}
],
"metadata": {
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 493603a..7da3c5f 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -17,11 +17,9 @@
under the License.
-->
-# SedonaDB Guide
+# Process Vector Data with SedonaDB
-This page explains how to process vector data with SedonaDB.
-
-You will learn how to create SedonaDB DataFrames, run spatial queries, and
perform I/O operations with various types of files.
+Process vector data using SedonaDB. You will learn to create DataFrames, run
spatial queries, and manage file I/O.
Let's start by establishing a SedonaDB connection.
@@ -82,7 +80,7 @@ sd.read_parquet(
Now, let's run some spatial queries.
-**Read from GeoPandas DataFrame**
+### Read from GeoPandas DataFrame
This section shows how to convert a GeoPandas DataFrame into a SedonaDB
DataFrame.
@@ -120,7 +118,7 @@ df.show(3)
Let's see how to run spatial operations like filtering, joins, and clustering
algorithms.
-**Spatial filtering**
+### Spatial filtering
Let's run a spatial filtering operation to fetch all the objects in the
following polygon:
@@ -151,11 +149,11 @@ ns.show(3)
└──────────┴──────────┴────────────────────────────────────────────────────────────────────────────┘
-You can see it only includes the divisions in the Nova Scotia area. Skip to
the visualization section to see how this data can be graphed on a map.
+You can see it only includes the divisions in the Nova Scotia area.
-**K-nearest neighbors (KNN) joins**
+### K-nearest neighbors (KNN) joins
-Create `restaurants` and `customers` tables so we can demonstrate the KNN join
functionality.
+Create `restaurants` and `customers` views so we can demonstrate the KNN join
functionality.
```python
@@ -234,13 +232,3 @@ ORDER BY c.name, r.name;
Notice how each customer has two rows - one for each of the two closest
restaurants.
-
-## GeoParquet support
-
-You can also read GeoParquet files with SedonaDB with `read_parquet()`
-
-```python
-df = sd.read_parquet("DATA_FILE.parquet")
-```
-
-Once you read the file, you can easily expose it as a view and query it with
spatial SQL, as we demonstrated in the example above.
diff --git a/docs/quickstart-python.ipynb b/docs/quickstart-python.ipynb
index 56dcc17..3558e22 100644
--- a/docs/quickstart-python.ipynb
+++ b/docs/quickstart-python.ipynb
@@ -250,7 +250,7 @@
},
{
"cell_type": "code",
- "execution_count": 8,
+ "execution_count": null,
"id": "6dd816c7-fd3f-4358-b628-ef5e6940c95c",
"metadata": {},
"outputs": [],
diff --git a/docs/reference/read-parquet-files.md
b/docs/reference/read-parquet-files.md
deleted file mode 100644
index 6dc4836..0000000
--- a/docs/reference/read-parquet-files.md
+++ /dev/null
@@ -1,71 +0,0 @@
-
-<!---
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
--->
-
-# Reading Parquet Files
-
-To read a Parquet file, you must use the dedicated `sd.read_parquet()` method.
You cannot query a file path directly within the `sd.sql()` `FROM` clause.
-
-The `sd.sql()` function is designed to query tables that have already been
registered in the session. When you pass a path like `'s3://...'` to `FROM`,
the SQL engine searches for a registered table with that literal name and fails
when it's not found, producing a `table not found` error.
-
-## Usage
-
-The correct process is a two-step approach:
-
-1. **Load** the Parquet file into a data frame using `sd.read_parquet()`.
-1. **Register** the data frame view with `to_view()`.
-1. **Query** the view using `sd.sql()`.
-
-```python linenums="1" title="Read a parquet file with SedonaDB"
-
-import sedona.db
-sd = sedona.db.connect()
-
-df = sd.read_parquet(
- 's3://wherobots-benchmark-prod/SpatialBench_sf=1_format=parquet/'
- 'building/building.parquet'
-)
-
-# Load the Parquet file, which creates a Pandas data frame
-df =
sd.read_parquet('s3://wherobots-benchmark-prod/SpatialBench_sf=1_format=parquet/building/building.parquet')
-
-# Convert the Pandas data frame to a Spark data frame AND
-# register it as a temporary view in a single line.
-spark.createDataFrame(df).to_view("zone")
-
-# Now, query the view using SQL
-sd.sql("SELECT * FROM zone LIMIT 10").show()
-```
-
-### Common Errors
-
-Directly using a file path within `sd.sql()` is a common mistake that will
result in an error.
-
-**Incorrect Code:**
-
-```python
-# This will fail because the SQL engine looks for a table named 's3://...'
-sd.sql("SELECT * FROM
's3://wherobots-benchmark-prod/SpatialBench_sf=1_format=parquet/building/building.parquet'")
-```
-
-**Resulting Error:**
-
-```bash
-sedonadb._lib.SedonaError: Error during planning: table '...s3://...' not found
-```
diff --git a/docs/stylesheets/extra.css b/docs/stylesheets/extra.css
index b18d0b4..b4651e0 100644
--- a/docs/stylesheets/extra.css
+++ b/docs/stylesheets/extra.css
@@ -75,3 +75,14 @@
padding: 0 0.9rem;
font-size: 0.65rem; /* NEW: Adjust font size */
}
+
+/* ==========================================================================
+ Mobile Navigation Styles
+ ==========================================================================
*/
+
+/* This targets the main container of the slide-out navigation on mobile */
+.md-nav--primary .md-nav__title,
+.md-nav__source {
+ background-color: var(--color-red); /* Use your red color */
+ box-shadow: none; /* Optional: removes the shadow */
+}
diff --git a/docs/working-with-parquet-files.ipynb
b/docs/working-with-parquet-files.ipynb
new file mode 100644
index 0000000..40aedaf
--- /dev/null
+++ b/docs/working-with-parquet-files.ipynb
@@ -0,0 +1,166 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Working with Parquet Files\n",
+ "\n",
+ "The easiest way to read a GeoParquet or Parquet file is to use
`sd.read_parquet()`. Alternatively, you can query these files directly by their
path in SQL."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Install SedonaDB\n",
+ "\n",
+ "Use pip to install SedonaDB from the Python Package Index (PyPI)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> **Note**: Before running this notebook on your local machine, you must
have SedonaDB installed in your environment. You can install SedonaDB with the
following command: `pip install \"apache-sedona[db]\"`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Implementation\n",
+ "\n",
+ "A common workflow for working with GeoParquet and/or Parquet files is:\n",
+ "\n",
+ "1. **Load** the Parquet file into a data frame using
`sd.read_parquet()`.\n",
+ "2. **Register** the data frame as a view with `to_view()`.\n",
+ "3. **Query** the view using `sd.sql()`.\n",
+ "4. **Write** your results to a Parquet file with `.to_parquet()` or use
`.to_pandas()` to export your results to a DataFrame or GeoDataFrame."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Import the sedona.db module and connect to SedonaDB\n",
+ "import sedona.db\n",
+ "\n",
+ "sd = sedona.db.connect()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "┌──────────────┬───────────────────────────────┐\n",
+ "│ name ┆ geometry │\n",
+ "│ utf8 ┆ geometry │\n",
+ "╞══════════════╪═══════════════════════════════╡\n",
+ "│ Vatican City ┆ POINT(12.4533865 41.9032822) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ San Marino ┆ POINT(12.4417702 43.9360958) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Vaduz ┆ POINT(9.5166695 47.1337238) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Lobamba ┆ POINT(31.1999971 -26.4666675) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Luxembourg ┆ POINT(6.1300028 49.6116604) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Palikir ┆ POINT(158.1499743 6.9166437) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Majuro ┆ POINT(171.3800002 7.1030043) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Funafuti ┆ POINT(179.2166471 -8.516652) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Melekeok ┆ POINT(134.6265485 7.4873962) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Bir Lehlou ┆ POINT(-9.6525222 26.1191667) │\n",
+ "└──────────────┴───────────────────────────────┘\n"
+ ]
+ }
+ ],
+ "source": [
+ "# 1. Load the Parquet file\n",
+ "df = sd.read_parquet(\n",
+ "
\"https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/\"\n",
+ " \"natural-earth/files/natural-earth_cities_geo.parquet\"\n",
+ ")\n",
+ "\n",
+ "# 2. Register the data frame as a view\n",
+ "df.to_view(\"zone\")\n",
+ "\n",
+ "# 3. Query the view and store the result in a new DataFrame\n",
+ "query_result_df = sd.sql(\"SELECT * FROM zone LIMIT 10\")\n",
+ "query_result_df.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "Verifying the written file at 'query_results.parquet'...\n",
+ "┌──────────────┬───────────────────────────────┐\n",
+ "│ name ┆ geometry │\n",
+ "│ utf8 ┆ geometry │\n",
+ "╞══════════════╪═══════════════════════════════╡\n",
+ "│ Vatican City ┆ POINT(12.4533865 41.9032822) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ San Marino ┆ POINT(12.4417702 43.9360958) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Vaduz ┆ POINT(9.5166695 47.1337238) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Lobamba ┆ POINT(31.1999971 -26.4666675) │\n",
+ "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+ "│ Luxembourg ┆ POINT(6.1300028 49.6116604) │\n",
+ "└──────────────┴───────────────────────────────┘\n"
+ ]
+ }
+ ],
+ "source": [
+ "# 4. Write the result to a new Parquet file\n",
+ "output_path = \"query_results.parquet\"\n",
+ "query_result_df.to_parquet(output_path)\n",
+ "\n",
+ "# (Optional) Verify the written file\n",
+ "print(f\"\\nVerifying the written file at '{output_path}'...\")\n",
+ "verified_df = sd.read_parquet(output_path)\n",
+ "verified_df.show(5)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv (3.13.3)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.13.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/docs/working-with-parquet-files.md
b/docs/working-with-parquet-files.md
new file mode 100644
index 0000000..ea28931
--- /dev/null
+++ b/docs/working-with-parquet-files.md
@@ -0,0 +1,116 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# Working with Parquet Files
+
+The easiest way to read a GeoParquet or Parquet file is to use
`sd.read_parquet()`. Alternatively, you can query these files directly by their
path in SQL.
+
+## Install SedonaDB
+
+Use pip to install SedonaDB from the Python Package Index (PyPI).
+
+> **Note**: Before running this notebook on your local machine, you must have
SedonaDB installed in your environment. You can install SedonaDB with the
following command: `pip install "apache-sedona[db]"`
+
+## Implementation
+
+A common workflow for working with GeoParquet and/or Parquet files is:
+
+1. **Load** the Parquet file into a data frame using `sd.read_parquet()`.
+2. **Register** the data frame as a view with `to_view()`.
+3. **Query** the view using `sd.sql()`.
+4. **Write** your results to a Parquet file with `.to_parquet()` or use
`.to_pandas()` to export your results to a DataFrame or GeoDataFrame.
+
+
+```python
+# Import the sedona.db module and connect to SedonaDB
+import sedona.db
+
+sd = sedona.db.connect()
+```
+
+
+```python
+# 1. Load the Parquet file
+df = sd.read_parquet(
+ "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/"
+ "natural-earth/files/natural-earth_cities_geo.parquet"
+)
+
+# 2. Register the data frame as a view
+df.to_view("zone")
+
+# 3. Query the view and store the result in a new DataFrame
+query_result_df = sd.sql("SELECT * FROM zone LIMIT 10")
+query_result_df.show()
+```
+
+ ┌──────────────┬───────────────────────────────┐
+ │ name ┆ geometry │
+ │ utf8 ┆ geometry │
+ ╞══════════════╪═══════════════════════════════╡
+ │ Vatican City ┆ POINT(12.4533865 41.9032822) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ San Marino ┆ POINT(12.4417702 43.9360958) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Vaduz ┆ POINT(9.5166695 47.1337238) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Lobamba ┆ POINT(31.1999971 -26.4666675) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Luxembourg ┆ POINT(6.1300028 49.6116604) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Palikir ┆ POINT(158.1499743 6.9166437) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Majuro ┆ POINT(171.3800002 7.1030043) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Funafuti ┆ POINT(179.2166471 -8.516652) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Melekeok ┆ POINT(134.6265485 7.4873962) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Bir Lehlou ┆ POINT(-9.6525222 26.1191667) │
+ └──────────────┴───────────────────────────────┘
+
+
+
+```python
+# 4. Write the result to a new Parquet file
+output_path = "query_results.parquet"
+query_result_df.to_parquet(output_path)
+
+# (Optional) Verify the written file
+print(f"\nVerifying the written file at '{output_path}'...")
+verified_df = sd.read_parquet(output_path)
+verified_df.show(5)
+```
+
+
+ Verifying the written file at 'query_results.parquet'...
+ ┌──────────────┬───────────────────────────────┐
+ │ name ┆ geometry │
+ │ utf8 ┆ geometry │
+ ╞══════════════╪═══════════════════════════════╡
+ │ Vatican City ┆ POINT(12.4533865 41.9032822) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ San Marino ┆ POINT(12.4417702 43.9360958) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Vaduz ┆ POINT(9.5166695 47.1337238) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Lobamba ┆ POINT(31.1999971 -26.4666675) │
+ ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+ │ Luxembourg ┆ POINT(6.1300028 49.6116604) │
+ └──────────────┴───────────────────────────────┘
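
Reviewer sketch (not part of the commit): the new page lists `.to_pandas()` as an alternative to `.to_parquet()` in step 4 but only demonstrates the Parquet path. The four steps could be wrapped as below, assuming `sd` is the object returned by `sedona.db.connect()` and that the methods behave as the page describes. The function name `export_top_rows` and its parameters are hypothetical.

```python
def export_top_rows(sd, source, view_name="zone", limit=10):
    """Load a Parquet file, register it as a view, query it, export to pandas.

    `sd` is assumed to be a SedonaDB connection; only methods named in the
    page (read_parquet, to_view, sql, to_pandas) are called on it.
    """
    # The view name is interpolated into SQL, so accept plain identifiers only.
    if not view_name.isidentifier():
        raise ValueError(f"invalid view name: {view_name!r}")
    limit = int(limit)
    if limit < 0:
        raise ValueError("limit must be non-negative")

    df = sd.read_parquet(source)  # 1. Load the Parquet file
    df.to_view(view_name)         # 2. Register the data frame as a view
    result = sd.sql(f"SELECT * FROM {view_name} LIMIT {limit}")  # 3. Query
    return result.to_pandas()     # 4. Export instead of writing Parquet
```

With SedonaDB installed, a call might look like `export_top_rows(sedona.db.connect(), "query_results.parquet", limit=5)`.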
diff --git a/mkdocs.yml b/mkdocs.yml
index 621f1bc..233ce78 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -20,19 +20,20 @@ site_description: "Documentation for Apache SedonaDB"
site_url: https://sedona.apache.org/sedonadb/
nav:
- SedonaDB: index.md
+ - Python Quickstart: quickstart-python.md
- SedonaDB Guides:
- - Python Quickstart: quickstart-python.md
- - SedonaDB Guide: programming-guide.md
+ - Working with Vector Data: programming-guide.md
- Working with GeoPandas: geopandas-interop.md
- Working with Overture: overture-examples.md
- - Development: development.md
+ - Working with Parquet Files: working-with-parquet-files.md
+ - Contributors Guide: contributors-guide.md
+
- SedonaDB Reference:
- Python:
- Python Functions: reference/python.md
- SQL:
- SQL Functions: reference/sql.md
- Spatial Joins: reference/sql-joins.md
- - Read Parquet Files: reference/read-parquet-files.md
- Blog: "https://sedona.apache.org/latest/blog/"
- Community: "https://sedona.apache.org/latest/community/contact/"
- Apache Software Foundation: "https://sedona.apache.org/latest/asf/asf/"
@@ -50,7 +51,7 @@ theme:
primary: custom
accent: 'green'
favicon: image/sedona_logo_symbol.png
- logo: image/sedona_logo_symbol_white.svg
+ logo: image/sedona_logo_symbol.png
icon:
logo: fontawesome/solid/earth-americas
repo: fontawesome/brands/github