This is an automated email from the ASF dual-hosted git repository. jiayu pushed a commit to branch branch-1.7.0 in repository https://gitbox.apache.org/repos/asf/sedona.git
commit 2320b91f301a13bdba8e03f0e26126598f31cfa5 Author: Matthew Powers <[email protected]> AuthorDate: Thu Feb 20 17:33:48 2025 -0500 [DOCS] add geoparquet docs page (#1818) * add geoparquet docs page * use linter * centralize content on geoparquet page * lint file --------- Co-authored-by: Jia Yu <[email protected]> --- docs/image/tutorial/files/geoparquet_bbox1.png | Bin 0 -> 39442 bytes docs/image/tutorial/files/geoparquet_bbox2.png | Bin 0 -> 53095 bytes docs/tutorial/files/geoparquet-sedona-spark.md | 327 +++++++++++++++++++++++++ docs/tutorial/sql.md | 121 +-------- mkdocs.yml | 1 + 5 files changed, 330 insertions(+), 119 deletions(-) diff --git a/docs/image/tutorial/files/geoparquet_bbox1.png b/docs/image/tutorial/files/geoparquet_bbox1.png new file mode 100644 index 0000000000..928abc0b85 Binary files /dev/null and b/docs/image/tutorial/files/geoparquet_bbox1.png differ diff --git a/docs/image/tutorial/files/geoparquet_bbox2.png b/docs/image/tutorial/files/geoparquet_bbox2.png new file mode 100644 index 0000000000..25bfb7f4dd Binary files /dev/null and b/docs/image/tutorial/files/geoparquet_bbox2.png differ diff --git a/docs/tutorial/files/geoparquet-sedona-spark.md b/docs/tutorial/files/geoparquet-sedona-spark.md new file mode 100644 index 0000000000..28da219474 --- /dev/null +++ b/docs/tutorial/files/geoparquet-sedona-spark.md @@ -0,0 +1,327 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + --> + +# Apache Sedona GeoParquet with Spark + +This page explains how to build spatial data lakes with Apache Sedona and GeoParquet. + +GeoParquet offers many advantages for spatial data sets: + +* Built-in support for geometry columns +* GeoParquet's columnar nature allows for column pruning, which speeds up queries that use only a subset of columns. +* Embedded statistics enable readers to skip reading entire chunks of a file in some queries. +* Storing bounding-box metadata (“bbox”) for each row group allows for row-group filtering. +* Columnar file formats are easy to compress. +* You don’t need to infer or manually specify the schema as you would have to with GeoJSON or CSV because GeoParquet includes it in the footer. + +Let's see how to write a Sedona DataFrame to GeoParquet files. 
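+
+The examples on this page assume a Sedona-enabled Spark session named `sedona`. As a minimal sketch (assuming the Sedona Spark packages are already installed), such a session can be created like this:
+
+```python
+from sedona.spark import SedonaContext
+
+# Build a SparkSession and register Sedona's spatial types, ST functions, and data sources
+config = SedonaContext.builder().getOrCreate()
+sedona = SedonaContext.create(config)
+```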
+
+## Write Sedona DataFrame to GeoParquet with Spark
+
+Start by creating a Sedona DataFrame:
+
+```python
+from pyspark.sql.functions import col
+from sedona.sql.st_constructors import ST_GeomFromText
+
+df = sedona.createDataFrame([
+    ("a", 'LINESTRING(2.0 5.0,6.0 1.0)'),
+    ("b", 'LINESTRING(7.0 4.0,9.0 2.0)'),
+    ("c", 'LINESTRING(1.0 3.0,3.0 1.0)'),
+], ["id", "geometry"])
+df = df.withColumn("geometry", ST_GeomFromText(col("geometry")))
+```
+
+Now write the DataFrame to GeoParquet files:
+
+```python
+df.write.format("geoparquet").mode("overwrite").save("/tmp/somewhere")
+```
+
+Here are the files created in storage:
+
+```
+somewhere/
+  _SUCCESS
+  part-00000-1c13be9e-6d4c-401e-89d8-739000ad3aba-c000.snappy.parquet
+  part-00003-1c13be9e-6d4c-401e-89d8-739000ad3aba-c000.snappy.parquet
+  part-00007-1c13be9e-6d4c-401e-89d8-739000ad3aba-c000.snappy.parquet
+  part-00011-1c13be9e-6d4c-401e-89d8-739000ad3aba-c000.snappy.parquet
+```
+
+Sedona writes many Parquet files in parallel because that's much faster than writing a single file.
+
+## Read GeoParquet files into Sedona DataFrame with Spark
+
+Let's read the GeoParquet files into a Sedona DataFrame:
+
+```python
+df = sedona.read.format("geoparquet").load("/tmp/somewhere")
+df.show(truncate=False)
+```
+
+Here are the results:
+
+```
++---+---------------------+
+|id |geometry             |
++---+---------------------+
+|a  |LINESTRING (2 5, 6 1)|
+|b  |LINESTRING (7 4, 9 2)|
+|c  |LINESTRING (1 3, 3 1)|
++---+---------------------+
+```
+
+Here's how Sedona executes this query under the hood:
+
+1. It fetches the schema from the footer of a GeoParquet file, so no schema inference is needed.
+2. It collects the bbox metadata for each file and checks whether any files can be skipped.
+3. It runs the query, skipping any columns that are not required.
+
+Sedona queries on GeoParquet files often run faster and more reliably than queries on other file formats, such as CSV. CSV files don't store the schema in the file footer, don't support column pruning, and don't have row-group metadata for data skipping.
+
+## Inspect GeoParquet metadata
+
+Since v`1.5.1`, Sedona provides a Spark SQL data source `"geoparquet.metadata"` for inspecting GeoParquet metadata. The resulting dataframe contains the "geo" metadata for each input file.
+
+=== "Scala"
+
+    ```scala
+    val df = sedona.read.format("geoparquet.metadata").load(geoparquetdatalocation1)
+    df.printSchema()
+    ```
+
+=== "Java"
+
+    ```java
+    Dataset<Row> df = sedona.read.format("geoparquet.metadata").load(geoparquetdatalocation1);
+    df.printSchema();
+    ```
+
+=== "Python"
+
+    ```python
+    df = sedona.read.format("geoparquet.metadata").load(geoparquetdatalocation1)
+    df.printSchema()
+    ```
+
+The output will be as follows:
+
+```
+root
+ |-- path: string (nullable = true)
+ |-- version: string (nullable = true)
+ |-- primary_column: string (nullable = true)
+ |-- columns: map (nullable = true)
+ |    |-- key: string
+ |    |-- value: struct (valueContainsNull = true)
+ |    |    |-- encoding: string (nullable = true)
+ |    |    |-- geometry_types: array (nullable = true)
+ |    |    |    |-- element: string (containsNull = true)
+ |    |    |-- bbox: array (nullable = true)
+ |    |    |    |-- element: double (containsNull = true)
+ |    |    |-- crs: string (nullable = true)
+```
+
+The GeoParquet footer stores the following data for each geometry column:
+
+* geometry_types: the types of geometric objects in the file, such as Point or Polygon
+* bbox: the bounding box of the objects in the file
+* crs: the Coordinate Reference System
+* covering: a struct column of xmin, ymin, xmax, ymax that provides row-group statistics for GeoParquet 1.1 files
+
+The rest of the Parquet metadata is stored in the standard Parquet footer, just as it is for any other Parquet file.
+
+If the input Parquet file does not have GeoParquet metadata, the values of the `version`, `primary_column` and `columns` fields of the resulting dataframe will be `null`.
+
+`geoparquet.metadata` only supports reading GeoParquet specific metadata. Users can use [G-Research/spark-extension](https://github.com/G-Research/spark-extension/blob/13109b8e43dfba9272c85896ba5e30cfe280426f/PARQUET.md) to read comprehensive metadata of generic Parquet files.
+
+Let's check out the content of the metadata:
+
+```
+df.show(truncate=False)
+
++-----------------------------+-------------------------------------------------------------------+
+|path                         |columns                                                            |
++-----------------------------+-------------------------------------------------------------------+
+|file:/part-00003-1c13.parquet|{geometry -> {WKB, [LineString], [2.0, 1.0, 6.0, 5.0], null, NULL}}|
+|file:/part-00007-1c13.parquet|{geometry -> {WKB, [LineString], [7.0, 2.0, 9.0, 4.0], null, NULL}}|
+|file:/part-00011-1c13.parquet|{geometry -> {WKB, [LineString], [1.0, 1.0, 3.0, 3.0], null, NULL}}|
+|file:/part-00000-1c13.parquet|{geometry -> {WKB, [], [0.0, 0.0, 0.0, 0.0], null, NULL}}          |
++-----------------------------+-------------------------------------------------------------------+
+```
+
+The `columns` column contains bounding box information for each file in the GeoParquet data lake. These bbox values are useful for skipping files or row groups; there is more on bbox metadata in a later section.
+
+## Write GeoParquet with CRS Metadata
+
+Since v`1.5.1`, Sedona supports writing GeoParquet files with a custom GeoParquet spec version and crs.
+The default GeoParquet spec version is `1.0.0` and the default crs is `null`.
+You can specify the GeoParquet spec version and crs as follows:
+
+```scala
+val projjson = "{...}" // PROJJSON string for all geometry columns
+df.write.format("geoparquet")
+  .option("geoparquet.version", "1.0.0")
+  .option("geoparquet.crs", projjson)
+  .save(geoparquetoutputlocation + "/GeoParquet_File_Name.parquet")
+```
+
+If multiple geometry columns are written to the GeoParquet file, you can specify a CRS for each of them.
+For example, if `g0` and `g1` are two geometry columns in the DataFrame `df`, you can specify their CRSs as follows:
+
+```scala
+val projjson_g0 = "{...}" // PROJJSON string for g0
+val projjson_g1 = "{...}" // PROJJSON string for g1
+df.write.format("geoparquet")
+  .option("geoparquet.version", "1.0.0")
+  .option("geoparquet.crs.g0", projjson_g0)
+  .option("geoparquet.crs.g1", projjson_g1)
+  .save(geoparquetoutputlocation + "/GeoParquet_File_Name.parquet")
+```
+
+The value of `geoparquet.crs` and `geoparquet.crs.<column_name>` can be one of the following:
+
+* `"null"`: Explicitly set the `crs` field to `null`. This is the default behavior.
+* `""` (empty string): Omit the `crs` field. This implies that the CRS is [OGC:CRS84](https://www.opengis.net/def/crs/OGC/1.3/CRS84) for CRS-aware implementations.
+* `"{...}"` (PROJJSON string): Set the `crs` field to the PROJJSON object representing the Coordinate Reference System (CRS) of the geometry. You can find the PROJJSON string of a specific CRS at https://epsg.io/ (click the JSON option at the bottom of the page). You can also customize the PROJJSON string as needed.
+
+Please note that Sedona currently cannot convert a PROJJSON string to or from a geometry's CRS. The GeoParquet reader ignores the projjson metadata, so you will have to set your CRS via [`ST_SetSRID`](../api/sql/Function.md#st_setsrid) after reading the file.
+The GeoParquet writer does not use the SRID field of a geometry, so you always have to set the `geoparquet.crs` option manually when writing the file if you want a meaningful `crs` field.
+
+For the same reason, the Sedona GeoParquet reader and writer do NOT check the axis order (lon/lat or lat/lon) and assume users handle it themselves when writing and reading files. You can always use [`ST_FlipCoordinates`](../api/sql/Function.md#st_flipcoordinates) to swap the axis order of your geometries.
+
+## Save GeoParquet with Covering Metadata
+
+Since `v1.6.1`, Sedona supports writing the [`covering` field](https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md#covering) to geometry column metadata. The `covering` field specifies a bounding box column to help accelerate spatial data retrieval. The bounding box column should be a top-level struct column containing `xmin`, `ymin`, `xmax`, `ymax` columns. If the DataFrame you are writing contains such columns, you can specify `.option("geoparquet.covering.<geometryColumnName>", "<coveringColumnName>")` to write them as the covering metadata:
+
+```scala
+df.write.format("geoparquet")
+  .option("geoparquet.covering.geometry", "bbox")
+  .save("/path/to/saved_geoparquet.parquet")
+```
+
+If the DataFrame has only one geometry column, you can simply specify the `geoparquet.covering` option and omit the geometry column name:
+
+```scala
+df.write.format("geoparquet")
+  .option("geoparquet.covering", "bbox")
+  .save("/path/to/saved_geoparquet.parquet")
+```
+
+If the DataFrame does not have a covering column, you can construct one using Sedona's SQL functions:
+
+```scala
+val df_bbox = df.withColumn("bbox", expr("struct(ST_XMin(geometry) AS xmin, ST_YMin(geometry) AS ymin, ST_XMax(geometry) AS xmax, ST_YMax(geometry) AS ymax)"))
+df_bbox.write.format("geoparquet").option("geoparquet.covering.geometry", "bbox").save("/path/to/saved_geoparquet.parquet")
+```
+
+## Sort then Save GeoParquet
+
+To maximize the performance of Sedona GeoParquet filter pushdown, we suggest that you sort the data by their geohash values (see [ST_GeoHash](../api/sql/Function.md#st_geohash)) and then save as a GeoParquet file. An example is as follows:
+
+```sql
+SELECT col1, col2, geom, ST_GeoHash(geom, 5) as geohash
+FROM spatialDf
+ORDER BY geohash
+```
+
+Let's look closer at how Sedona uses the GeoParquet bbox metadata to optimize queries.
+
+## How Sedona uses GeoParquet bounding box (bbox) metadata with Spark
+
+The bounding box metadata specifies the area covered by the geometric shapes in a given file. Suppose you query points in a region that is not covered by the bounding box of a given file. The engine can skip that entire file when executing the query because it is known not to contain any relevant data.
+
+Skipping entire files of data can massively improve performance. The more data that's skipped, the faster the query will run.
+
+Let's look at an example of a dataset with points and three bounding boxes.
+
+![Points dataset with three bounding boxes](../../image/tutorial/files/geoparquet_bbox1.png)
+
+Now, let's apply a spatial filter to read points within a particular area:
+
+![Spatial filter applied to the points dataset](../../image/tutorial/files/geoparquet_bbox2.png)
+
+Here is the query:
+
+```python
+my_shape = 'POLYGON((4.0 3.5, 4.0 6.0, 8.0 6.0, 8.0 4.5, 4.0 3.5))'
+
+res = sedona.sql(f'''
+select *
+from points
+where st_intersects(geometry, ST_GeomFromWKT('{my_shape}'))
+''')
+res.show(truncate=False)
+```
+
+Here are the results:
+
+```
++---+-----------+
+|id |geometry   |
++---+-----------+
+|e  |POINT (5 4)|
+|f  |POINT (6 4)|
+|g  |POINT (6 5)|
++---+-----------+
+```
+
+We don't need the data in bounding boxes 1 or 3 to run this query; we only need the points in bounding box 2. File skipping lets us read one file instead of three for this query. Here are the bounding boxes for the files in the GeoParquet data lake:
+
+```
++--------------------------------------------------------------+
+|columns                                                       |
++--------------------------------------------------------------+
+|{geometry -> {WKB, [Point], [2.0, 1.0, 3.0, 3.0], null, NULL}}|
+|{geometry -> {WKB, [Point], [7.0, 1.0, 8.0, 2.0], null, NULL}}|
+|{geometry -> {WKB, [Point], [5.0, 4.0, 6.0, 5.0], null, NULL}}|
++--------------------------------------------------------------+
+```
+
+Skipping entire files can result in massive performance gains.
+
+## Advantages of GeoParquet files
+
+As previously mentioned, GeoParquet files have many advantages over row-oriented files like CSV or GeoJSON:
+
+* Column pruning
+* Row-group filtering
+* Schema in the footer
+* Good compressibility because of the columnar data layout
+
+These are huge advantages, but they still don't add up to the complete feature set that spatial data practitioners need.
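+
+Before turning to those gaps, here is a quick illustration of the column pruning advantage. This is a sketch that reuses the `/tmp/somewhere` files written earlier; only the `id` column needs to be read for this query:
+
+```python
+# Column pruning: the query below touches only the `id` column, so the
+# WKB-encoded geometry column chunks are never read from storage.
+df = sedona.read.format("geoparquet").load("/tmp/somewhere")
+df.select("id").show()
+```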
+ +## Limitations of GeoParquet files + +Parquet data lakes and GeoParquet data lakes face many of the same limitations as other data lakes: + +* They don’t support reliable transactions +* Some Data Manipulation Language (“DML”) operations (e.g., update, delete) are not supported +* No concurrency protection +* Poor performance compared to databases for certain operations + +To overcome these limitations, the data community has migrated to open table formats like Iceberg, Delta Lake, and Hudi. + +Iceberg recently rolled out support for geometry and geography columns. This allows data practitioners to get the best features of open table formats and the benefits of Parquet files, where the data is stored. Iceberg overcomes the limitations of GeoParquet data lakes and allows for reliable transactions and easy access to common operations like updates and deletes. + +## Conclusion + +GeoParquet is a significant innovation for the spatial data community. It provides geo engineers with embedded schemas, performance optimizations, and native support for geometry columns. + +Spatial data engineers can also migrate to Iceberg now that it supports all these positive features of GeoParquet, and even more. Iceberg provides many useful open table features and is almost always a better option than vanilla GeoParquet (except for single file datasets that will never change or for compatibility with other engines). + +Switching from legacy file formats like Shapefile, GeoPackage, CSV, GeoJSON to high-performance file formats like GeoParquet/Iceberg should immediately speed up your computational workflows. diff --git a/docs/tutorial/sql.md b/docs/tutorial/sql.md index d62026a4e8..828d1dd936 100644 --- a/docs/tutorial/sql.md +++ b/docs/tutorial/sql.md @@ -596,54 +596,7 @@ Please refer to [Reading Legacy Parquet Files](../api/sql/Reading-legacy-parquet GeoParquet file reader does not work on Databricks runtime when Photon is enabled. Please disable Photon when using GeoParquet file reader on Databricks runtime. -### Inspect GeoParquet metadata - -Since v`1.5.1`, Sedona provides a Spark SQL data source `"geoparquet.metadata"` for inspecting GeoParquet metadata. The resulting dataframe contains -the "geo" metadata for each input file. - -=== "Scala/Java" - - ```scala - val df = sedona.read.format("geoparquet.metadata").load(geoparquetdatalocation1) - df.printSchema() - ``` - -=== "Java" - - ```java - Dataset<Row> df = sedona.read.format("geoparquet.metadata").load(geoparquetdatalocation1) - df.printSchema() - ``` - -=== "Python" - - ```python - df = sedona.read.format("geoparquet.metadata").load(geoparquetdatalocation1) - df.printSchema() - ``` - -The output will be as follows: - -``` -root - |-- path: string (nullable = true) - |-- version: string (nullable = true) - |-- primary_column: string (nullable = true) - |-- columns: map (nullable = true) - | |-- key: string - | |-- value: struct (valueContainsNull = true) - | | |-- encoding: string (nullable = true) - | | |-- geometry_types: array (nullable = true) - | | | |-- element: string (containsNull = true) - | | |-- bbox: array (nullable = true) - | | | |-- element: double (containsNull = true) - | | |-- crs: string (nullable = true) -``` - -If the input Parquet file does not have GeoParquet metadata, the values of `version`, `primary_column` and `columns` fields of the resulting dataframe will be `null`. - -!!! note - `geoparquet.metadata` only supports reading GeoParquet specific metadata. 
Users can use [G-Research/spark-extension](https://github.com/G-Research/spark-extension/blob/13109b8e43dfba9272c85896ba5e30cfe280426f/PARQUET.md) to read comprehensive metadata of generic Parquet files. +See [this page](../files/geoparquet-sedona-spark) for more information on loading GeoParquet. ## Load data from JDBC data sources @@ -1474,77 +1427,7 @@ Since v`1.3.0`, Sedona natively supports writing GeoParquet file. GeoParquet can df.write.format("geoparquet").save(geoparquetoutputlocation + "/GeoParquet_File_Name.parquet") ``` -### CRS Metadata - -Since v`1.5.1`, Sedona supports writing GeoParquet files with custom GeoParquet spec version and crs. -The default GeoParquet spec version is `1.0.0` and the default crs is `null`. You can specify the GeoParquet spec version and crs as follows: - -```scala -val projjson = "{...}" // PROJJSON string for all geometry columns -df.write.format("geoparquet") - .option("geoparquet.version", "1.0.0") - .option("geoparquet.crs", projjson) - .save(geoparquetoutputlocation + "/GeoParquet_File_Name.parquet") -``` - -If you have multiple geometry columns written to the GeoParquet file, you can specify the CRS for each column. -For example, `g0` and `g1` are two geometry columns in the DataFrame `df`, and you want to specify the CRS for each column as follows: - -```scala -val projjson_g0 = "{...}" // PROJJSON string for g0 -val projjson_g1 = "{...}" // PROJJSON string for g1 -df.write.format("geoparquet") - .option("geoparquet.version", "1.0.0") - .option("geoparquet.crs.g0", projjson_g0) - .option("geoparquet.crs.g1", projjson_g1) - .save(geoparquetoutputlocation + "/GeoParquet_File_Name.parquet") -``` - -The value of `geoparquet.crs` and `geoparquet.crs.<column_name>` can be one of the following: - -- `"null"`: Explicitly setting `crs` field to `null`. This is the default behavior. -- `""` (empty string): Omit the `crs` field. This implies that the CRS is [OGC:CRS84](https://www.opengis.net/def/crs/OGC/1.3/CRS84) for CRS-aware implementations. -- `"{...}"` (PROJJSON string): The `crs` field will be set as the PROJJSON object representing the Coordinate Reference System (CRS) of the geometry. You can find the PROJJSON string of a specific CRS from here: https://epsg.io/ (click the JSON option at the bottom of the page). You can also customize your PROJJSON string as needed. - -Please note that Sedona currently cannot set/get a projjson string to/from a CRS. Its geoparquet reader will ignore the projjson metadata and you will have to set your CRS via [`ST_SetSRID`](../api/sql/Function.md#st_setsrid) after reading the file. -Its geoparquet writer will not leverage the SRID field of a geometry so you will have to always set the `geoparquet.crs` option manually when writing the file, if you want to write a meaningful CRS field. - -Due to the same reason, Sedona geoparquet reader and writer do NOT check the axis order (lon/lat or lat/lon) and assume they are handled by the users themselves when writing / reading the files. You can always use [`ST_FlipCoordinates`](../api/sql/Function.md#st_flipcoordinates) to swap the axis order of your geometries. - -### Covering Metadata - -Since `v1.6.1`, Sedona supports writing the [`covering` field](https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md#covering) to geometry column metadata. The `covering` field specifies a bounding box column to help accelerate spatial data retrieval. 
The bounding box column should be a top-level struct column containing `xmin`, `ymin`, `xmax`, `ymax` columns. If the DataFrame you are writing contains such columns, you can specify `.option("geoparquet.coveri [...] - -```scala -df.write.format("geoparquet") - .option("geoparquet.covering.geometry", "bbox") - .save("/path/to/saved_geoparquet.parquet") -``` - -If the DataFrame has only one geometry column, you can simply specify the `geoparquet.covering` option and omit the geometry column name: - -```scala -df.write.format("geoparquet") - .option("geoparquet.covering", "bbox") - .save("/path/to/saved_geoparquet.parquet") -``` - -If the DataFrame does not have a covering column, you can construct one using Sedona's SQL functions: - -```scala -val df_bbox = df.withColumn("bbox", expr("struct(ST_XMin(geometry) AS xmin, ST_YMin(geometry) AS ymin, ST_XMax(geometry) AS xmax, ST_YMax(geometry) AS ymax)")) -df_bbox.write.format("geoparquet").option("geoparquet.covering.geometry", "bbox").save("/path/to/saved_geoparquet.parquet") -``` - -## Sort then Save GeoParquet - -To maximize the performance of Sedona GeoParquet filter pushdown, we suggest that you sort the data by their geohash values (see [ST_GeoHash](../api/sql/Function.md#st_geohash)) and then save as a GeoParquet file. An example is as follows: - -``` -SELECT col1, col2, geom, ST_GeoHash(geom, 5) as geohash -FROM spatialDf -ORDER BY geohash -``` +See [this page](../files/geoparquet-sedona-spark) for more information on writing to GeoParquet. ## Save to Postgis diff --git a/mkdocs.yml b/mkdocs.yml index 035a312dfb..9c7a98005a 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -61,6 +61,7 @@ nav: - Sedona R: api/rdocs - Work with GeoPandas and Shapely: tutorial/geopandas-shapely.md - Files: + - GeoParquet: tutorial/files/geoparquet-sedona-spark.md - GeoJSON: tutorial/files/geojson-sedona-spark.md - Map visualization SQL app: - Scala/Java: tutorial/viz.md
