Re: [PR] [DOCS] fixing read parquet files page [sedona-db]

via GitHub Thu, 18 Sep 2025 21:16:06 -0700


kadolor commented on code in PR #110:
URL: https://github.com/apache/sedona-db/pull/110#discussion_r2361696491



##########
docs/working-with-parquet-files.md:
##########
@@ -0,0 +1,80 @@
+# Working with Parquet Files
+
+To read a GeoParquet or Parquet file, you must use the dedicated `sd.read_parquet()` method. You cannot query a file path directly within the `sd.sql()` `FROM` clause.
+
+The `sd.sql()` function is designed to query tables that have already been registered in the session.
+
+## Install SedonaDB
+
+Use pip to install SedonaDB from the Python Package Index (PyPI).
+
+
+```python
+%pip install "apache-sedona[db]"
+```

Review Comment:
   I'll turn this into a note, but in case someone finds this page before looking at the install page, I think it's important to call that out.
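For that note, one option is to pair the install command with a quick check that tells readers whether SedonaDB is already available. A minimal sketch using only the standard library; `module_available` is a hypothetical helper (not part of SedonaDB), and `sedona.db` is the module the page already imports:

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` (possibly dotted) can be found for import.

    Hypothetical helper for the docs page; not part of SedonaDB.
    """
    try:
        # For a dotted name, find_spec imports the parent package first and
        # raises ModuleNotFoundError if that parent package is missing.
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


if not module_available("sedona.db"):
    print('SedonaDB not found; install it with: pip install "apache-sedona[db]"')
```

In a notebook, the `%pip install "apache-sedona[db]"` cell from the page would then only be needed when the message is printed.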



##########
docs/working-with-parquet-files.md:
##########
@@ -0,0 +1,80 @@
+
+## Implementation
+
+To read a GeoParquet or Parquet file with SedonaDB, you must:
+
+1. **Load** the Parquet file into a DataFrame using `sd.read_parquet()`.
+2. **Register** the DataFrame as a view with `to_view()`.
+3. **Query** the view using `sd.sql()`.
+4. (Optional) **Write** the result to a Parquet file with the DataFrame's `to_parquet()` method.
+
+
+```python
+# Import the sedona.db module and connect to SedonaDB
+import sedona.db
+sd = sedona.db.connect()
+```
+
+
+```python
+
+# 1. Load the Parquet file
+df = sd.read_parquet(
+    'https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/'
+    'natural-earth/files/natural-earth_cities_geo.parquet'
+)
+
+# 2. Register the data frame as a view
+df.to_view("zone")
+
+# 3. Query the view and store the result in a new DataFrame
+query_result_df = sd.sql("SELECT * FROM zone LIMIT 10")
+query_result_df.show()
+```
+
+
+```python
+
+# 4. Write the result to a new Parquet file
+output_path = "query_results.parquet"
+query_result_df.to_parquet(output_path)
+
+# (Optional) Verify the written file
+print(f"\nVerifying the written file at '{output_path}'...")
+verified_df = sd.read_parquet(output_path)
+verified_df.show(5)

Review Comment:
   @paleolimbot I think Jia wanted to emphasize that writing to Parquet files is also possible.
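One way to keep the read workflow and the write step distinct on the page is to show them together as one end-to-end helper. A minimal sketch, assuming only the calls already used in this PR (`sd.read_parquet()`, `to_view()`, `sd.sql()`, and the DataFrame's `to_parquet()`); `query_parquet_to_parquet` is a hypothetical helper name, not a SedonaDB API:

```python
def query_parquet_to_parquet(sd, source, view_name, sql, output_path):
    """Read a (Geo)Parquet file, query it with SQL, and write the result.

    Hypothetical helper (not part of SedonaDB). `sd` is assumed to be the
    context returned by sedona.db.connect().
    """
    # 1. Load the Parquet file into a DataFrame
    df = sd.read_parquet(source)
    # 2. Register the DataFrame as a view so sd.sql() can reference it by name
    df.to_view(view_name)
    # 3. Query the view; the result is another DataFrame
    result = sd.sql(sql)
    # 4. Write the query result to a new Parquet file
    result.to_parquet(output_path)
    # Read the written file back so the caller can verify it
    return sd.read_parquet(output_path)
```

With `sd = sedona.db.connect()`, calling `query_parquet_to_parquet(sd, url, "zone", "SELECT * FROM zone LIMIT 10", "query_results.parquet").show(5)` would mirror the page's example, where `url` is the Natural Earth cities file used above.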
   



