Re: [PR] docs: add landing page [sedona-spatialbench]

via GitHub Tue, 16 Sep 2025 08:14:11 -0700


jiayuasu commented on code in PR #9:
URL: https://github.com/apache/sedona-spatialbench/pull/9#discussion_r2352845998



##########
docs/index.md:
##########
@@ -1,3 +1,68 @@
-# SpatialBench Documentation
+# Sedona SpatialBench
 
-Space for writing SpatialBench Documentation.
+Sedona SpatialBench makes it easy to run spatial benchmarks on a realistic dataset with any query engine.
+
+The methodology is unbiased, and the benchmarks can be run in any environment to compare relative performance between runtimes.
+
+## Why SpatialBench
+
+SpatialBench covers representative spatial workflows, including the following types of queries:
+
+* Spatial filtering and aggregations
+* KNN joins
+* Range joins
+* Distance joins
+
+Let’s dive into the advantages of SpatialBench.
+
+## Key advantages
+
+* Uses spatial datasets with geometry columns.
+* Includes queries with different spatial predicates.
+* Easily reproducible results.
+* Includes a dataset generator so results are reproducible.
+* The scale factors of the datasets can be changed so that you can run the queries locally, in a data warehouse, or on a large cluster in the cloud.
+* All the specifications used to run the benchmarks are documented, and the methodology is unbiased.
+* The code is open source, allowing the community to provide feedback and keep the benchmarks up to date and reliable over time.
+
+## Generate synthetic data
+
+Here’s how you can install the synthetic data generator from a local clone of the repository:
+
+```shell
+cargo install --path ./spatialbench-cli
+```
+
+Here’s how you can generate the synthetic dataset:
+
+```shell
+spatialbench-cli -s 1 --format=parquet
+```
+
+See the project repository [README](https://github.com/apache/sedona-spatialbench) for complete data generation instructions.
+
+## Example query
+
+Here’s an example query that counts, for each building, the number of trips whose pickup location is within 500 meters of the building boundary:
+
+```sql
+SELECT 
+    b.b_buildingkey,
+    b.b_name,
+    COUNT(*) AS nearby_pickup_count
+FROM trip t
+JOIN building b
+ON ST_DWithin(t.t_pickup_loc, b.b_boundary, 500)
+GROUP BY b.b_buildingkey, b.b_name
+ORDER BY nearby_pickup_count DESC;
+```
+
+The SpatialBench dataset is based on the NYC Yellow Taxi Trips dataset.
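For readers less familiar with distance joins, the example query can be sketched in plain Python as a brute-force nested-loop join. This is a minimal illustration, not SpatialBench code: it assumes planar coordinates in meters and simplifies each building boundary to a hypothetical axis-aligned rectangle, and `point_rect_distance` and `nearby_pickup_counts` are illustrative names only.

```python
import math
from collections import Counter

# Hypothetical simplification: each building boundary is an axis-aligned
# rectangle (xmin, ymin, xmax, ymax) in a planar, meter-based coordinate system.


def point_rect_distance(x, y, rect):
    """Euclidean distance from point (x, y) to a rectangle; 0.0 if inside."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return math.hypot(dx, dy)


def nearby_pickup_counts(pickups, buildings, radius):
    """Mimic the ST_DWithin join + GROUP BY + ORDER BY count DESC.

    pickups:   iterable of (x, y) pickup locations
    buildings: dict of building key -> rectangle
    Buildings with no nearby pickups are omitted, as with the inner JOIN.
    """
    counts = Counter()
    # Nested-loop join: test every (trip, building) pair.
    for x, y in pickups:
        for key, rect in buildings.items():
            if point_rect_distance(x, y, rect) <= radius:
                counts[key] += 1
    # most_common() sorts by count, highest first.
    return counts.most_common()


pickups = [(0, 0), (100, 0), (900, 0), (2000, 2000)]
buildings = {"A": (200, -50, 300, 50), "B": (1000, -50, 1100, 50)}
print(nearby_pickup_counts(pickups, buildings, 500))  # [('A', 2), ('B', 1)]
```

A real engine would typically replace the nested loop with a spatial index or spatial partitioning; the brute-force version only shows what the join computes.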

Review Comment:
   Can you try to put the content below somewhere in the page?
   
   
   SpatialBench is a geospatial benchmark for testing and optimizing spatial analytical query performance in data systems. Inspired by the Star Schema Benchmark (SSB) and NYC taxi data, it combines realistic urban mobility scenarios with a star schema extended with spatial attributes like pickup/dropoff points, zones, and building footprints. This design enables evaluation of geospatial operations such as spatial joins, distance queries, aggregations, and point-in-polygon analysis.
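The point-in-polygon analysis mentioned above can be illustrated with the classic ray-casting (even-odd) test. This is a minimal sketch, assuming planar coordinates and simple polygons given as vertex lists; the zone names and the `count_pickups_per_zone` helper are hypothetical and not part of SpatialBench. Points lying exactly on an edge may be classified either way.

```python
# Hypothetical zones: simple polygons as lists of (x, y) vertices in planar coordinates.


def point_in_polygon(x, y, polygon):
    """Ray casting (even-odd rule): cast a ray toward +x and count edge crossings."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


def count_pickups_per_zone(pickups, zones):
    """Count how many pickup points fall inside each zone polygon."""
    counts = {name: 0 for name in zones}
    for x, y in pickups:
        for name, polygon in zones.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
    return counts


zones = {
    "square": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "triangle": [(20, 0), (30, 0), (25, 10)],
}
pickups = [(5, 5), (2, 8), (25, 2), (15, 5)]
print(count_pickups_per_zone(pickups, zones))  # {'square': 2, 'triangle': 1}
```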
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.