jiayuasu commented on code in PR #9:
URL: https://github.com/apache/sedona-spatialbench/pull/9#discussion_r2352845998
########## docs/index.md: ##########
@@ -1,3 +1,68 @@
-# SpatialBench Documentation
+# Sedona SpatialBench
 
-Space for writing SpatialBench Documentation.
+Sedona SpatialBench makes it easy to run spatial benchmarks on a realistic dataset with any query engine.
+
+The methodology is unbiased, and you can run the benchmarks in any environment to compare relative performance between runtimes.
+
+## Why SpatialBench
+
+SpatialBench includes representative spatial workflows, such as the following types of queries:
+
+* Spatial filtering and aggregations
+* KNN joins
+* Range joins
+* Distance joins
+
+Let’s dive into the advantages of SpatialBench.
+
+## Key advantages
+
+* Uses spatial datasets with geometry columns.
+* Includes queries with different spatial predicates.
+* Easily reproducible results.
+* Includes a dataset generator so results are reproducible.
+* The scale factors of the datasets can be changed so that you can run the queries locally, in a data warehouse, or on a large cluster in the cloud.
+* All the specifications used to run the benchmarks are documented, and the methodology is unbiased.
+* The code is open source, allowing the community to provide feedback and keep the benchmarks up-to-date and reliable over time.
+
+## Generate synthetic data
+
+Here’s how you can install the synthetic data generator:
+
+```
+cargo install --path ./spatialbench-cli
+```
+
+Here’s how you can generate the synthetic dataset:
+
+```
+spatialbench-cli -s 1 --format=parquet
+```
+
+See the project repository [README](https://github.com/apache/sedona-spatialbench) for the complete data generation instructions.
+
+## Example query
+
+Here’s an example query that counts the number of trips that start within 500 meters of each building:
+
+```sql
+SELECT
+    b.b_buildingkey,
+    b.b_name,
+    COUNT(*) AS nearby_pickup_count
+FROM trip t
+JOIN building b
+ON ST_DWithin(t.t_pickup_loc, b.b_boundary, 500)
+GROUP BY b.b_buildingkey, b.b_name
+ORDER BY nearby_pickup_count DESC;
+```
+
+The SpatialBench dataset is based on the NYC Yellow Taxi Trips dataset.

Review Comment:
Can you try to put the content below somewhere in the page?

SpatialBench is a geospatial benchmark for testing and optimizing spatial analytical query performance in data systems. Inspired by the Star Schema Benchmark (SSB) and NYC taxi data, it combines realistic urban mobility scenarios with a star schema extended with spatial attributes such as pickup/dropoff points, zones, and building footprints. This design enables evaluation of geospatial operations such as spatial joins, distance queries, aggregations, and point-in-polygon analysis.
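To spell out what the example query in the diff computes, here is a minimal brute-force sketch of the same distance join in plain Python. It is not how Sedona or any other engine evaluates `ST_DWithin`, since real engines rely on spatial partitioning and indexes. The sketch makes several assumptions: planar coordinates, so the threshold is in coordinate units rather than meters; an inclusive "distance <= d" test; polygons given as a single exterior ring of `(x, y)` vertices; and made-up toy data. The helper names (`dwithin`, `nearby_pickup_counts`, and the others) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical types for this sketch: planar (x, y) points and polygons given
# as one exterior ring of vertices (the closing edge back to vertex 0 is implied).
Point = tuple[float, float]
Polygon = list[Point]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment: a == b
    # Project p onto the segment's line and clamp to the segment's endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Ray-casting (even-odd) test: count edge crossings to the right of p."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dwithin(p: Point, poly: Polygon, d: float) -> bool:
    """Planar stand-in for ST_DWithin(point, polygon, d), inclusive of d."""
    if point_in_polygon(p, poly):
        return True  # a point inside the polygon is at distance 0
    edges = zip(poly, poly[1:] + poly[:1])
    return min(point_segment_distance(p, a, b) for a, b in edges) <= d


def nearby_pickup_counts(trips, buildings, d):
    """Nested-loop distance join, then GROUP BY building and ORDER BY count DESC."""
    counts = Counter()
    for key, name, boundary in buildings:
        for pickup in trips:
            if dwithin(pickup, boundary, d):
                counts[(key, name)] += 1
    # Only buildings with at least one match appear, mirroring the inner JOIN.
    rows = [(key, name, n) for (key, name), n in counts.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Toy data (made up): three buildings and five pickup points.
buildings = [
    (1, "A", [(0, 0), (10, 0), (10, 10), (0, 10)]),
    (2, "B", [(100, 100), (110, 100), (110, 110), (100, 110)]),
    (3, "C", [(1000, 1000), (1001, 1000), (1001, 1001)]),
]
trips = [(5, 5), (12, 5), (20, 20), (104, 115), (50, 50)]

print(nearby_pickup_counts(trips, buildings, 5))  # [(1, 'A', 2), (2, 'B', 1)]
```

Because the SQL uses an inner `JOIN`, a building with no nearby pickups (building `C` in the toy data) is missing from the result instead of being reported with a count of 0. Keeping such buildings would require a `LEFT JOIN` from `building` to `trip` combined with `COUNT(t.t_pickup_loc)` instead of `COUNT(*)`.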
