Copilot commented on code in PR #30:
URL: 
https://github.com/apache/sedona-spatialbench/pull/30#discussion_r2371211342


##########
docs/queries.ipynb:
##########
@@ -332,36 +414,48 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 42,
    "id": "f168c302-5187-45f9-8d87-427bbd68c942",
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      
"┌───────────┬─────────────────┬───────────────┬───────────────────┬────────────────────────────────┐\n",
-      "│ z_zonekey ┆      z_name     ┆ total_pickups ┆    avg_distance   ┆     
     avg_duration          │\n",
-      "│   int64   ┆     utf8view    ┆     int64     ┆ decimal128(19, 9) ┆     
 duration(millisecond)     │\n",
-      
"╞═══════════╪═════════════════╪═══════════════╪═══════════════════╪════════════════════════════════╡\n",
-      "│     30084 ┆ Coconino County ┆           541 ┆       0.000085323 ┆ 0 
days 1 hours 45 mins 16.591… │\n",
-      
"├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
-      "│     30083 ┆ Yavapai County  ┆           292 ┆       0.000076643 ┆ 0 
days 1 hours 36 mins 43.647… │\n",
-      
"├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
-      "│     29488 ┆ Gila County     ┆            39 ┆       0.000065641 ┆ 0 
days 1 hours 16 mins 59.769… │\n",
-      
"└───────────┴─────────────────┴───────────────┴───────────────────┴────────────────────────────────┘\n"
+      
"┌───────────┬─────────────────┬───────────────┬──────────────┬────────────────────────────────────┐\n",
+      "│ z_zonekey ┆      z_name     ┆ total_pickups ┆ avg_distance ┆          
  avg_duration            │\n",
+      "│   int64   ┆       utf8      ┆     int64     ┆  decimal128  ┆          
    duration              │\n",
+      
"╞═══════════╪═════════════════╪═══════════════╪══════════════╪════════════════════════════════════╡\n",
+      "│     30084 ┆ Coconino County ┆           541 ┆  0.000030406 ┆ 0 days 1 
hours 45 mins 16.591 secs │\n",
+      
"├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│     30083 ┆ Yavapai County  ┆           292 ┆  0.000027157 ┆ 0 days 1 
hours 36 mins 43.647 secs │\n",
+      
"├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│     29488 ┆ Gila County     ┆            39 ┆  0.000021282 ┆ 0 days 1 
hours 16 mins 59.769 secs │\n",
+      
"└───────────┴─────────────────┴───────────────┴──────────────┴────────────────────────────────────┘\n"
      ]
     }
    ],
    "source": [
     "sd.sql(\"\"\"\n",
     "SELECT\n",
-    "    z.z_zonekey, z.z_name,\n",
-    "    COUNT(t.t_tripkey) AS total_pickups, AVG(t.t_totalamount) AS 
avg_distance,\n",
+    "    z.z_zonekey,\n",
+    "    z.z_name,\n",
+    "    COUNT(t.t_tripkey) AS total_pickups,\n",
+    "    AVG(t.t_distance) AS avg_distance, -- Corrected from t_totalamount\n",

Review Comment:
   [nitpick] The comment contains a typo: 't_totalamount' should be 
't_total_amount' if referring to a total amount field.
   ```suggestion
       "    AVG(t.t_distance) AS avg_distance, -- Corrected from 
t_total_amount\n",
   ```



##########
docs/queries.md:
##########
@@ -0,0 +1,612 @@
+# Run the SpatialBench Queries
+
+This notebook contains the queries that make up the SpatialBench benchmark.
+
+SpatialBench is a benchmark for assessing geospatial SQL analytics query 
performance across database systems. The queries represent common, real-world 
geospatial analytics tasks and are designed to test a wide range of spatial 
functions and join conditions.
+
+The benchmark uses a realistic but synthetic, transportation-themed dataset to 
ensure the queries reflect practical use cases. By running these queries, you 
can evaluate and compare the relative performance of different spatial query 
engines in a consistent and unbiased manner.
+
+## Before you start
+
+Before running this notebook, ensure that you have installed the packages in 
the `requirements.txt` file:
+
+
+```python
+%pip install -r ~/sedona-spatialbench/docs/requirements.txt
+```
+
+    ...
+    ...
+    Note: you may need to restart the kernel to use updated packages.
+
+
+Additionally, install the SpatialBench CLI and generate the synthetic data on 
your machine:
+
+```
+# SpatialBench CLI
+cargo install --path ./spatialbench-cli
+# Generate the benchmarking data to the sf1-parquet directory
+spatialbench-cli -s 1 --format=parquet --output-dir sf1-parquet
+```
+
+
+```python
+import sedona.db
+```
+
+
+```python
+sd = sedona.db.connect()
+```
+
+
+```python
+sd.read_parquet(f"../sf1-parquet/building.parquet").to_view("building")
+sd.read_parquet(f"../sf1-parquet/customer.parquet").to_view("customer")
+sd.read_parquet(f"../sf1-parquet/driver.parquet").to_view("driver")
+sd.read_parquet(f"../sf1-parquet/trip.parquet").to_view("trip")
+sd.read_parquet(f"../sf1-parquet/vehicle.parquet").to_view("vehicle")
+sd.read_parquet(f"../sf1-parquet/zone.parquet").to_view("zone")
+```
+
+## Q1: Find trips starting within 50km of Sedona city center, ordered by 
distance
+
+**Real-life scenario:** Identify and rank trips by proximity to a city center 
for urban planning and transportation analysis.
+
+This query finds all taxi or rideshare trips that started within 50 kilometers 
of downtown Sedona, Arizona. For each qualifying trip, it shows the trip ID, 
pickup coordinates, pickup time, and calculates the exact distance from the 
pickup location to Sedona's city center. The results are sorted to show the 
trips that picked up closest to downtown Sedona first, making it easy to see 
which rides originated nearest to the city center.
+
+**Spatial query characteristics tested:**
+
+1. Distance-based spatial filtering (ST_DWithin)
+2. Distance calculation to a fixed point
+3. Coordinate extraction (ST_X, ST_Y)
+4. Ordering by spatial distance
+
+
+```python
+sd.sql("""
+SELECT
+    t.t_tripkey,
+    ST_X(ST_GeomFromWKB(t.t_pickuploc)) AS pickup_lon,
+    ST_Y(ST_GeomFromWKB(t.t_pickuploc)) AS pickup_lat,
+    t.t_pickuptime,
+    ST_Distance(
+        ST_GeomFromWKB(t.t_pickuploc),
+        ST_GeomFromText('POINT (-111.7610 34.8697)')
+    ) AS distance_to_center
+FROM trip t
+WHERE ST_DWithin(
+    ST_GeomFromWKB(t.t_pickuploc),
+    ST_GeomFromText('POINT (-111.7610 34.8697)'),
+    0.45 -- 50km radius around Sedona center in degrees
+)
+ORDER BY distance_to_center ASC, t.t_tripkey ASC
+""").show(3)
+```
+
+    
┌───────────┬────────────────┬──────────────┬─────────────────────┬──────────────────────┐
+    │ t_tripkey ┆   pickup_lon   ┆  pickup_lat  ┆     t_pickuptime    ┆  
distance_to_center  │
+    │   int64   ┆     float64    ┆    float64   ┆      timestamp      ┆        
float64       │
+    
╞═══════════╪════════════════╪══════════════╪═════════════════════╪══════════════════════╡
+    │   1451371 ┆ -111.791052127 ┆ 34.826733457 ┆ 1998-08-12T06:47:01 ┆  
0.05243333056935387 │
+    
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │   2047835 ┆ -111.706967009 ┆ 34.883889472 ┆ 1992-04-08T07:36:09 ┆ 
0.055865062714050374 │
+    
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │   3936870 ┆ -111.827619221 ┆ 34.882950924 ┆ 1998-11-10T13:32:07 ┆  
0.06792427838042854 │
+    
└───────────┴────────────────┴──────────────┴─────────────────────┴──────────────────────┘
+
+
+## Q2: Count trips starting within Coconino County (Arizona) zone
+
+**Real-life scenario:** Count all trips originating within a specific 
administrative boundary (county) for regional transportation statistics.
+
+This query counts how many taxi or rideshare trips started within Coconino 
County, Arizona. It does this by checking if each trip's pickup location falls 
inside the county's geographic boundaries. The result is a simple count showing 
the total number of trips that originated anywhere within Coconino County.
+
+**Spatial query characteristics tested:**
+
+1. Point-in-polygon spatial filtering (ST_Intersects)
+2. Subquery with spatial geometry selection
+3. Simple aggregation on spatially filtered data
+
+
+```python
+sd.sql("""
+SELECT COUNT(*) AS trip_count_in_coconino_county
+FROM trip t
+WHERE ST_Intersects(
+    ST_GeomFromWKB(t.t_pickuploc),
+    (
+        SELECT ST_GeomFromWKB(z.z_boundary)
+        FROM zone z
+        WHERE z.z_name = 'Coconino County'
+        LIMIT 1
+    )
+)
+""").show(3)
+```
+
+    ┌───────────────────────────────┐
+    │ trip_count_in_coconino_county │
+    │             int64             │
+    ╞═══════════════════════════════╡
+    │                           541 │
+    └───────────────────────────────┘
+
+
+## Q3: Monthly trip statistics within a 15km radius of the Sedona city center
+
+**Real-life scenario:** Track monthly travel trends and performance metrics in 
a metropolitan area with seasonal analysis.
+
+This query analyzes taxi and rideshare trip patterns around Sedona, Arizona, 
by grouping trips into monthly summaries. It looks at all trips that started 
within a 15-kilometer area around Sedona (a 10km box plus 5km buffer) and 
calculates key statistics for each month, including total number of trips, 
average trip distance, average trip duration, and average fare. The results are 
organized chronologically by month, allowing you to see seasonal trends and 
changes in ride patterns over time in the Sedona area.
+
+**Spatial query characteristics tested:**
+
+1. Distance-based spatial filtering (ST_DWithin) with buffer
+2. Temporal grouping (monthly aggregation)
+3. Multiple statistical aggregations on spatially filtered data
+
+
+```python
+sd.sql("""
+SELECT
+    DATE_TRUNC('month', t.t_pickuptime) AS pickup_month,
+    COUNT(t.t_tripkey) AS total_trips,
+    AVG(t.t_distance) AS avg_distance,
+    AVG(t.t_dropofftime - t.t_pickuptime) AS avg_duration,
+    AVG(t.t_fare) AS avg_fare
+FROM trip t
+WHERE ST_DWithin(
+    ST_GeomFromWKB(t.t_pickuploc),
+    ST_GeomFromText('POLYGON((
+        -111.9060 34.7347, -111.6160 34.7347,
+        -111.6160 35.0047, -111.9060 35.0047,
+        -111.9060 34.7347
+    ))'), -- Bounding box around Sedona
+    0.045 -- Additional 5km buffer in degrees
+)
+GROUP BY pickup_month
+ORDER BY pickup_month
+""").show(3)
+```
+
+    
┌─────────────────────┬─────────────┬──────────────┬─────────────────────────────────┬─────────────┐
+    │     pickup_month    ┆ total_trips ┆ avg_distance ┆           
avg_duration          ┆   avg_fare  │
+    │      timestamp      ┆    int64    ┆  decimal128  ┆             duration  
          ┆  decimal128 │
+    
╞═════════════════════╪═════════════╪══════════════╪═════════════════════════════════╪═════════════╡
+    │ 1992-04-01T00:00:00 ┆           2 ┆  0.000020000 ┆ 0 days 1 hours 23 
mins 47.000 … ┆ 0.000075000 │
+    
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ 1992-07-01T00:00:00 ┆           1 ┆  0.000010000 ┆ 0 days 0 hours 58 
mins 58.000 … ┆ 0.000040000 │
+    
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │ 1994-02-01T00:00:00 ┆           2 ┆  0.000020000 ┆ 0 days 1 hours 23 
mins 50.000 … ┆ 0.000050000 │
+    
└─────────────────────┴─────────────┴──────────────┴─────────────────────────────────┴─────────────┘
+
+
+## Q4: Zone distribution of top 1000 trips by tip amount
+
+**Real-life scenario:** Analyze the geographic distribution of high-value 
trips (by tip amount) to understand premium service areas.
+
+This query identifies which neighborhoods or zones produced the most generous 
tippers by analyzing the top 1000 highest-tipping trips. It first finds the 
1000 trips with the largest tips, then determines which geographic zones or 
neighborhoods those pickup locations fall within, and counts how many of these 
high-tip trips originated from each area. The results show a ranking of zones 
by the number of big tippers they produced, helping identify the most lucrative 
pickup areas for drivers seeking high-tip rides.
+
+**Spatial query characteristics tested:**
+
+1. Subquery with ordering and limiting
+2. Point-in-polygon spatial join (ST_Within)
+3. Aggregation on spatially joined results
+4. Multi-step query with spatial filtering and grouping
+
+
+```python
+sd.sql("""
+SELECT
+    z.z_zonekey,
+    z.z_name,
+    COUNT(*) AS trip_count
+FROM
+    zone z
+    JOIN (
+        SELECT t.t_pickuploc
+        FROM trip t
+        ORDER BY t.t_tip DESC, t.t_tripkey ASC
+        LIMIT 1000
+    ) top_trips
+    ON ST_Within(
+        ST_GeomFromWKB(top_trips.t_pickuploc),
+        ST_GeomFromWKB(z.z_boundary)
+    )
+GROUP BY z.z_zonekey, z.z_name
+ORDER BY trip_count DESC, z.z_zonekey ASC
+""").show(3)
+```
+
+    ┌───────────┬─────────────────────────────────┬────────────┐
+    │ z_zonekey ┆              z_name             ┆ trip_count │
+    │   int64   ┆               utf8              ┆    int64   │
+    ╞═══════════╪═════════════════════════════════╪════════════╡
+    │     65008 ┆ Ndélé                           ┆         35 │
+    ├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │    147530 ┆ 乐山市                          ┆         27 │
+    ├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │    150276 ┆ 锡林郭勒盟 ᠰᠢᠯᠢ ᠶᠢᠨ ᠭᠣᠣᠯ ᠠᠶᠢᠮᠠᠭ ┆         19 │
+    └───────────┴─────────────────────────────────┴────────────┘
+
+
+## Q5: Monthly travel patterns for repeat customers (convex hull of dropoff 
locations)
+
+**Real-life scenario:** Analyze the geographic spread of travel patterns for 
frequent customers to understand their mobility behavior.
+
+This query analyzes the monthly travel patterns of frequent customers by 
measuring how much geographic area they cover with their trips. For each 
customer who took more than five trips in a month, it calculates the size of 
the "travel hull" - the area enclosed by connecting all their dropoff locations 
that month. The results reveal which customers have the most expansive travel 
patterns, helping to identify power users who cover large geographic areas 
versus those who stick to smaller, local areas.
+
+**Spatial query characteristics tested:**
+
+1. Spatial aggregation (ST_Collect/ARRAY_AGG)
+2. Convex hull computation (ST_ConvexHull)
+3. Area calculation on complex geometries
+4. Temporal and customer-based grouping with spatial operations
+
+
+```python
+sd.sql("""
+SELECT
+    c.c_custkey,
+    c.c_name AS customer_name,
+    DATE_TRUNC('month', t.t_pickuptime) AS pickup_month,
+    ST_Area(
+        ST_ConvexHull(ST_Collect(ST_GeomFromWKB(t.t_dropoffloc)))
+    ) AS monthly_travel_hull_area,
+    COUNT(*) as dropoff_count
+FROM trip t
+JOIN customer c
+    ON t.t_custkey = c.c_custkey
+GROUP BY c.c_custkey, c.c_name, pickup_month
+HAVING dropoff_count > 5 -- Only include repeat customers
+ORDER BY monthly_travel_hull_area DESC, c.c_custkey ASC
+""").show(3)
+```
+
+    
┌───────────┬────────────────────┬─────────────────────┬────────────────────┬───────────────┐
+    │ c_custkey ┆    customer_name   ┆     pickup_month    ┆ 
monthly_travel_hul ┆ dropoff_count │
+    │   int64   ┆        utf8        ┆      timestamp      ┆       l_area…     
 ┆     int64     │
+    
╞═══════════╪════════════════════╪═════════════════════╪════════════════════╪═══════════════╡
+    │     25975 ┆ Customer#000025975 ┆ 1992-02-01T00:00:00 ┆ 
34941.303419053635 ┆            10 │
+    
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │     12061 ┆ Customer#000012061 ┆ 1997-03-01T00:00:00 ┆  
34607.53871953154 ┆            14 │
+    
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
+    │     21418 ┆ Customer#000021418 ┆ 1993-08-01T00:00:00 ┆  
34465.32323910264 ┆             9 │
+    
└───────────┴────────────────────┴─────────────────────┴────────────────────┴───────────────┘
+
+
+## Q6: Zone statistics for trips within a 50km radius of the Sedona city center
+
+**Real-life scenario:** Analyze trip patterns in zones within a metropolitan 
area around a specific city center.
+
+This query analyzes ride activity across all neighborhoods and zones within a 
50-kilometer area around Sedona, Arizona. It identifies which zones had the 
most pickup activity by counting total trips that started in each region. Also, 
it calculates the average trip cost and duration for rides originating from 
each zone. The results are ranked by pickup volume, showing which neighborhoods 
or areas generate the most ride demand and their typical trip characteristics 
within the greater Sedona region.
+
+**Spatial query characteristics tested:**
+
+1. Polygon containment check (ST_Contains) with bounding box
+2. Point-in-polygon spatial join (ST_Within)
+
+
+```python
+sd.sql("""
+SELECT
+    z.z_zonekey,
+    z.z_name,
+    COUNT(t.t_tripkey) AS total_pickups,
+    AVG(t.t_distance) AS avg_distance, -- Corrected from t_totalamount

Review Comment:
   [nitpick] The comment contains a typo: 't_totalamount' should be 
't_total_amount' if referring to a total amount field.
   ```suggestion
       AVG(t.t_distance) AS avg_distance, -- Corrected from t_total_amount
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to