Copilot commented on code in PR #26:
URL: 
https://github.com/apache/sedona-spatialbench/pull/26#discussion_r2370306361


##########
docs/queries.ipynb:
##########
@@ -0,0 +1,741 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "cf23cfd2-c45c-4ca3-9dcb-70541f3cc47d",
+   "metadata": {},
+   "source": [
+    "# SpatialBench Queries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "id": "9c1fe889-d035-4ed9-85a6-f7a8ecb891ad",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sedona.db"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "id": "96cf6ad5-e106-48f7-9097-9f3f740e9e7f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sd = sedona.db.connect()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "id": "16a147f4-4a12-4050-a1bb-934834b4a6bc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    
"sd.read_parquet(f\"/Users/matthewpowers/data/sf1-parquet/building.parquet\").to_view(\"building\")\n",
+    
"sd.read_parquet(f\"/Users/matthewpowers/data/sf1-parquet/customer.parquet\").to_view(\"customer\")\n",
+    
"sd.read_parquet(f\"/Users/matthewpowers/data/sf1-parquet/driver.parquet\").to_view(\"driver\")\n",
+    
"sd.read_parquet(f\"/Users/matthewpowers/data/sf1-parquet/trip.parquet\").to_view(\"trip\")\n",
+    
"sd.read_parquet(f\"/Users/matthewpowers/data/sf1-parquet/vehicle.parquet\").to_view(\"vehicle\")\n",
+    
"sd.read_parquet(f\"/Users/matthewpowers/data/sf1-parquet/zone.parquet\").to_view(\"zone\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5e243bca-ca97-4f72-bc72-46ace7495019",
+   "metadata": {},
+   "source": [
+    "## Q1: Find trips starting within 50km of Sedona city center, ordered by 
distance\n",
+    "\n",
+    "**Real-life scenario:** Identify and rank trips by proximity to a city 
center for urban planning and transportation analysis.\n",
+    "\n",
+    "This query finds all taxi or rideshare trips that started within 50 
kilometers of downtown Sedona, Arizona. For each qualifying trip, it shows the 
trip ID, pickup coordinates, pickup time, and calculates the exact distance 
from the pickup location to Sedona's city center. The results are sorted to 
show the trips that picked up closest to downtown Sedona first, making it easy 
to see which rides originated nearest to the city center.\n",
+    "\n",
+    "**Spatial query characteristics tested:**\n",
+    "\n",
+    "1. Distance-based spatial filtering (ST_DWithin)\n",
+    "2. Distance calculation to a fixed point\n",
+    "3. Coordinate extraction (ST_X, ST_Y)\n",
+    "4. Ordering by spatial distance"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "id": "f121f799-08f9-4679-a772-fb5f5470986f",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      
"┌───────────┬────────────────┬──────────────┬──────────────────────────────┬──────────────────────┐\n",
+      "│ t_tripkey ┆   pickup_lon   ┆  pickup_lat  ┆         t_pickuptime      
   ┆  distance_to_center  │\n",
+      "│   int64   ┆     float64    ┆    float64   ┆ timestamp(millisecond, 
none) ┆        float64       │\n",
+      
"╞═══════════╪════════════════╪══════════════╪══════════════════════════════╪══════════════════════╡\n",
+      "│   1451371 ┆ -111.791052127 ┆ 34.826733457 ┆ 1998-08-12T06:47:01       
   ┆  0.05243333056935386 │\n",
+      
"├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│   2047835 ┆ -111.706967009 ┆ 34.883889472 ┆ 1992-04-08T07:36:09       
   ┆ 0.055865062714050374 │\n",
+      
"├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│   3936870 ┆ -111.827619221 ┆ 34.882950924 ┆ 1998-11-10T13:32:07       
   ┆  0.06792427838042854 │\n",
+      
"└───────────┴────────────────┴──────────────┴──────────────────────────────┴──────────────────────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "sd.sql(\"\"\"\n",
+    "SELECT\n",
+    "    t.t_tripkey, ST_X(ST_GeomFromWKB(t.t_pickuploc)) AS pickup_lon, 
ST_Y(ST_GeomFromWKB(t.t_pickuploc)) AS pickup_lat, t.t_pickuptime,\n",
+    "    ST_Distance(ST_GeomFromWKB(t.t_pickuploc), ST_GeomFromText('POINT 
(-111.7610 34.8697)')) AS distance_to_center\n",
+    "FROM trip t\n",
+    "WHERE ST_DWithin(ST_GeomFromWKB(t.t_pickuploc), ST_GeomFromText('POINT 
(-111.7610 34.8697)'), 0.45) -- 50km radius around Sedona center\n",
+    "ORDER BY distance_to_center ASC, t.t_tripkey ASC\n",
+    "\"\"\").show(3)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d1e2a458-fc72-4a21-889e-9099f9ad6fb7",
+   "metadata": {},
+   "source": [
+    "## Q2: Count trips starting within Coconino County (Arizona) zone\n",
+    "\n",
+    "**Real-life scenario:** Count all trips originating within a specific 
administrative boundary (county) for regional transportation statistics.\n",
+    "\n",
+    "This query counts how many taxi or rideshare trips started within 
Coconino County, Arizona. It does this by checking if each trip's pickup 
location falls inside the county's geographic boundaries. The result is a 
simple count showing the total number of trips that originated anywhere within 
Coconino County.\n",
+    "\n",
+    "**Spatial query characteristics tested:**\n",
+    "\n",
+    "1. Point-in-polygon spatial filtering (ST_Intersects)\n",
+    "2. Subquery with spatial geometry selection\n",
+    "3. Simple aggregation on spatially filtered data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "id": "cc219172-0cfb-4f60-a728-e4ec5cc5c21e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "┌───────────────────────────────┐\n",
+      "│ trip_count_in_coconino_county │\n",
+      "│             int64             │\n",
+      "╞═══════════════════════════════╡\n",
+      "│                           541 │\n",
+      "└───────────────────────────────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "sd.sql(\"\"\"\n",
+    "SELECT COUNT(*) AS trip_count_in_coconino_county\n",
+    "FROM trip t\n",
+    "WHERE ST_Intersects(ST_GeomFromWKB(t.t_pickuploc), (SELECT 
ST_GeomFromWKB(z.z_boundary) FROM zone z WHERE z.z_name = 'Coconino County' 
LIMIT 1))\n",
+    "\"\"\").show(3)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "998daae7-31e4-4e93-823d-70d4cee00605",
+   "metadata": {},
+   "source": [
+    "## Q3: Monthly trip statistics within a 15km radius of the Sedona city 
center\n",
+    "\n",
+    "**Real-life scenario:** Track monthly travel trends and performance 
metrics in a metropolitan area with seasonal analysis.\n",
+    "\n",
+    "This query analyzes taxi and rideshare trip patterns around Sedona, 
Arizona, by grouping trips into monthly summaries. It looks at all trips that 
started within a 15-kilometer area around Sedona (a 10km box plus 5km buffer) 
and calculates key statistics for each month, including total number of trips, 
average trip distance, average trip duration, and average fare. The results are 
organized chronologically by month, allowing you to see seasonal trends and 
changes in ride patterns over time in the Sedona area.\n",
+    "\n",
+    "**Spatial query characteristics tested:**\n",
+    "\n",
+    "1. Distance-based spatial filtering (ST_DWithin) with buffer\n",
+    "2. Temporal grouping (monthly aggregation)\n",
+    "3. Multiple statistical aggregations on spatially filtered data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "id": "eb5d6347-47dd-48af-adb5-929838954ac4",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      
"┌──────────────────────┬─────────────┬───────────────────┬─────────────────────┬───────────────────┐\n",
+      "│     pickup_month     ┆ total_trips ┆    avg_distance   ┆     
avg_duration    ┆      avg_fare     │\n",
+      "│ timestamp(milliseco… ┆    int64    ┆ decimal128(19, 9) ┆ 
duration(milliseco… ┆ decimal128(19, 9) │\n",
+      
"╞══════════════════════╪═════════════╪═══════════════════╪═════════════════════╪═══════════════════╡\n",
+      "│ 1992-04-01T00:00:00  ┆           2 ┆       0.000020000 ┆ 0 days 1 
hours 23 … ┆       0.000075000 │\n",
+      
"├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ 1992-07-01T00:00:00  ┆           1 ┆       0.000010000 ┆ 0 days 0 
hours 58 … ┆       0.000040000 │\n",
+      
"├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│ 1994-02-01T00:00:00  ┆           2 ┆       0.000020000 ┆ 0 days 1 
hours 23 … ┆       0.000050000 │\n",
+      
"└──────────────────────┴─────────────┴───────────────────┴─────────────────────┴───────────────────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "sd.sql(\"\"\"\n",
+    "SELECT\n",
+    "    DATE_TRUNC('month', t.t_pickuptime) AS pickup_month, 
COUNT(t.t_tripkey) AS total_trips,\n",
+    "    AVG(t.t_distance) AS avg_distance, AVG(t.t_dropofftime - 
t.t_pickuptime) AS avg_duration,\n",
+    "    AVG(t.t_fare) AS avg_fare\n",
+    "FROM trip t\n",
+    "WHERE ST_DWithin(\n",
+    "    ST_GeomFromWKB(t.t_pickuploc),\n",
+    "    ST_GeomFromText('POLYGON((-111.9060 34.7347, -111.6160 34.7347, 
-111.6160 35.0047, -111.9060 35.0047, -111.9060 34.7347))'), -- 10km bounding 
box around Sedona\n",
+    "    0.045 -- Additional 5km buffer\n",
+    ")\n",
+    "GROUP BY pickup_month\n",
+    "ORDER BY pickup_month\n",
+    "\"\"\").show(3)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b516ed52-8cf2-4c17-bc33-00002e22b287",
+   "metadata": {},
+   "source": [
+    "## Q4: Zone distribution of top 1000 trips by tip amount\n",
+    "\n",
+    "**Real-life scenario:** Analyze the geographic distribution of high-value 
trips (by tip amount) to understand premium service areas.\n",
+    "\n",
+    "This query identifies which neighborhoods or zones produced the most 
generous tippers by analyzing the top 1000 highest-tipping trips. It first 
finds the 1000 trips with the largest tips, then determines which geographic 
zones or neighborhoods those pickup locations fall within, and counts how many 
of these high-tip trips originated from each area. The results show a ranking 
of zones by the number of big tippers they produced, helping identify the most 
lucrative pickup areas for drivers seeking high-tip rides.\n",
+    "\n",
+    "**Spatial query characteristics tested:**\n",
+    "\n",
+    "1. Subquery with ordering and limiting\n",
+    "2. Point-in-polygon spatial join (ST_Within)\n",
+    "3. Aggregation on spatially joined results\n",
+    "4. Multi-step query with spatial filtering and grouping"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "id": "f8dcf328-a7f2-49be-9383-afa11766f871",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "┌───────────┬─────────────────────────────────┬────────────┐\n",
+      "│ z_zonekey ┆              z_name             ┆ trip_count │\n",
+      "│   int64   ┆             utf8view            ┆    int64   │\n",
+      "╞═══════════╪═════════════════════════════════╪════════════╡\n",
+      "│     65008 ┆ Ndélé                           ┆         35 │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│    147530 ┆ 乐山市                          ┆         27 │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│    150276 ┆ 锡林郭勒盟 ᠰᠢᠯᠢ ᠶᠢᠨ ᠭᠣᠣᠯ ᠠᠶᠢᠮᠠᠭ ┆         19 │\n",
+      "└───────────┴─────────────────────────────────┴────────────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "sd.sql(\"\"\"\n",
+    "SELECT z.z_zonekey, z.z_name, COUNT(*) AS trip_count\n",
+    "FROM\n",
+    "    zone z\n",
+    "    JOIN (\n",
+    "        SELECT t.t_pickuploc\n",
+    "        FROM trip t\n",
+    "        ORDER BY t.t_tip DESC, t.t_tripkey ASC\n",
+    "        LIMIT 1000 -- Replace 1000 with x (how many top tips you want)\n",
+    "    ) top_trips ON ST_Within(ST_GeomFromWKB(top_trips.t_pickuploc), 
ST_GeomFromWKB(z.z_boundary))\n",
+    "GROUP BY z.z_zonekey, z.z_name\n",
+    "ORDER BY trip_count DESC, z.z_zonekey ASC\n",
+    "\"\"\").show(3)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ad294223-742c-4b6b-a9ec-036723e67a1f",
+   "metadata": {},
+   "source": [
+    "## Q5: Monthly travel patterns for repeat customers (convex hull of 
dropoff locations)\n",
+    "\n",
+    "**Real-life scenario:** Analyze the geographic spread of travel patterns 
for frequent customers to understand their mobility behavior.  \n",
+    "\n",
+    "This query analyzes the monthly travel patterns of frequent customers by 
measuring how much geographic area they cover with their trips. For each 
customer who took more than five trips in a month, it calculates the size of 
the \"travel hull\" - the area enclosed by connecting all their dropoff 
locations that month. The results reveal which customers have the most 
expansive travel patterns, helping to identify power users who cover large 
geographic areas versus those who stick to smaller, local areas.\n",
+    "\n",
+    "**Spatial query characteristics tested:**\n",
+    "\n",
+    "1. Spatial aggregation (ST_Collect/ARRAY_AGG)\n",
+    "2. Convex hull computation (ST_ConvexHull)\n",
+    "3. Area calculation on complex geometries\n",
+    "4. Temporal and customer-based grouping with spatial operations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "id": "8c446f9a-3734-4a8d-87a8-0d9d6d34fd5c",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      
"┌───────────┬────────────────────┬────────────────────────────┬────────────────────┬───────────────┐\n",
+      "│ c_custkey ┆    customer_name   ┆        pickup_month        ┆ 
monthly_travel_hul ┆ dropoff_count │\n",
+      "│   int64   ┆      utf8view      ┆ timestamp(millisecond, no… ┆       
l_area…      ┆     int64     │\n",
+      
"╞═══════════╪════════════════════╪════════════════════════════╪════════════════════╪═══════════════╡\n",
+      "│     12049 ┆ Customer#000012049 ┆ 1997-10-01T00:00:00        ┆  
22290.72518508209 ┆            18 │\n",
+      
"├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│     12091 ┆ Customer#000012091 ┆ 1997-08-01T00:00:00        ┆ 
18901.827089935185 ┆            18 │\n",
+      
"├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│     13231 ┆ Customer#000013231 ┆ 1994-09-01T00:00:00        ┆ 
24930.673726244015 ┆            18 │\n",
+      
"└───────────┴────────────────────┴────────────────────────────┴────────────────────┴───────────────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "sd.sql(\"\"\"\n",
+    "SELECT\n",
+    "    c.c_custkey, c.c_name AS customer_name,\n",
+    "    DATE_TRUNC('month', t.t_pickuptime) AS pickup_month,\n",
+    "    ST_Area(ST_ConvexHull(ST_Collect(ST_GeomFromWKB(t.t_dropoffloc)))) AS 
monthly_travel_hull_area,\n",
+    "    COUNT(*) as dropoff_count\n",
+    "FROM trip t JOIN customer c ON t.t_custkey = c.c_custkey\n",
+    "GROUP BY c.c_custkey, c.c_name, pickup_month\n",
+    "HAVING dropoff_count > 5 -- Only include repeat customers for meaningful 
hulls\n",
+    "ORDER BY dropoff_count DESC, c.c_custkey ASC\n",
+    "\"\"\").show(3)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8501daf5-5e4a-46d5-86c0-c3f6ea8673ee",
+   "metadata": {},
+   "source": [
+    "## Q6: Zone statistics for trips within a 50km radius of the Sedona city 
center\n",
+    "\n",
+    "**Real-life scenario:** Analyze trip patterns in zones within a 
metropolitan area around a specific city center.\n",
+    "\n",
+    "This query analyzes ride activity across all neighborhoods and zones 
within a 50-kilometer area around Sedona, Arizona. It identifies which zones 
had the most pickup activity by counting total trips that started in each 
region. Also, it calculates the average trip cost and duration for rides 
originating from each zone. The results are ranked by pickup volume, showing 
which neighborhoods or areas generate the most ride demand and their typical 
trip characteristics within the greater Sedona region.\n",
+    "\n",
+    "**Spatial query characteristics tested:**\n",
+    "\n",
+    "1. Polygon containment check (ST_Contains) with bounding box\n",
+    "2. Point-in-polygon spatial join (ST_Within)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "id": "f168c302-5187-45f9-8d87-427bbd68c942",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      
"┌───────────┬─────────────────┬───────────────┬───────────────────┬────────────────────────────────┐\n",
+      "│ z_zonekey ┆      z_name     ┆ total_pickups ┆    avg_distance   ┆     
     avg_duration          │\n",
+      "│   int64   ┆     utf8view    ┆     int64     ┆ decimal128(19, 9) ┆     
 duration(millisecond)     │\n",
+      
"╞═══════════╪═════════════════╪═══════════════╪═══════════════════╪════════════════════════════════╡\n",
+      "│     30084 ┆ Coconino County ┆           541 ┆       0.000085323 ┆ 0 
days 1 hours 45 mins 16.591… │\n",
+      
"├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│     30083 ┆ Yavapai County  ┆           292 ┆       0.000076643 ┆ 0 
days 1 hours 36 mins 43.647… │\n",
+      
"├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│     29488 ┆ Gila County     ┆            39 ┆       0.000065641 ┆ 0 
days 1 hours 16 mins 59.769… │\n",
+      
"└───────────┴─────────────────┴───────────────┴───────────────────┴────────────────────────────────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "sd.sql(\"\"\"\n",
+    "SELECT\n",
+    "    z.z_zonekey, z.z_name,\n",
+    "    COUNT(t.t_tripkey) AS total_pickups, AVG(t.t_totalamount) AS 
avg_distance,\n",
+    "    AVG(t.t_dropofftime - t.t_pickuptime) AS avg_duration\n",
+    "FROM trip t, zone z\n",
+    "WHERE ST_Intersects(ST_GeomFromText('POLYGON((-112.2110 34.4197, 
-111.3110 34.4197, -111.3110 35.3197, -112.2110 35.3197, -112.2110 34.4197))'), 
ST_GeomFromWKB(z.z_boundary)) -- 50km bounding box around Sedona\n",
+    "  AND ST_Within(ST_GeomFromWKB(t.t_pickuploc), 
ST_GeomFromWKB(z.z_boundary))\n",
+    "GROUP BY z.z_zonekey, z.z_name\n",
+    "ORDER BY total_pickups DESC, z.z_zonekey ASC\n",
+    "\"\"\").show(3)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fe58f65b-1403-4f91-81a3-e81784bfd3b4",
+   "metadata": {},
+   "source": [
+    "## Q7: Detect potential route detours by comparing reported vs. geometric 
distances\n",
+    "\n",
+    "**Real-life scenario:** Identify suspicious trips where the reported 
distance significantly exceeds the straight-line distance, potentially 
indicating fare manipulation.\n",
+    "\n",
+    "This query analyzes how much taxi and rideshare trips deviate from the 
most direct route by comparing the actual reported trip distance to the 
straight-line distance between pickup and dropoff points. It calculates a 
\"detour ratio\" that shows how much longer the actual route was compared to 
flying in a straight line.  For example, a ratio of 1.5 means the trip was 50% 
longer than the direct path. The results are sorted to show the trips with the 
highest detour ratios first, helping identify routes that took significant 
detours due to traffic, road layouts, or other factors.\n",
+    "\n",
+    "**Spatial query characteristics tested:**\n",
+    "\n",
+    "1. Line geometry creation (ST_MakeLine)\n",
+    "2. Length calculation (ST_Length)\n",
+    "3. Coordinate system conversion and distance calculations\n",
+    "4. Ratio-based filtering on geometric vs. reported measurements"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "id": "3a8cddde-c04b-49d4-ab69-3d75d9a535ff",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      
"┌───────────┬─────────────────────┬────────────────────┬──────────────────────┐\n",
+      "│ t_tripkey ┆ reported_distance_m ┆   line_distance_m  ┆     
detour_ratio     │\n",
+      "│   int64   ┆  decimal128(15, 5)  ┆       float64      ┆        float64 
      │\n",
+      
"╞═══════════╪═════════════════════╪════════════════════╪══════════════════════╡\n",
+      "│   4688563 ┆             0.00010 ┆ 11111.126052681648 ┆  
8.99998789734414e-9 │\n",
+      
"├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│   2380123 ┆             0.00010 ┆ 11111.126095065882 ┆ 
8.999987863013003e-9 │\n",
+      
"├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│   3077131 ┆             0.00010 ┆ 11111.126138581423 ┆  
8.99998782776551e-9 │\n",
+      
"└───────────┴─────────────────────┴────────────────────┴──────────────────────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "sd.sql(\"\"\"\n",
+    "WITH trip_lengths AS (\n",
+    "    SELECT\n",
+    "        t.t_tripkey,\n",
+    "        t.t_distance AS reported_distance_m,\n",
+    "        ST_Length(\n",
+    "            ST_MakeLine(\n",
+    "                ST_GeomFromWKB(t.t_pickuploc),\n",
+    "                ST_GeomFromWKB(t.t_dropoffloc)\n",
+    "            )\n",
+    "        ) / 0.000009 AS line_distance_m -- 1 meter = 0.000009 degree\n",
+    "    FROM trip t\n",
+    ")\n",
+    "SELECT\n",
+    "    t.t_tripkey,\n",
+    "    t.reported_distance_m,\n",
+    "    t.line_distance_m,\n",
+    "    t.reported_distance_m / NULLIF(t.line_distance_m, 0) AS 
detour_ratio\n",
+    "FROM trip_lengths t\n",
+    "ORDER BY detour_ratio DESC NULLS LAST, reported_distance_m DESC, 
t_tripkey ASC\n",
+    "\"\"\").show(3)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3040b853-100b-4c9d-8447-5b9bf4c4a895",
+   "metadata": {},
+   "source": [
+    "## Q8: Count nearby pickups for each building within a 500m radius\n",
+    "\n",
+    "**Real-life scenario:** Count how many trips start within 500 meters of 
each building.\n",
+    "\n",
+    "This query identifies which buildings generate the most taxi and 
rideshare pickup activity by counting trips that started within 500 meters of 
each building. It analyzes the relationship between specific buildings (like 
hotels, shopping centers, airports, or office buildings) and ride demand in 
their immediate vicinity. The results are ranked to show which buildings are 
the biggest trip generators, helping identify key pickup hotspots and 
understand how different types of buildings drive transportation demand.\n",
+    "\n",
+    "**Spatial query characteristics tested:**\n",
+    "\n",
+    "1. Distance spatial join between points and polygons\n",
+    "2. Aggregation on spatial join result"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "id": "58ac6e6a-2e3d-4dd7-bb03-1b16f55fc186",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "┌───────────────┬──────────┬─────────────────────┐\n",
+      "│ b_buildingkey ┆  b_name  ┆ nearby_pickup_count │\n",
+      "│     int64     ┆ utf8view ┆        int64        │\n",
+      "╞═══════════════╪══════════╪═════════════════════╡\n",
+      "│          3779 ┆ linen    ┆                  42 │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│         19135 ┆ misty    ┆                  36 │\n",
+      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
+      "│          4416 ┆ sienna   ┆                  26 │\n",
+      "└───────────────┴──────────┴─────────────────────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "sd.sql(\"\"\"\n",
+    "SELECT b.b_buildingkey, b.b_name, COUNT(*) AS nearby_pickup_count\n",
+    "FROM trip t\n",
+    "JOIN building b\n",
+    "ON ST_DWithin(ST_GeomFromWKB(t.t_pickuploc), 
ST_GeomFromWKB(b.b_boundary), 0.0045) -- ~500m\n",

Review Comment:
   The distance threshold 0.0045 (representing ~500m) is a magic number that 
lacks clear documentation of the unit conversion. This makes the query 
difficult to understand and modify. Consider defining this as a named constant 
or adding clearer documentation about the coordinate system and units.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to