Robinlovelace opened a new issue, #477:
URL: https://github.com/apache/sedona-db/issues/477

   I am encountering a
`PhysicalOptimizer rule 'join_selection' failed. Schema mismatch` error when
running an intersection query (`ST_Intersects`) in Python using `sedonadb` 0.2.0.
   
   This happens when intersecting points and polygons that have been 
transformed to a projected CRS (EPSG:27700).
   
   ### Reproduction Script
   
   Here is a minimal script that reproduces the issue using synthetic data 
(100k points, 100 polygons):
   
   ```python
   import sedona.db
   import geopandas as gpd
   import pandas as pd
   import numpy as np
   from shapely.geometry import Point
   
   print("Generating synthetic data (100k points, 100 polys)...")
   
   # 1. Generate Data
   n_points = 100000
   n_polys = 100
   
   # Points
   lons = np.random.uniform(-6, 2, n_points)
   lats = np.random.uniform(50, 59, n_points)
   pts_df = pd.DataFrame({'geometry': [Point(x, y) for x, y in zip(lons, lats)]})
   pts_gdf = gpd.GeoDataFrame(pts_df, crs="EPSG:4326")
   
   # Polygons (Centers buffered)
   plons = np.random.uniform(-6, 2, n_polys)
   plats = np.random.uniform(50, 59, n_polys)
   poly_centers = gpd.GeoDataFrame(
       {'geometry': [Point(x, y) for x, y in zip(plons, plats)]}, 
       crs="EPSG:4326"
   )
   # Simple buffer in degrees
   polys_gdf = poly_centers.buffer(0.1).to_frame(name='geometry')
   
   # Connect
   sd = sedona.db.connect()
   
   # 2. Load
   sd.create_data_frame(pts_gdf).to_view("points", overwrite=True)
   sd.create_data_frame(polys_gdf).to_view("polygons", overwrite=True)
   
   # 3. Transform to EPSG:27700
   sd.sql("SELECT ST_Transform(geometry, 'EPSG:27700') as geometry FROM points").to_view("points_proj", overwrite=True)
   sd.sql("SELECT ST_Transform(geometry, 'EPSG:27700') as geometry FROM polygons").to_view("polys_proj", overwrite=True)
   
   # 4. Intersection
   query = """
       SELECT p.geometry 
       FROM points_proj AS p, polys_proj AS poly 
       WHERE ST_Intersects(p.geometry, poly.geometry)
   """
   
   print("Running intersection query...")
   # This throws the error
   res = sd.sql(query).to_pandas()
   ```
   
   ### Error Output
   
   ```
   sedonadb._lib.SedonaError: PhysicalOptimizer rule 'join_selection' failed. 
Schema mismatch. Expected original schema: Schema { fields: [Field { name: 
"geometry", data_type: Binary, nullable: true, dict_id: 0, dict_is_ordered: 
false, metadata: {"ARROW:extension:metadata": "{\"crs\":\"EPSG:27700\"}", 
"ARROW:extension:name": "geoarrow.wkb"} }], ... }, got new schema: ...
   This issue was likely caused by a bug in DataFusion's code. Please help us 
to resolve this by filing a bug report in our issue tracker: 
https://github.com/apache/datafusion/issues
   ```
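
For comparison, the same join can be sketched with GeoPandas alone, which may be useful as a reference result to check against once the SedonaDB query runs. This is only a sketch under assumptions: the function name `baseline_intersections`, the `seed` argument, and the smaller default sizes are not part of the script above, and its output has not been compared with SedonaDB here.

```python
import geopandas as gpd
import numpy as np


def baseline_intersections(n_points=1_000, n_polys=20, seed=0):
    """Return the (point, polygon) pairs that intersect, using GeoPandas only.

    Hypothetical helper: the name, defaults, and seed are assumptions made
    for illustration; they do not appear in the reproduction script.
    """
    rng = np.random.default_rng(seed)  # fixed seed so repeated runs match

    # Same bounding box as the report: lon in [-6, 2], lat in [50, 59]
    pts = gpd.GeoDataFrame(
        geometry=gpd.points_from_xy(
            rng.uniform(-6, 2, n_points), rng.uniform(50, 59, n_points)
        ),
        crs="EPSG:4326",
    )
    centers = gpd.GeoSeries(
        gpd.points_from_xy(
            rng.uniform(-6, 2, n_polys), rng.uniform(50, 59, n_polys)
        ),
        crs="EPSG:4326",
    )
    # Buffer in degrees, as in the report (GeoPandas may warn about this)
    polys = gpd.GeoDataFrame(geometry=centers.buffer(0.1))

    # Reproject both layers to EPSG:27700 before joining
    pts_proj = pts.to_crs("EPSG:27700")
    polys_proj = polys.to_crs("EPSG:27700")

    # Inner spatial join: one row per intersecting (point, polygon) pair
    return gpd.sjoin(pts_proj, polys_proj, how="inner", predicate="intersects")
```

Calling `len(baseline_intersections(100_000, 100))` gives the number of intersecting pairs for data of the same size as the report, though the random draws differ from the unseeded `np.random.uniform` calls in the original script.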
   
   ### Environment
   - `sedonadb`: 0.2.0
   - Python: 3.10
   - OS: Linux (Ubuntu)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
