Robinlovelace opened a new issue, #477:
URL: https://github.com/apache/sedona-db/issues/477
I am encountering a `PhysicalOptimizer rule 'join_selection' failed. Schema mismatch` error when performing an intersection query (`ST_Intersects`) in Python using `sedonadb` 0.2.0.
This happens when intersecting points and polygons that have been
transformed to a projected CRS (EPSG:27700).
### Reproduction Script
Here is a minimal script that reproduces the issue using synthetic data
(100k points, 100 polygons):
```python
import sedona.db
import geopandas as gpd
import pandas as pd
import numpy as np
from shapely.geometry import Point
print("Generating synthetic data (100k points, 100 polys)...")
# 1. Generate Data
n_points = 100000
n_polys = 100
# Points
lons = np.random.uniform(-6, 2, n_points)
lats = np.random.uniform(50, 59, n_points)
pts_df = pd.DataFrame({'geometry': [Point(x, y) for x, y in zip(lons, lats)]})
pts_gdf = gpd.GeoDataFrame(pts_df, crs="EPSG:4326")
# Polygons (Centers buffered)
plons = np.random.uniform(-6, 2, n_polys)
plats = np.random.uniform(50, 59, n_polys)
poly_centers = gpd.GeoDataFrame(
{'geometry': [Point(x, y) for x, y in zip(plons, plats)]},
crs="EPSG:4326"
)
# Simple buffer in degrees
polys_gdf = poly_centers.buffer(0.1).to_frame(name='geometry')
# Connect
sd = sedona.db.connect()
# 2. Load
sd.create_data_frame(pts_gdf).to_view("points", overwrite=True)
sd.create_data_frame(polys_gdf).to_view("polygons", overwrite=True)
# 3. Transform to EPSG:27700
sd.sql(
    "SELECT ST_Transform(geometry, 'EPSG:27700') AS geometry FROM points"
).to_view("points_proj", overwrite=True)
sd.sql(
    "SELECT ST_Transform(geometry, 'EPSG:27700') AS geometry FROM polygons"
).to_view("polys_proj", overwrite=True)
# 4. Intersection
query = """
SELECT p.geometry
FROM points_proj AS p, polys_proj AS poly
WHERE ST_Intersects(p.geometry, poly.geometry)
"""
print("Running intersection query...")
# This throws the error
res = sd.sql(query).to_pandas()
```
### Error Output
```
sedonadb._lib.SedonaError: PhysicalOptimizer rule 'join_selection' failed.
Schema mismatch. Expected original schema: Schema { fields: [Field { name:
"geometry", data_type: Binary, nullable: true, dict_id: 0, dict_is_ordered:
false, metadata: {"ARROW:extension:metadata": "{\"crs\":\"EPSG:27700\"}",
"ARROW:extension:name": "geoarrow.wkb"} }], ... }, got new schema: ...
This issue was likely caused by a bug in DataFusion's code. Please help us
to resolve this by filing a bug report in our issue tracker:
https://github.com/apache/datafusion/issues
```
### Environment
- `sedonadb`: 0.2.0
- Python: 3.10
- OS: Linux (Ubuntu)
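For comparison, the same transform-and-intersect pipeline can be run with GeoPandas alone to get a reference count of intersecting point/polygon pairs to check SedonaDB against once the join works. This is a sketch assuming GeoPandas 0.10 or later (for the `predicate` argument of `sjoin`); it also adds a fixed random seed, which the reproduction script does not use, so the count is repeatable.

```python
import geopandas as gpd
import numpy as np

# Fixed seed so the reference result is reproducible (assumption: the
# original script is unseeded, so its exact counts will differ).
rng = np.random.default_rng(42)
n_points, n_polys = 100_000, 100

# Same synthetic extent as the reproduction script, in EPSG:4326.
pts_gdf = gpd.GeoDataFrame(
    geometry=gpd.points_from_xy(
        rng.uniform(-6, 2, n_points), rng.uniform(50, 59, n_points)
    ),
    crs="EPSG:4326",
)
poly_centers = gpd.GeoDataFrame(
    geometry=gpd.points_from_xy(
        rng.uniform(-6, 2, n_polys), rng.uniform(50, 59, n_polys)
    ),
    crs="EPSG:4326",
)

# Degree-based buffer, as in the script; GeoPandas may warn that buffering
# in a geographic CRS is imprecise, which is acceptable for this test data.
polys_gdf = gpd.GeoDataFrame(geometry=poly_centers.buffer(0.1))

# Reproject both layers to British National Grid, mirroring ST_Transform.
pts_proj = pts_gdf.to_crs("EPSG:27700")
polys_proj = polys_gdf.to_crs("EPSG:27700")

# Counterpart of the SQL join: one row per intersecting (point, polygon) pair.
joined = gpd.sjoin(pts_proj, polys_proj, how="inner", predicate="intersects")
print(f"{len(joined)} intersecting point/polygon pairs")
```

Comparing `len(joined)` with the row count of `res` from the SedonaDB query (generated from the same seeded data) would confirm that a fix returns the expected result, not just that the error goes away.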