Copilot commented on code in PR #2209:
URL: https://github.com/apache/sedona/pull/2209#discussion_r2249369029
##########
python/sedona/geopandas/geoseries.py:
##########
@@ -465,30 +465,21 @@ def crs(self) -> Union["CRS", None]:
if len(self) == 0:
return None
- if parse_version(pyspark.__version__) >= parse_version("3.5.0"):
- spark_col = stf.ST_SRID(F.first_value(self.spark.column,
ignoreNulls=True))
- # Set this to avoid error complaining that we don't have a groupby
column
- is_aggr = True
- else:
- spark_col = stf.ST_SRID(self.spark.column)
- is_aggr = False
+ # F.first is non-deterministic, but it doesn't matter because all
non-null values should be the same
+ spark_col = stf.ST_SRID(F.first(self.spark.column, ignorenulls=True))
Review Comment:
The parameter name should be `ignoreNulls` (camelCase) not `ignorenulls`
(lowercase). This is the correct parameter name for Spark SQL functions.
```suggestion
spark_col = stf.ST_SRID(F.first(self.spark.column, ignoreNulls=True))
```
##########
python/sedona/geopandas/tools/sjoin.py:
##########
@@ -91,30 +89,21 @@ def _frame_join(
right_geom_col = None
# Find geometry columns in left dataframe
- for field in left_sdf.schema.fields:
- if field.dataType.typeName() in ("geometrytype", "binary"):
- left_geom_col = field.name
- break
+ left_geom_col = left_df.active_geometry_name
# Find geometry columns in right dataframe
- for field in right_sdf.schema.fields:
- if field.dataType.typeName() in ("geometrytype", "binary"):
- right_geom_col = field.name
- break
+ right_geom_col = right_df.active_geometry_name
+
+ if not left_geom_col:
+ raise ValueError("Left dataframe geometry column not set")
+ if not right_geom_col:
+ raise ValueError("Right dataframe geometry column not set")
if left_geom_col is None or right_geom_col is None:
raise ValueError("Both datasets must have geometry columns")
Review Comment:
This check is redundant since the previous checks on lines 97-100 already
validate the geometry columns. Consider removing the duplicate validation or
consolidate all geometry column validation into a single check.
```suggestion
```
##########
python/sedona/geopandas/tools/sjoin.py:
##########
@@ -91,30 +89,21 @@ def _frame_join(
right_geom_col = None
# Find geometry columns in left dataframe
- for field in left_sdf.schema.fields:
- if field.dataType.typeName() in ("geometrytype", "binary"):
- left_geom_col = field.name
- break
+ left_geom_col = left_df.active_geometry_name
# Find geometry columns in right dataframe
- for field in right_sdf.schema.fields:
- if field.dataType.typeName() in ("geometrytype", "binary"):
- right_geom_col = field.name
- break
+ right_geom_col = right_df.active_geometry_name
+
+ if not left_geom_col:
Review Comment:
This check will never be True since `left_geom_col` is assigned from
`left_df.active_geometry_name` on line 92, and the subsequent check on line 102
already handles the case where `left_geom_col is None`. The check should be `if
left_geom_col is None:` to be consistent.
```suggestion
if left_geom_col is None:
```
##########
python/sedona/geopandas/tools/sjoin.py:
##########
@@ -91,30 +89,21 @@ def _frame_join(
right_geom_col = None
# Find geometry columns in left dataframe
- for field in left_sdf.schema.fields:
- if field.dataType.typeName() in ("geometrytype", "binary"):
- left_geom_col = field.name
- break
+ left_geom_col = left_df.active_geometry_name
# Find geometry columns in right dataframe
- for field in right_sdf.schema.fields:
- if field.dataType.typeName() in ("geometrytype", "binary"):
- right_geom_col = field.name
- break
+ right_geom_col = right_df.active_geometry_name
+
+ if not left_geom_col:
+ raise ValueError("Left dataframe geometry column not set")
+ if not right_geom_col:
Review Comment:
This check will never be True since `right_geom_col` is assigned from
`right_df.active_geometry_name` on line 95, and the subsequent check on line
102 already handles the case where `right_geom_col is None`. The check should
be `if right_geom_col is None:` to be consistent.
```suggestion
if right_geom_col is None:
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]