Copilot commented on code in PR #2209:
URL: https://github.com/apache/sedona/pull/2209#discussion_r2249369029


##########
python/sedona/geopandas/geoseries.py:
##########
@@ -465,30 +465,21 @@ def crs(self) -> Union["CRS", None]:
         if len(self) == 0:
             return None
 
-        if parse_version(pyspark.__version__) >= parse_version("3.5.0"):
-            spark_col = stf.ST_SRID(F.first_value(self.spark.column, 
ignoreNulls=True))
-            # Set this to avoid error complaining that we don't have a groupby 
column
-            is_aggr = True
-        else:
-            spark_col = stf.ST_SRID(self.spark.column)
-            is_aggr = False
+        # F.first is non-deterministic, but it doesn't matter because all 
non-null values should be the same
+        spark_col = stf.ST_SRID(F.first(self.spark.column, ignorenulls=True))

Review Comment:
   The parameter name should be `ignoreNulls` (camelCase) not `ignorenulls` 
(lowercase). This is the correct parameter name for Spark SQL functions.
   ```suggestion
           spark_col = stf.ST_SRID(F.first(self.spark.column, ignoreNulls=True))
   ```



##########
python/sedona/geopandas/tools/sjoin.py:
##########
@@ -91,30 +89,21 @@ def _frame_join(
     right_geom_col = None
 
     # Find geometry columns in left dataframe
-    for field in left_sdf.schema.fields:
-        if field.dataType.typeName() in ("geometrytype", "binary"):
-            left_geom_col = field.name
-            break
+    left_geom_col = left_df.active_geometry_name
 
     # Find geometry columns in right dataframe
-    for field in right_sdf.schema.fields:
-        if field.dataType.typeName() in ("geometrytype", "binary"):
-            right_geom_col = field.name
-            break
+    right_geom_col = right_df.active_geometry_name
+
+    if not left_geom_col:
+        raise ValueError("Left dataframe geometry column not set")
+    if not right_geom_col:
+        raise ValueError("Right dataframe geometry column not set")
 
     if left_geom_col is None or right_geom_col is None:
         raise ValueError("Both datasets must have geometry columns")

Review Comment:
   This check is redundant since the previous checks on lines 97-100 already 
validate the geometry columns. Consider removing the duplicate validation or 
consolidate all geometry column validation into a single check.
   ```suggestion
   
   ```



##########
python/sedona/geopandas/tools/sjoin.py:
##########
@@ -91,30 +89,21 @@ def _frame_join(
     right_geom_col = None
 
     # Find geometry columns in left dataframe
-    for field in left_sdf.schema.fields:
-        if field.dataType.typeName() in ("geometrytype", "binary"):
-            left_geom_col = field.name
-            break
+    left_geom_col = left_df.active_geometry_name
 
     # Find geometry columns in right dataframe
-    for field in right_sdf.schema.fields:
-        if field.dataType.typeName() in ("geometrytype", "binary"):
-            right_geom_col = field.name
-            break
+    right_geom_col = right_df.active_geometry_name
+
+    if not left_geom_col:

Review Comment:
   This check will never be True since `left_geom_col` is assigned from 
`left_df.active_geometry_name` on line 92, and the subsequent check on line 102 
already handles the case where `left_geom_col is None`. The check should be `if 
left_geom_col is None:` to be consistent.
   ```suggestion
       if left_geom_col is None:
   ```



##########
python/sedona/geopandas/tools/sjoin.py:
##########
@@ -91,30 +89,21 @@ def _frame_join(
     right_geom_col = None
 
     # Find geometry columns in left dataframe
-    for field in left_sdf.schema.fields:
-        if field.dataType.typeName() in ("geometrytype", "binary"):
-            left_geom_col = field.name
-            break
+    left_geom_col = left_df.active_geometry_name
 
     # Find geometry columns in right dataframe
-    for field in right_sdf.schema.fields:
-        if field.dataType.typeName() in ("geometrytype", "binary"):
-            right_geom_col = field.name
-            break
+    right_geom_col = right_df.active_geometry_name
+
+    if not left_geom_col:
+        raise ValueError("Left dataframe geometry column not set")
+    if not right_geom_col:

Review Comment:
   This check will never be True since `right_geom_col` is assigned from 
`right_df.active_geometry_name` on line 95, and the subsequent check on line 
102 already handles the case where `right_geom_col is None`. The check should 
be `if right_geom_col is None:` to be consistent.
   ```suggestion
       if right_geom_col is None:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to