petern48 commented on code in PR #2209:
URL: https://github.com/apache/sedona/pull/2209#discussion_r2252053060
##########
python/sedona/geopandas/tools/sjoin.py:
##########
@@ -195,36 +203,35 @@ def _frame_join(
final_columns.append(f"{col_name} as {base_name}")
# Select final columns
- result_df = spatial_join_df.selectExpr(*final_columns)
-
- # Return appropriate type based on input
- if isinstance(left_df, GeoSeries) and isinstance(right_df, GeoSeries):
- # Return GeoSeries for GeoSeries inputs
- internal = InternalFrame(
- spark_frame=result_df,
- index_spark_columns=None,
- column_labels=[left_df._col_label],
- data_spark_columns=[scol_for(result_df, "geometry")],
- data_fields=[left_df._internal.data_fields[0]],
- column_label_names=left_df._internal.column_label_names,
- )
- return _to_geo_series(first_series(ps.DataFrame(internal)))
- else:
- # Return GeoDataFrame for GeoDataFrame inputs
- return GeoDataFrame(result_df)
+ result_df = spatial_join_df.selectExpr(*final_columns).orderBy(
+ SPARK_DEFAULT_INDEX_NAME
+ )
Review Comment:
Yeah, preserving the input order after a join probably isn't important in
most cases. pandas-on-Spark doesn't guarantee it either: [according to its
docs](https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.merge.html),
`merge` does "not preserve key order unlike pandas".
I removed the `orderBy` and updated the docs accordingly, noting that users
can sort the result afterwards with a method like `sort_index()`.
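
For reference, here is a minimal sketch of the user-side pattern, using plain
pandas as a stand-in. The frame, its column names, and its values are made up
for illustration, and it assumes the sjoin result exposes `sort_index()` the
same way pandas does:

```python
import pandas as pd

# Hypothetical stand-in for a join result whose rows came back in
# arbitrary order (as can happen when the join runs on Spark).
joined = pd.DataFrame(
    {"name": ["c", "a", "d", "b"], "index_right": [7, 5, 8, 6]},
    index=[2, 0, 3, 1],
)

# Restore a deterministic order by sorting on the index after the join.
restored = joined.sort_index()

print(restored)
```

The sort is only paid for by callers who actually need a stable order,
instead of on every join.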
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.