petern48 opened a new issue, #2138:
URL: https://github.com/apache/sedona/issues/2138
There's a lot of text below, so I'll highlight the main difference first. Notice
that our version has an extra level of nested `[ ]`:
```
# Our dataframe_to_arrow returns the following column
geometry: [[0101...F03F],[0101...0040]]
# But geopandas returns this.
geometry: [[0101...F03F,0101...0040]]
```
This happens for the index column (`__index_level_0__`) too, which leads to
it being misinterpreted as a regular column instead of being read in as the
index when calling `gpd.GeoDataFrame.from_arrow()`:
```
# Sedona returns
__index_level_0__ geometry
0 1 POINT (1 1)
1 2 POINT (2 2)
# Geopandas returns this
geometry
1 POINT (1 1)
2 POINT (2 2)
```
Full script and output below.
```python
import geopandas as gpd
import pandas as pd
import pyarrow as pa
from shapely.geometry import Point

import sedona.geopandas as sgpd
from sedona.spark.geoarrow.geoarrow import dataframe_to_arrow

# Sedona: build the GeoDataFrame and convert its underlying Spark DataFrame to Arrow
sgpd_df = sgpd.GeoDataFrame(
    {"geometry": [Point(1, 1), Point(2, 2)]}, index=pd.Index([1, 2])
)
# Drop the internal __natural_order__ column (don't worry about this drop)
spark_df = sgpd_df._internal.spark_frame.drop("__natural_order__")
sgpd_arrow = dataframe_to_arrow(spark_df)

# GeoPandas: build the same GeoDataFrame and convert it to a pyarrow.Table
gpd_df = gpd.GeoDataFrame(
    {"geometry": [Point(1, 1), Point(2, 2)]}, index=pd.Index([1, 2])
)
gpd_arrow = pa.table(gpd_df.to_arrow())

assert type(sgpd_arrow) == type(gpd_arrow) == pa.Table

print("SEDONA\n", sgpd_arrow, "\n")
gpd_df_from_sgpd_arrow = gpd.GeoDataFrame.from_arrow(sgpd_arrow)
print(gpd_df_from_sgpd_arrow, "\n")

print("GEOPANDAS\n", gpd_arrow, "\n")
gpd_df_from_gpd_arrow = gpd.GeoDataFrame.from_arrow(gpd_arrow)
print(gpd_df_from_gpd_arrow)
```
```
SEDONA
pyarrow.Table
__index_level_0__: int64
geometry: extension<geoarrow.wkb<WkbType>>
----
__index_level_0__: [[1],[2]]
geometry: [[0101000000000000000000F03F000000000000F03F],[010100000000000000000000400000000000000040]]
__index_level_0__ geometry
0 1 POINT (1 1)
1 2 POINT (2 2)
GEOPANDAS
pyarrow.Table
geometry: extension<geoarrow.wkb<WkbType>>
__index_level_0__: int64
----
geometry: [[0101000000000000000000F03F000000000000F03F,010100000000000000000000400000000000000040]]
__index_level_0__: [[1,2]]
geometry
1 POINT (1 1)
2 POINT (2 2)
```