Re: [I] `dataframe_to_arrow` Returns a table that doesn't convert geopandas index correctly [sedona]

via GitHub Wed, 23 Jul 2025 23:50:09 -0700


petern48 commented on issue #2138:
URL: https://github.com/apache/sedona/issues/2138#issuecomment-3112257539


   Yeah, I tried concatenating them, and that wasn't the issue. I dug into 
it for a bit, and it seems nontrivial. There's a `preserve_index` option you 
can set in Arrow for storing the index column as metadata properly. Arrow's 
implementation only supports pandas, and pyspark pandas hasn't implemented 
its own version, so we'd have to implement that logic from scratch. It's 
actually a surprising amount of code for this logic ([1000+ line Python 
file](https://github.com/apache/arrow/blob/main/python/pyarrow/pandas_compat.py)
 for both reading and writing), so IMO, it's not worth trying to add support 
for this in Sedona's `dataframe_to_arrow`. We already have a less-direct 
implementation of `to_arrow`, and Sedona Geopandas is primarily useful for 
large workloads anyway.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
