petern48 commented on issue #2138: URL: https://github.com/apache/sedona/issues/2138#issuecomment-3112257539
Yeah, I tried concatenating them, and that wasn't the issue. I dug into it for a bit, and it seems nontrivial. There's a `preserve_index` option you can set in Arrow to store the index column properly as metadata. Arrow's implementation only supports pandas, and pyspark pandas hasn't implemented its own version, so we'd have to implement that logic from scratch. It's actually a surprising amount of code ([1000+ line Python file](https://github.com/apache/arrow/blob/main/python/pyarrow/pandas_compat.py) covering both reading and writing), so IMO it's not worth trying to add support for this in Sedona's `dataframe_to_arrow`. We already have a less direct implementation of `to_arrow`, and Sedona Geopandas mainly stands out for large workloads anyway.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
