I’ve read a lot of nice things about the Apache Arrow in-memory columnar format. The Arrow homepage mentions Cassandra as a possible storage system that could interoperate with Arrow. Unfortunately, I was not able to find any working example that demonstrates the two working together.
My use case: I’m doing OLAP processing with Spark on data stored in Cassandra. I rely on Cassandra’s upserts to deduplicate the data, so other (more suitable) storage options such as HDFS + Parquet or ORC didn’t seem like an option. What I’d like to achieve: speed up Spark’s data ingestion from Cassandra. Is it possible to query data from Cassandra in Arrow format?