I’ve read a lot of nice things about the Apache Arrow in-memory columnar format.
On their homepage they mention Cassandra as a possible storage system that could
interoperate with Arrow. Unfortunately, I was not able to find any working
example demonstrating how the two work together.

My use case: I’m doing OLAP processing of data stored in Cassandra with Spark.
I need to deduplicate data using Cassandra’s upserts (a write to an existing
primary key overwrites the previous row), so other, otherwise more suitable,
storage options like HDFS with Parquet or ORC didn’t seem viable.
What I’d like to achieve: speed up Spark’s data ingestion from Cassandra.

Is it possible to query data from Cassandra in Arrow format?
