Hi all, I understand the value of Arrow in our projects for interoperability as well as faster data access. I have a couple of questions about how we could use it for the following use cases, and whether that would be a good way to use it:
1. Will Spark execution be faster when I run joins on a DataFrame with Arrow compared to the normal Parquet format? Is the shuffle cost lower because of reduced serialization and deserialization? Is it?
2. If I have a use case of running aggregate queries on a very large table (say 10 TB) containing a few dimensions and very few metrics, is it a good idea to use Arrow as an intermediate caching layer for interactive, low-latency queries?

Note: Dremio provides this by default - should I explore it, or Impala or Drill, for this use case?

Thanks,
Subash
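
P.S. To make question 1 a bit more concrete, here is a rough sketch of what I mean by "using Arrow" in a Spark job (assuming PySpark 3.x with pyarrow installed; the paths and join key below are made up for illustration):

from pyspark.sql import SparkSession

# Assumption: PySpark 3.x with pyarrow available on the driver.
spark = (
    SparkSession.builder
    .appName("arrow-join-sketch")
    # Spark 3.x config name; older releases used spark.sql.execution.arrow.enabled
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")
    .getOrCreate()
)

# Hypothetical Parquet paths and join key.
facts = spark.read.parquet("/data/facts")
dims = spark.read.parquet("/data/dims")

joined = facts.join(dims, on="key", how="inner")

# As far as I understand, the join/shuffle above still uses Spark's internal
# row format regardless of this setting; Arrow only comes into play at the
# JVM <-> Python boundary, e.g. when converting the result to pandas here.
result = joined.groupBy("key").count().toPandas()
print(result.head())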