Hi, I just wanted to introduce myself to the group before I start asking lots of questions. I'm a software engineer, mostly working with Scala/Spark/Kudu/Parquet in my day job, and in my spare time I have been working on a POC of a distributed data platform implemented in Rust. The project is called DataFusion (https://www.datafusion.rs/).
The project is at a very early stage, and the implementation currently does only simple row-based processing, but the performance is already quite exciting to me (the current test case is 4x faster than Apache Spark). I have decided to concentrate now on making Apache Arrow the native memory format, so that I can implement more efficient data processing and make it easier to integrate with things like Kudu and Parquet in the future. It's also just a great way for me to learn about columnar processing. I'm in the process of getting Arrow compiling and reading the docs. I'll be back soon with questions, I'm sure.
Thanks,
Andy