Hi,

(NB: I first floated this question in the arrow-rust Slack channel and Jorge Leitao suggested I should ask here.)
I’m cranking up a project to provide functionality based on Parquet/Arrow/Flight, implemented in Rust. The primary goal of the project is to provide a mechanism for storing and retrieving large quantities of column-oriented data across different types of storage (S3, filesystem, etc.). Initially, at least, the Flight/Arrow/Parquet stack looks to be a great fit for what I’m doing.

I’ve done some prototyping and so far I’ve made good progress. I have a simple Flight service (written in Rust, arrow 4.0.0 stack) which is happy to send/receive data to/from a very simple Flight client (written in Python). I’ve encountered a few rough edges, and before proceeding further I thought I’d see what other people think of the idea of using Flight/Arrow to provide a persistence service (Parquet) for large quantities of column-oriented data.

One of my questions is about the use of Flight. Flight seems to be primarily oriented around streams of data (which is cool), but has anyone else considered using it as the basis for a distributed storage framework? do_get would read Parquet data and send it as Arrow (read_parquet/send_arrow), and do_put would receive Arrow data and write it as Parquet (receive_arrow/write_parquet). Or should persistence perhaps be a separate, new action?

Another question is around schema evolution. Are there any gotchas with this approach? Do I need to think about a separate schema registry, and how would I evolve data against that registry?

For now, forget about authn/authz issues. I think the handshake mechanism will probably suffice, but if not I can roll extensions using the action mechanism.

Has anyone else done anything like this? Does it seem like a reasonable use of the tooling? Any gotchas I should be worrying about?

Cheers,
Gary