Hey all,

Wanted to share a project we've been working on at Query.Farm: vgi-rpc, an 
open-source RPC framework built on Apache Arrow IPC.

It started with a dream: could I build services using Arrow without being locked 
into the gRPC ecosystem? Last April, Matt Topol and I were having dinner after 
PyData Charlottesville, and he said something that really stuck with me: "Just 
use Arrow." So that's exactly what I did with vgi-rpc. There is no serialization 
other than Arrow (no protobuf, no msgpack, just Arrow).

The elevator pitch: define RPC services as plain Python `Protocol` classes, and 
get typed proxies with full IDE autocompletion. No `.proto` files, no codegen, 
no protobuf dependency. Your type annotations *are* the schema. The framework 
infers Arrow schemas from them automatically. Because it's all Arrow, the 
framework can be implemented in any language that supports Arrow IPC; I chose 
to start with Python.
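To make the "your type annotations *are* the schema" idea concrete, here's a toy stand-in for what that inference looks like. This is not vgi-rpc's actual API (the helper name `arrow_schema_for` and the type table are invented for illustration); it just shows how a `Protocol` method's annotations can be read and mapped to Arrow type names with nothing but the stdlib:

```python
# Hypothetical sketch: deriving an Arrow-style schema from a Protocol's
# type annotations. The names here are invented, not vgi-rpc's real API.
from typing import Protocol, get_type_hints

# A small Python -> Arrow type table (subset, for illustration only).
PY_TO_ARROW = {int: "int64", float: "float64", str: "utf8", bool: "bool_"}

class Greeter(Protocol):
    def greet(self, name: str, excitement: int) -> str: ...

def arrow_schema_for(method) -> dict[str, str]:
    """Map a method's parameter annotations to Arrow type names."""
    hints = get_type_hints(method)
    hints.pop("return", None)  # the return annotation is the response schema
    return {field: PY_TO_ARROW[py_type] for field, py_type in hints.items()}

print(arrow_schema_for(Greeter.greet))
# {'name': 'utf8', 'excitement': 'int64'}
```

The point is that the `Protocol` class is both the thing your IDE type-checks against and the source of the wire schema, so there's no second schema language to keep in sync.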

vgi-rpc takes a different approach from Flight. It's aimed at people who want 
typed, ergonomic RPC without the complexity or lock-in of gRPC, but with all 
the benefits of Arrow. And being Arrow, it's really useful when your RPC 
responses are of variable length, or as I say, when they "stream."

- **Custom methods** - Flight gives you a fixed API (`do_get`, `do_put`, 
`do_action`). vgi-rpc lets you define whatever methods you want, with whatever 
signatures you want. Your proxy has real method names and type-checked 
arguments.

- **Transport flexibility** - Flight is gRPC-only, and as Arrow developers 
we've discussed the complexity and restrictions that brings. vgi-rpc runs over 
in-process pipes, subprocess stdin/stdout, Unix domain sockets, shared memory, 
or HTTP. Same service code, different transport. Pipes are great for testing; 
shared memory gives you zero-copy between co-located processes. Shared memory 
still needs a bit more work in PyArrow to saturate your machine's memory 
bandwidth; it depends on these PRs 
(https://github.com/apache/arrow/pull/49262, 
https://github.com/apache/arrow/pull/49286). On my MacBook Air M3 I got it up 
to 29 GB/s.

- **Shared memory transport** - `ShmPipeTransport` lets two processes share 
Arrow record batches by passing a pointer over a pipe instead of serializing 
the whole thing. Flight has no equivalent right now. If you're running a 
pipeline of worker processes on the same machine, this is a significant win: 
it almost completely eliminates the overhead of being in an external process.

- **No gRPC dependency** - Flight pulls in gRPC and protobuf (including C++ 
compilation). vgi-rpc's core dependency is just PyArrow. The HTTP transport 
uses Falcon/httpx, which are pure Python. When vgi-rpc gets ported to other 
languages it will just depend on the Arrow implementation in that language.

- **Simpler streaming** - vgi-rpc supports producer streams (server pushes 
batches) and lockstep exchange (request-response ping-pong). It deliberately 
skips full-duplex concurrent streaming. Lockstep is easier to reason about and 
covers most use cases.
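The shared-memory trick a few bullets up can be sketched with stdlib primitives alone. This is not `ShmPipeTransport` itself, just an illustration of the mechanism: the payload is written into a named shared block once, and only a small handle (the block's name and size, the "pointer") would cross the pipe, so the data is never re-serialized:

```python
# Illustrative sketch, NOT vgi-rpc's actual ShmPipeTransport: a named
# shared-memory block stands in for an Arrow record batch, and only a
# (name, size) handle crosses between writer and reader.
from multiprocessing import shared_memory

payload = b"pretend this is an Arrow record batch"

# Writer side: allocate a named shared block and copy the payload in once.
shm = shared_memory.SharedMemory(create=True, size=len(payload))
shm.buf[: len(payload)] = payload
handle = (shm.name, len(payload))  # this tiny tuple is all the pipe carries

# Reader side (normally a different process): attach by name, no payload copy.
name, size = handle
view = shared_memory.SharedMemory(name=name)
received = bytes(view.buf[:size])
view.close()

shm.close()
shm.unlink()
print(received == payload)  # True
```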
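And to show why lockstep exchange is easy to reason about, here's a generator-based, in-process stand-in for the ping-pong pattern (again a sketch, not vgi-rpc's API): the client sends exactly one request, the server replies with exactly one response, strictly alternating:

```python
# Minimal lockstep (request-response ping-pong) sketch: one request in,
# one reply out, strictly alternating. A generator plays the server.
from typing import Iterator

def running_sum_server() -> Iterator[int]:
    """Lockstep handler: receive one value, reply with the running total."""
    total = 0
    value = yield  # prime: wait for the first request
    while True:
        total += value
        value = yield total  # reply, then block until the next request

server = running_sum_server()
next(server)  # start the exchange
replies = [server.send(v) for v in [1, 2, 3]]
print(replies)  # [1, 3, 6]
```

Because the two sides can never be in flight at the same time, there's no buffering or backpressure negotiation to think about, which is the trade vgi-rpc makes by skipping full duplex.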

**Where Flight still wins:**

- Multi-language ecosystem (if you need Java/Go/C++ clients), but I have 
vgi-rpc implementations in C++, TypeScript, Swift, and Go in progress. Stay 
tuned.
- Concurrent bidirectional streaming (gRPC-style full duplex)

**Other things worth mentioning:**

- Transparent large batch externalization to S3/GCS. Batches above a threshold 
get offloaded to cloud storage automatically, so you can host behind servers 
with request size limits (looking at Cloudflare Workers, AWS Lambda, and Google 
Cloud Run).
- Pluggable auth (JWT, API key, whatever) on the HTTP transport
- Built-in introspection endpoint for service discovery
- OpenTelemetry instrumentation
- Strict mypy typing throughout
- Python 3.13+, Apache 2.0 licensed
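The externalization idea above is just a size check at encode time. Here's a hedged sketch of the shape of it (the threshold, the `mem://` URI scheme, and the helper names are all invented; the in-memory dict stands in for S3/GCS):

```python
# Hypothetical sketch of threshold-based batch externalization: small
# payloads travel inline, large ones are uploaded and replaced by a URI.
THRESHOLD = 1 * 1024 * 1024  # e.g. stay under a gateway's request size limit

FAKE_BUCKET: dict[str, bytes] = {}  # stand-in for S3/GCS

def encode_for_transport(batch: bytes) -> dict:
    if len(batch) <= THRESHOLD:
        return {"kind": "inline", "data": batch}
    key = f"batch-{len(FAKE_BUCKET)}"
    FAKE_BUCKET[key] = batch  # "upload" to object storage
    return {"kind": "external", "uri": f"mem://{key}"}

def decode_from_transport(msg: dict) -> bytes:
    if msg["kind"] == "inline":
        return msg["data"]
    return FAKE_BUCKET[msg["uri"].removeprefix("mem://")]

small = b"x" * 10
big = b"y" * (2 * 1024 * 1024)
assert decode_from_transport(encode_for_transport(small)) == small
assert decode_from_transport(encode_for_transport(big)) == big
```

The caller never sees the difference, which is what lets the same service sit behind a size-limited front end like Lambda or Cloudflare Workers.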

Docs: https://vgi-rpc-python.query.farm/
PyPI: https://pypi.org/project/vgi-rpc/
GitHub: https://github.com/Query-farm/vgi-rpc-python

Happy to answer any questions or hear feedback. There is more to the VGI story; 
this is just the start, the bottom layer of what's coming.

Cheers,

Rusty
https://query.farm
