Hi Spencer,

Thank you for sharing! From a quick look, Quivr looks like an interesting project and it is great to see Arrow and Python bindings being used/extended in such a way.
You are definitely encouraged to work on PyArrow and on the features around it. Any kind of contribution is very welcome. Do not hesitate to ping me on Python related issues in case you need a suggestion or a review.

Good luck,
Alenka

On Tue, Oct 3, 2023 at 11:44 PM Spencer Nelson <swnel...@uw.edu> wrote:

> Hi all - I'd like to share a library I've been working on for a few months
> which is built on top of Arrow. It's called quivr
> <https://github.com/spenczar/quivr> (like a bundle of arrows) and it could
> be thought of as tools to wrap up PyArrow Tables and extend their
> capabilities.
>
> I work on scientific software. A lot of the initial scientific work is done
> in Jupyter notebooks with dataframes. When it's time to build larger
> production systems on top of that work, the flexibility of dataframes
> becomes a liability. It's hard to write structured code because dataframes
> can be so variably typed and permissive.
>
> But if you try to use normal tools for this (Python objects, lists,
> dictionaries), you get crushed with performance issues. I wanted an
> array-oriented framework, but with a more structured model than any
> dataframe libraries out there.
>
> So, quivr fills that need. You write a *Table* definition, which
> corresponds closely to a pyarrow Table schema. You do that by writing a
> Python class, with class attributes signaling the types and names of your
> columns. And then you can attach methods to describe computation.
>
> By using Arrow's struct types, Tables can be composed. You might have a
> Table which defines a "Location" - and has sophisticated logic for that
> purpose - and reuse that Location within other, higher-order tables. The
> compositional approach has really been working extremely well so far in our
> work.
>
> I've written a little blog post
> <https://journal.spencerwnelson.com/entries/quivr.html> describing the
> motivations and showing it in use, and docs are up too
> <https://quivr.readthedocs.io/en/stable/>. quivr is still in a pretty
> molten state, so I'm very interested in any feedback or broader interest in
> this from anyone who might find it useful. I'd love to work closer with the
> Arrow team as well - I have a growing wishlist of features around PyArrow
> which I'd be interested in working on.
>
> Thanks,
> Spencer
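
For anyone skimming the thread, the pattern described in the quoted message (a Python class whose class attributes declare typed columns, with methods attached and tables nested inside other tables) can be roughly sketched in plain Python. This is only a toy built on the standard library's array module. It is not quivr's API and it does not use PyArrow; the names Column, SubTable, Table, Location and Observation are all made up for illustration.

```python
from array import array


class Column:
    """Declares one typed column; the attribute name becomes the column name."""

    def __init__(self, typecode):
        self.typecode = typecode  # array typecode, e.g. "d" for 64-bit floats

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        # On the class, return the declaration; on an instance, return the data.
        return self if instance is None else instance._data[self.name]

    def build(self, values):
        # array() raises TypeError if a value does not fit the declared type.
        return array(self.typecode, values)


class SubTable(Column):
    """Declares a column whose values are another (hypothetical) Table type."""

    def __init__(self, table_type):
        self.table_type = table_type

    def build(self, values):
        if not isinstance(values, self.table_type):
            raise TypeError(f"expected a {self.table_type.__name__}")
        return values


class Table:
    """Toy column-oriented table; assumes at least one declared column."""

    def __init__(self, **columns):
        declared = {n: c for n, c in vars(type(self)).items() if isinstance(c, Column)}
        if set(columns) != set(declared):
            raise TypeError(f"expected columns {sorted(declared)}")
        self._data = {n: declared[n].build(columns[n]) for n in declared}
        if len({len(v) for v in self._data.values()}) > 1:
            raise ValueError("all columns must have the same length")

    def __len__(self):
        return len(next(iter(self._data.values())))


class Location(Table):
    lat = Column("d")
    lon = Column("d")

    def is_northern(self):
        # Logic lives next to the schema it operates on.
        return [v > 0 for v in self.lat]


class Observation(Table):
    time = Column("d")
    where = SubTable(Location)  # reuse Location inside a higher-order table


obs = Observation(
    time=[0.0, 1.5],
    where=Location(lat=[47.6, -33.9], lon=[-122.3, 151.2]),
)
print(len(obs), obs.where.is_northern())
```

The point of the sketch is only the shape of the idea: the schema is fixed by the class, badly typed input fails at construction time, and a nested table brings its methods along. The quoted message says quivr does this on top of Arrow struct types rather than Python-level arrays.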