I'm working on a project where large volumes of financial data need to be loaded, stored, and manipulated. The data is stored as Parquet. In my initial version, Arrow simply loaded the Parquet data and I kept it in a basic std::unordered_map, but this limited me to a single data type. I found I could make my database more generic with Arrow while also getting its performance benefits. Unfortunately, my team is mostly Python developers, so I decided to write a cleaner interface over Arrow, with APIs closer to pandas. This also let us write fewer lines of code while keeping Arrow's performance benefits.

I will write a blog post later. For now I was mostly looking for other developers who would like to collaborate, or who may need this as well. It doesn't necessarily have to be added to the main library, but I'm not opposed to that. I also implemented some custom kernels, such as covariance, correlation, cumprod, shift, and pct_change.
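For readers unfamiliar with the pandas behaviour these kernels target, below is a minimal sketch of the semantics over plain std::vector<double>. It is illustrative only: it is not the library's actual implementation and does not use Arrow's compute API. The `Column` alias and all function names are hypothetical. It assumes pandas defaults: NaN in the slots vacated by shift, NaN skipped (not reset) by cumprod, and ddof = 1 with pairwise-complete observations for cov/corr. It ignores pct_change's fill_method handling.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical column type: plain doubles, NaN marks a missing value.
// (An Arrow-based version would use validity bitmaps instead.)
using Column = std::vector<double>;
constexpr double kNaN = std::numeric_limits<double>::quiet_NaN();

// shift: move values down by `periods` rows (up if negative); vacated slots become NaN.
Column shift(const Column& v, long periods = 1) {
    const long n = static_cast<long>(v.size());
    Column out(v.size(), kNaN);
    for (long i = 0; i < n; ++i) {
        const long src = i - periods;
        if (src >= 0 && src < n) out[static_cast<std::size_t>(i)] = v[static_cast<std::size_t>(src)];
    }
    return out;
}

// pct_change: v[i] / v[i - periods] - 1; NaN propagates from the shifted slots.
Column pct_change(const Column& v, long periods = 1) {
    const Column prev = shift(v, periods);
    Column out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = v[i] / prev[i] - 1.0;
    return out;
}

// cumprod with skipna semantics: NaN stays NaN and does not reset the running product.
Column cumprod(const Column& v) {
    Column out(v.size());
    double acc = 1.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (std::isnan(v[i])) { out[i] = kNaN; continue; }
        acc *= v[i];
        out[i] = acc;
    }
    return out;
}

// Keep only rows where both inputs are present (pairwise-complete observations).
static void complete_pairs(const Column& x, const Column& y, Column& xs, Column& ys) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (!std::isnan(x[i]) && !std::isnan(y[i])) { xs.push_back(x[i]); ys.push_back(y[i]); }
    }
}

static double mean(const Column& v) {
    double s = 0.0;
    for (double d : v) s += d;
    return s / static_cast<double>(v.size());
}

// Sample covariance (ddof = 1); NaN if fewer than two complete pairs.
double cov(const Column& x, const Column& y) {
    Column xs, ys;
    complete_pairs(x, y, xs, ys);
    if (xs.size() < 2) return kNaN;
    const double mx = mean(xs), my = mean(ys);
    double s = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) s += (xs[i] - mx) * (ys[i] - my);
    return s / static_cast<double>(xs.size() - 1);
}

// Pearson correlation: the ddof factors of cov and std cancel, leaving raw sums.
double corr(const Column& x, const Column& y) {
    Column xs, ys;
    complete_pairs(x, y, xs, ys);
    if (xs.size() < 2) return kNaN;
    const double mx = mean(xs), my = mean(ys);
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double dx = xs[i] - mx, dy = ys[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);
}
```

For example, pct_change on prices {1, 2, 4} yields {NaN, 1, 1}. A version built on Arrow would presumably take arrow::Array inputs and honour nulls via validity bitmaps rather than NaN; plain vectors just keep this sketch self-contained.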
On Sun, Jan 22, 2023 at 1:56 AM Benson Muite <benson_mu...@emailplus.org> wrote:
> On 1/22/23 11:41, Adesola Adedewe wrote:
> > The project was initially meant to provide a simpler interface over arrow
> > apache so pretty much what was done with the python api, but it has
> > evolved to be more than that, with indexing and other panda operations
> > implemented like reindex, resample, concat etc. I currently have it good
> > enough for my project but I think it has potential to also open the door
> > for more developers to use arrow for their projects. please take a look.
> >
>
> Thanks. What problem did this solve for you? How did you utilize it
> for your project? Maybe you could contribute a blog post to Arrow
> describing the end use case and the motivation for a C++ dataframe
> interface?