Re: New Pandas-Apache repo

Adesola Adedewe Sun, 22 Jan 2023 02:16:25 -0800

I'm working on a project where large volumes of financial data need to be
loaded, stored, and manipulated. The data is stored as Parquet. My initial
version had Arrow just load the Parquet data, and I used a basic
std::unordered_map, but this limited me to only one data type. I found I
could make my database more generic with Arrow while keeping its
performance benefits. Unfortunately, my team is mostly Python developers,
so I decided to write a cleaner interface over Arrow, with an API closer to
pandas. This also let us write fewer lines of code while still getting
those benefits. I will write a blog post later; for now I was mostly
looking for other developers who would like to collaborate or who may need
this as well. It doesn't necessarily need to be added to the main library,
but I'm not opposed to that. I also implemented some custom kernels such as
covariance, correlation, cumprod, shift, and pct_change.
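For readers unfamiliar with what these kernels compute, here is a minimal sketch of pandas-style semantics over plain std::vector<double>. It is not the repository's code and does not use the Arrow API; the namespace `sketch` and all function signatures are hypothetical. It assumes NaN marks a missing value (the pandas convention), that pct_change does no forward-filling, and that covariance/correlation inputs contain no missing values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical standalone versions of the kernels named above, written over
// std::vector<double> instead of Arrow arrays. NaN stands for "missing".
namespace sketch {

const double kNaN = std::numeric_limits<double>::quiet_NaN();

// shift(n): move values forward by n positions (n > 0) or backward (n < 0),
// filling vacated slots with NaN, like pandas Series.shift(periods=n).
std::vector<double> shift(const std::vector<double>& v, long n) {
    const long len = static_cast<long>(v.size());
    std::vector<double> out(v.size(), kNaN);
    for (long i = 0; i < len; ++i) {
        const long src = i - n;
        if (src >= 0 && src < len) {
            out[static_cast<std::size_t>(i)] = v[static_cast<std::size_t>(src)];
        }
    }
    return out;
}

// cumprod: running product; NaN inputs stay NaN in the output and are
// skipped by the running product (pandas skipna=True behaviour).
std::vector<double> cumprod(const std::vector<double>& v) {
    std::vector<double> out(v.size(), kNaN);
    double acc = 1.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (std::isnan(v[i])) continue;
        acc *= v[i];
        out[i] = acc;
    }
    return out;
}

// pct_change(periods): v[i] / v[i - periods] - 1. Assumption: no
// forward-fill of missing values, so NaN simply propagates.
std::vector<double> pct_change(const std::vector<double>& v, long periods = 1) {
    const std::vector<double> prev = shift(v, periods);
    std::vector<double> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = v[i] / prev[i] - 1.0;
    return out;
}

// Sample covariance (ddof = 1, the pandas default); assumes no NaN values.
double covariance(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += (x[i] - mx) * (y[i] - my);
    return s / (n - 1.0);
}

// Pearson correlation: cov(x, y) / sqrt(var(x) * var(y)); the ddof cancels.
double correlation(const std::vector<double>& x, const std::vector<double>& y) {
    return covariance(x, y) / std::sqrt(covariance(x, x) * covariance(y, y));
}

}  // namespace sketch
```

In a columnar library these loops would run over Arrow arrays and their validity bitmaps rather than NaN sentinels; the vector version only illustrates the arithmetic.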


On Sun, Jan 22, 2023 at 1:56 AM Benson Muite <benson_mu...@emailplus.org>
wrote:

> On 1/22/23 11:41, Adesola Adedewe wrote:
> > The project was initially meant to provide a simpler interface over Apache
> > Arrow, so pretty much what was done with the Python API, but it has
> > evolved to be more than that, with indexing and other pandas operations
> > implemented, like reindex, resample, concat, etc. It is currently good
> > enough for my project, but I think it also has the potential to open the
> > door for more developers to use Arrow in their projects. Please take a look.
> >
>
> Thanks.  What problem did this solve for you?  How did you utilize it
> for your project?  Maybe you could contribute a blog post to Arrow
> describing the end use case and the motivation for a C++ dataframe
> interface?
>
>
