Re: optimal way to store historical data

anthony . abate Fri, 26 Jun 2020 08:59:39 -0700

+1 to this..

There is a logical way to do this now: if you create one record batch per
day, you can maintain a separate Arrow file (an index) that maps each date
to its batch. We do this for indexing via other keys, and I can say it
works well for 'large' files (25 GB+). Unfortunately, I think doing this
via the current language-level APIs requires rewriting a full copy of the
file for each new batch, because files are immutable once written.

However, I know for a fact the file-based format could support this 'very
easily': the existing footer would be overwritten by the next batch, and a
new footer, with a block entry carrying the correct offsets for the new
batch, would be written after it. Using the C++ libraries (or your own) one
could do this manually, but it might be useful to build it into the
higher-level APIs. If there is a way to do this now that doesn't require
hacking the files manually, please let me know!

On Fri, Jun 26, 2020 at 11:49 AM Dachuan Zhao <tczhaodach...@gmail.com>
wrote:

> +1
> Is the dataset the model for that?
>
> On Fri, Jun 26, 2020 at 11:42 AM Kirill Lykov <lykov.kir...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I wonder what the best way is to represent time series in Arrow.
> > Maybe someone has already researched different ways of representing
> > this kind of data? Or is there a ready-to-use solution inside the
> > library? Basically, I need a third dimension on the table, which is
> > time. One solution is to have a new table for each date, but there
> > are many other ways as well.
> >
> > --
> > Best regards,
> > Kirill Lykov
> >
> --
> Sent from iPhone
>
