Also, to avoid any confusion: there should be no problem creating static,
read-only Arrow data files with a 'date to batch' index in the manner I
described. The problem I am referring to only becomes an issue if you need
to append a new batch on a daily basis.

-Anthony


On Fri, Jun 26, 2020 at 11:58 AM <anthony.ab...@gmail.com> wrote:

> +1 to this.
>
> There is a logical way to do this now: if you create one batch per day, you
> can maintain a separate Arrow file (an index) that maps each date to its
> batch. We do this for indexing via other keys, and I can say it works well
> for 'large' files (25 GB+). Unfortunately, I think doing this via the
> current language-level APIs requires a lot of recreating and copying of
> files for each new batch, due to the immutability.
>
> However, I know for a fact that the file-based format could support this
> 'very easily': it would just require overwriting the footer with the next
> batch, then writing a new footer after that batch that contains a new block
> entry with the correct offsets. Using the C++ libraries (or your own), one
> could do this manually, but it might be useful to build it into the
> higher-level APIs. If there is a way to do this now that doesn't require
> hacking the files manually, please let me know!
>
> On Fri, Jun 26, 2020 at 11:49 AM Dachuan Zhao <tczhaodach...@gmail.com>
> wrote:
>
>> +1
>> Is the dataset the model for that?
>>
>> On Fri, Jun 26, 2020 at 11:42 AM Kirill Lykov <lykov.kir...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > I wonder what the best way to represent time series in Arrow is.
>> > Maybe someone has already researched different ways of representing
>> > this kind of data? Or is there a ready-to-use solution in the library?
>> > Basically, I need a third dimension for the table, which is time. One
>> > solution is to have a new table for each date, but there are many
>> > other ways as well.
>> >
>> > --
>> > Best regards,
>> > Kirill Lykov
>> >
>>
>
