+1 to this. There is a logical way to do this now: if you create a batch per day, you can maintain a separate Arrow file (an index) that maps each date to its batch. We do this for indexing via other keys, and I can say it works well for 'large' files (25 GB+). Unfortunately, I think doing this via the current language-level API means recreating and copying the file for each new batch, because the files are immutable.

However, I know for a fact that the file-based format could support this 'very easily': it would just require overwriting the existing footer with the next batch, then writing a new footer after that batch which includes a block for it with the correct offsets. Using the C++ libraries (or your own) one could do this manually, but it might be useful to build it into the higher-level APIs. If there is a way to do this now that doesn't require hacking the files manually, please let me know!

On Fri, Jun 26, 2020 at 11:49 AM Dachuan Zhao <tczhaodach...@gmail.com> wrote:

> +1
> Is the dataset the model for that?
>
> On Fri, Jun 26, 2020 at 11:42 AM Kirill Lykov <lykov.kir...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I wonder what is the best way to represent time series in the arrow.
> > Maybe someone did a research already about different ways of
> > representing these data? Or there is a ready-to-use solution inside
> > the library. Basically, I need a third dimension to the table which is
> > time. One of the solutions is to have a new table for each date, but
> > there are many other ways as well.
> >
> > --
> > Best regards,
> > Kirill Lykov
> >
> --
> Sent from iPhone
>