Also, to clarify so there is no confusion: there should be no problem creating static/read-only Arrow data files with a date-to-batch index in the manner I described. The problem I am referring to only arises if you need to append a new batch on a daily basis.
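A minimal sketch of the date-to-batch lookup for the static case, assuming one record batch per day. For illustration the index is held in memory as two sorted Python lists; in the setup described here it would be its own Arrow file with a date column and a batch-number column. The class name `DateBatchIndex` and its methods are hypothetical and not part of any Arrow API.

```python
import bisect
from datetime import date


class DateBatchIndex:
    """Hypothetical in-memory date -> record-batch-number index.

    Assumes one record batch per day. In practice the (date, batch) pairs
    would be persisted as a separate Arrow file next to the data file.
    """

    def __init__(self, entries):
        # entries: iterable of (date, batch_number) pairs, in any order
        pairs = sorted(entries)
        self.dates = [d for d, _ in pairs]
        self.batches = [b for _, b in pairs]

    def batch_for(self, day):
        """Return the batch number for an exact date, or None if absent."""
        i = bisect.bisect_left(self.dates, day)
        if i < len(self.dates) and self.dates[i] == day:
            return self.batches[i]
        return None

    def batches_between(self, start, end):
        """Return batch numbers for all dates in the inclusive range [start, end]."""
        lo = bisect.bisect_left(self.dates, start)
        hi = bisect.bisect_right(self.dates, end)
        return self.batches[lo:hi]


if __name__ == "__main__":
    index = DateBatchIndex([
        (date(2020, 6, 24), 0),
        (date(2020, 6, 25), 1),
        (date(2020, 6, 26), 2),
    ])
    print(index.batches_between(date(2020, 6, 25), date(2020, 6, 26)))  # [1, 2]
```

With batch numbers from such an index, a reader that supports random access by batch number (for example pyarrow's `RecordBatchFileReader.get_batch(i)`) can load only the days it needs instead of scanning the whole file.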
-Anthony

On Fri, Jun 26, 2020 at 11:58 AM <anthony.ab...@gmail.com> wrote:

> +1 to this.
>
> There is a logical way to do this now: if you create one batch per day,
> you can maintain a separate Arrow file (an index) that maps each date to
> its batch. We do this for indexing via other keys, and I can say it works
> well for "large" files (25 GB+). Unfortunately, I think doing this via
> the current language-level API requires a lot of recreating and copying
> files for each new batch, because the files are immutable.
>
> However, I know for a fact that the file-based format could support this
> "very easily": it would just require overwriting the existing footer with
> the next batch, then writing a new footer after that batch containing an
> additional block with the correct offsets. Using the C++ libraries (or
> your own) one could do this manually, but it might be useful to build it
> into the higher-level APIs. If there is a way to do this now that doesn't
> require hacking the files manually, please let me know!
>
> On Fri, Jun 26, 2020 at 11:49 AM Dachuan Zhao <tczhaodach...@gmail.com>
> wrote:
>
>> +1
>> Is the dataset the model for that?
>>
>> On Fri, Jun 26, 2020 at 11:42 AM Kirill Lykov <lykov.kir...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > I wonder what is the best way to represent time series in Arrow.
>> > Maybe someone has already researched different ways of representing
>> > this data? Or is there a ready-to-use solution inside the library?
>> > Basically, I need a third dimension for the table, which is time.
>> > One solution is to have a new table for each date, but there are
>> > many other ways as well.
>> >
>> > --
>> > Best regards,
>> > Kirill Lykov
>>
>> --
>> Sent from iPhone
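The footer-rewrite append described in the quoted message can be illustrated with a toy, stdlib-only file format. This is a sketch under stated assumptions and is NOT the Arrow IPC file format: real Arrow files are framed by `ARROW1` magic bytes and use a FlatBuffers footer that also carries the schema. Here the footer is just a JSON list of `[offset, length]` blocks followed by an 8-byte length, and `append_batch`/`read_batch` are hypothetical helpers. The point is the mechanism: truncate at the old footer, write the new batch in its place, then write a new footer with one more block.

```python
import json
import os
import struct
import tempfile

# Toy layout (NOT the Arrow IPC format):
#   [batch 0][batch 1]...[footer JSON][8-byte little-endian footer length]
# The footer is a JSON list of [offset, length] blocks, one per batch.
TRAILER = struct.Struct("<Q")


def _read_footer(f):
    """Return (blocks, footer_start) for an open toy file."""
    f.seek(0, os.SEEK_END)
    size = f.tell()
    if size == 0:
        return [], 0  # empty file: no batches yet, footer would start at 0
    f.seek(size - TRAILER.size)
    (footer_len,) = TRAILER.unpack(f.read(TRAILER.size))
    footer_start = size - TRAILER.size - footer_len
    f.seek(footer_start)
    blocks = json.loads(f.read(footer_len))
    return blocks, footer_start


def append_batch(path, payload):
    """Append one batch in place by overwriting the old footer; return its index."""
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        blocks, footer_start = _read_footer(f)
        # Drop the old footer; the new batch starts where it used to be.
        f.seek(footer_start)
        f.truncate()
        f.write(payload)
        # Write a new footer that records one more block.
        blocks.append([footer_start, len(payload)])
        footer = json.dumps(blocks).encode()
        f.write(footer)
        f.write(TRAILER.pack(len(footer)))
    return len(blocks) - 1


def read_batch(path, i):
    """Read batch i using only the footer's offsets (random access)."""
    with open(path, "rb") as f:
        blocks, _ = _read_footer(f)
        offset, length = blocks[i]
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "daily.toy")
        for day in ("2020-06-24", "2020-06-25", "2020-06-26"):
            append_batch(path, f"batch for {day}".encode())
        print(read_batch(path, 1))  # b'batch for 2020-06-25'
```

One design caveat: between the truncate and the final write the file has no valid footer, so a crash mid-append leaves it unreadable. A more robust version might keep a copy of the old footer, or write to a new file and rename it into place.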