Re: optimal way to store historical data

2020-06-26 Thread anthony . abate
Also, let me clarify so there is no confusion: there should be no problem creating static / read-only Arrow data files with a 'date to batch' index in the manner I described. The problem I am referring to only becomes an issue if you need to append a new batch on a daily basis. -Anthony On Fri

Re: optimal way to store historical data

2020-06-26 Thread anthony . abate
+1 to this. There is a logical way to do this now: if you create a batch per day, you can maintain a separate Arrow file (an index) to map the date to the batch. We do this for indexing via other keys, and I can say it works well for 'large' files (25 GB+). I think unfortunately, doing this via the c

Re: optimal way to store historical data

2020-06-26 Thread Dachuan Zhao
+1 Is the dataset the model for that? On Fri, Jun 26, 2020 at 11:42 AM Kirill Lykov wrote: > Hi, > > I wonder what is the best way to represent time series in Arrow. > Maybe someone has already researched different ways of > representing this data? Or there is a ready-to-use solution

optimal way to store historical data

2020-06-26 Thread Kirill Lykov
Hi, I wonder what is the best way to represent time series in Arrow. Maybe someone has already researched different ways of representing this data? Or there is a ready-to-use solution inside the library. Basically, I need a third dimension to the table, which is time. One of the solutio