Re: Using pyarrow.Table for long-term storage of pandas DataFrames

2019-06-16 Thread Bogdan Klichuk
Nice, thank you for the approximate timeline! On Mon, Jun 17, 2019 at 1:15 AM Micah Kornfield wrote: > Hi Bogdan, > >> Alright, so speaking of serialization of pyarrow.Table vs Feather, if >> they are pretty much the same, but arrow alone shouldn't >> be used to long-storage, is this also the cas

Re: Using pyarrow.Table for long-term storage of pandas DataFrames

2019-06-16 Thread Micah Kornfield
Hi Bogdan, > Alright, so speaking of serialization of pyarrow.Table vs Feather, if they > are pretty much the same, but arrow alone shouldn't > be used to long-storage, is this also the case for Feather or can it be a > valid option for my case? Per Wes's e-mail on similar thread[1], once we rea

Re: Using pyarrow.Table for long-term storage of pandas DataFrames

2019-06-16 Thread Bogdan Klichuk
Hello. Thanks for the reply! On Sun, Jun 16, 2019 at 8:40 AM Wes McKinney wrote: > hi Micah, > > On Sun, Jun 16, 2019 at 12:16 AM Micah Kornfield > wrote: > > > > Hi Bogdan, > > I'm not an expert here but answers based on my understanding are below: > > > > 1) Is there something I'm missing i

Re: Using pyarrow.Table for long-term storage of pandas DataFrames

2019-06-16 Thread Wes McKinney
hi Micah, On Sun, Jun 16, 2019 at 12:16 AM Micah Kornfield wrote: > > Hi Bogdan, > I'm not an expert here but answers based on my understanding are below: > > 1) Is there something I'm missing in understanding difference between > > serializing dataframe directly using PyArrow and serializing > >

Re: Using pyarrow.Table for long-term storage of pandas DataFrames

2019-06-15 Thread Micah Kornfield
Hi Bogdan, I'm not an expert here but answers based on my understanding are below: 1) Is there something I'm missing in understanding difference between > serializing dataframe directly using PyArrow and serializing > `pyarrow.Table`, Table shines in case dataframes mostly consists of > strings, w

Using pyarrow.Table for long-term storage of pandas DataFrames

2019-06-12 Thread Bogdan Klichuk
Trying to come up with a solution for quick Pandas dataframe serialization and long-term storage. Dataframe content is tabular, but it is provided by the user and can be arbitrary, so it might contain both completely text columns and completely numeric/boolean columns. ## Main goals are: * Serialize dataframe as quickly as