OK, awesome! Thanks for the reply.
On Mon, Jul 10, 2017 at 1:42 PM, Uwe L. Korn <uw...@xhochy.com> wrote:

> Hello Alexey,
>
> you discovered a known bug in 0.4.1. If a column is only made up of None
> objects, then writing to Parquet fails. This is fixed upstream and will
> be included in the upcoming 0.5.0 release.
>
> Uwe
>
> On Sat, Jul 8, 2017, at 04:32 AM, Alexey Strokach wrote:
> > I am running into a problem converting a csv file into a parquet file in
> > chunks, where one of the string columns is null for the first several
> > million rows.
> >
> > Self-contained dummy example:
> >
> > csv_file = '/tmp/df.csv'
> > parquet_file = '/tmp/df.parquet'
> >
> > df = pd.DataFrame([np.nan] * 3 + ['hello'], columns=['a'])
> > df.to_csv(csv_file, index=False, na_rep='.')
> > display(df)
> >
> > for i, chunk in enumerate(pd.read_csv(csv_file, chunksize=2,
> >                                       na_values=['.'], dtype={'a': str})):
> >     print(i)
> >     display(chunk)
> >     if i == 0:
> >         parquet_schema = pa.Table.from_pandas(chunk).schema
> >         parquet_writer = pq.ParquetWriter(parquet_file, parquet_schema,
> >                                           compression='snappy')
> >     table = pa.Table.from_pandas(chunk, schema=parquet_schema)
> >     parquet_writer.write_table(table)
> >
> > Any suggestions would be much appreciated.
> >
> > Running pyarrow=0.4.1=np112py36_1 installed using conda on Linux Mint 18.1
> >
> > And thanks a lot for developing pyarrow.parquet!
> > Alexey
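
[Editor's note: for readers who hit this later, below is a minimal sketch of the chunked CSV-to-Parquet pattern with the schema declared up front, rather than inferred from the first chunk. Inferring from an all-null first chunk is what produces the problematic null-typed column in the repro above. This assumes pyarrow >= 0.5.0, where (per Uwe's note) writing all-null columns is fixed; the column name 'a' and the file paths are just the placeholders from the example above.]

    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq

    csv_file = '/tmp/df.csv'
    parquet_file = '/tmp/df.parquet'

    # Declare the Arrow schema explicitly so the column type does not
    # depend on what happens to be in the first chunk.
    parquet_schema = pa.schema([('a', pa.string())])

    writer = pq.ParquetWriter(parquet_file, parquet_schema,
                              compression='snappy')
    try:
        for chunk in pd.read_csv(csv_file, chunksize=2, na_values=['.'],
                                 dtype={'a': str}):
            # Cast each chunk to the fixed schema; all-NaN chunks become
            # string columns that are entirely null.
            table = pa.Table.from_pandas(chunk, schema=parquet_schema,
                                         preserve_index=False)
            writer.write_table(table)
    finally:
        writer.close()  # finalize the Parquet file footer

The original repro also never closes the ParquetWriter, which leaves the file without a footer; the try/finally above is one way to make sure it is always closed.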