Hi Yash,

Currently, there is the `parquet.write_to_dataset` function for something like that. But that requires specifying a column by which to partition the single pyarrow Table. To just split one table into regular chunks written to multiple files in a single directory, I don't think we have an automatic function for that; you could slice the table in a loop and write each subset with `write_table`.
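A minimal sketch of that manual approach (the chunk size, directory name, and file-name pattern below are just illustrative):

```python
import os

import pyarrow as pa
import pyarrow.parquet as pq

# Example table; in practice this would be your existing Table.
table = pa.table({"x": list(range(100)), "y": list(range(100))})

out_dir = "my_dataset"  # illustrative directory name
os.makedirs(out_dir, exist_ok=True)

chunk_size = 25  # rows per output file (illustrative)
for i, start in enumerate(range(0, table.num_rows, chunk_size)):
    # Table.slice(offset, length) is zero-copy; the final chunk
    # may be shorter than chunk_size.
    chunk = table.slice(start, chunk_size)
    pq.write_table(chunk, os.path.join(out_dir, f"part-{i:05d}.parquet"))
```

Note that `write_to_dataset` with `partition_cols` instead creates one subdirectory per distinct value of the partition column, which is a different layout than the flat list of part files you show below.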
You can also control the row group size (the partitioning within a single Parquet file) using the `row_group_size` argument of `write_table` (a small sketch is included below the quoted message).

Best,
Joris

On Wed, 8 Jul 2020 at 20:44, Yash Ganthe <yas...@gmail.com> wrote:
>
> Hi,
>
> parquet_writer.write_table(table)
>
> This line writes a single file.
> The documentation says:
> This creates a single Parquet file. In practice, a Parquet dataset may
> consist of many files in many directories. We can read a single file back
> with read_table:
>
> Is there a way for PyArrow to create a parquet file in the form of a
> directory with multiple part files in it, such as:
>
> ls -lrt permit-inspections-recent.parquet
> ... 14:53 part-00001-bd5d902d-fac9-4e03-b63e-6a8dfc4060b6.snappy.parquet
> ... 14:53 part-00000-bd5d902d-fac9-4e03-b63e-6a8dfc4060b6.snappy.parquet
>
> Regards,
> Yash
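A small sketch of the `row_group_size` option mentioned above, reusing the `table` from the earlier sketch (the 10,000-row value is arbitrary):

```python
import pyarrow.parquet as pq

# Write a single Parquet file split into multiple row groups,
# each holding at most 10_000 rows (value is illustrative).
pq.write_table(table, "single_file.parquet", row_group_size=10_000)

# The resulting row groups can be inspected via the file metadata:
print(pq.ParquetFile("single_file.parquet").metadata.num_row_groups)
```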