Let’s say that I have a Spark DataFrame with 3 columns:
id, name, and age.

When I save it to HDFS/S3 using “partitionBy(id, name)”,
the output is written as:

<root-dir>/id=1/name=Alex/<filename-1>.parquet
<root-dir>/id=2/name=Bob/<filename-2>.parquet

If I do not want the “id=” and “name=” prefixes in the
directory structure, what should I do?

In other words, I want my final output to be:

<root-dir>/1/Alex/<filename-1>.parquet
<root-dir>/2/Bob/<filename-2>.parquet
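For context: as far as I can tell, Spark’s partitionBy always emits Hive-style “key=value” directories, and I have not found a built-in option to drop the “key=” part. One possible workaround (just a sketch, assuming a modest number of distinct partition values) would be to collect the distinct (id, name) pairs, filter the DataFrame per pair, and write each subset to a manually built path. The path-building helper below is plain Python; the Spark write loop is hypothetical and shown in comments, since it needs a live SparkSession:

```python
import os

def partition_path(root, *values):
    """Build <root>/<v1>/<v2>/... without Hive-style 'key=' prefixes."""
    return os.path.join(root, *[str(v) for v in values])

# Hypothetical Spark loop (sketch only; requires a live SparkSession
# and a DataFrame `df` with columns id, name, age):
#
# pairs = df.select("id", "name").distinct().collect()
# for row in pairs:
#     out = partition_path("/root-dir", row["id"], row["name"])
#     (df.filter((df["id"] == row["id"]) & (df["name"] == row["name"]))
#        .drop("id", "name")             # mimic partitionBy dropping the keys
#        .write.mode("overwrite").parquet(out))

print(partition_path("/root-dir", 1, "Alex"))  # /root-dir/1/Alex
print(partition_path("/root-dir", 2, "Bob"))   # /root-dir/2/Bob
```

Note the trade-off: directories written this way lose the “id=”/“name=” metadata, so Spark will no longer auto-discover those columns when reading the tree back.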

Thanks,
M. Parsian
