Hello,
I have my data stored in Parquet file format. My data is already partitioned by
date and key. Now I want the data in each file to be sorted by a new Code column.
date1 -> key1 -> paqfile1
              -> paqfile2
      -> key2 -> paqfile1
              -> paqfile2
date2 -> key1 -> paqfile1
              -> paqfile2
      -> key2 -> paqfile1
              -> paqfile2
df.sort("code").write.mode(SaveMode.Append).format("parquet").save("/apps/spark/logs")
I am doing something like this, assuming that my current partitioning will still
be respected and that the data in each Parquet file will be sorted by code. Can you
please let me know if that will be the case?
Can I still expect the same partitioning, or do I have to partition again?
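If the partitioning does have to be specified again, I assume the write would look roughly like the sketch below. The column names "date" and "key", the object and method names, and the temporary output path are placeholders for illustration only, and partitionBy produces directories named date=.../key=..., which may not match an existing hand-built layout.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object SortedPartitionedWrite {
  // Assumption: the partition columns are called "date" and "key".
  def writeSorted(df: DataFrame, path: String): Unit =
    df.repartition(df("date"), df("key"))          // bring each (date, key) group into one task
      .sortWithinPartitions("date", "key", "code") // order rows by code within each group
      .write
      .partitionBy("date", "key")                  // write date=.../key=... directories
      .mode(SaveMode.Append)
      .format("parquet")
      .save(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sorted-partitioned-write")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Tiny made-up sample, only to exercise the write path.
    val df = Seq(
      ("date1", "key1", 3),
      ("date1", "key1", 1),
      ("date1", "key2", 2),
      ("date2", "key1", 5)
    ).toDF("date", "key", "code")

    // Hypothetical output location; a temp directory keeps the example harmless.
    val out = Files.createTempDirectory("sorted-parquet").toString
    writeSorted(df, out)

    spark.stop()
  }
}
```

The idea is that repartitioning on the partition columns keeps each date/key combination in one task, and sorting within partitions avoids the global shuffle that df.sort("code") would trigger.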
Regards,
Shiv