Hi, I'm trying to control the size and/or number of output files Spark writes. Here is my code. I expect to get 5 files, but I get dozens of small files instead. Why?
dataset
.repartition(5)
.sort("long_repeated_string_in_this_column") // should compress better with snappy
.write
.parquet(outputPath)
