Hi Ivan, sorry, but it always helps to know the version of Spark you are using, the environment it runs in, the format you are writing your files out to, and any other details you can share.
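
In case it helps to collect those details, here is a rough sketch you could paste into spark-shell. `describeJob` is a hypothetical helper name of mine, not part of Spark, and I am assuming `spark` is your active SparkSession and `dataset` is the Dataset from your snippet.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical helper (not a Spark API): prints the details asked for above.
def describeJob(spark: SparkSession, dataset: Dataset[_], sortColumn: String): Unit = {
  // Spark version of the running session
  println(s"Spark version: ${spark.version}")

  // Number of partitions Spark SQL uses for shuffles
  println(s"spark.sql.shuffle.partitions: ${spark.conf.get("spark.sql.shuffle.partitions")}")

  // Partition count right after the explicit repartition
  val repartitioned = dataset.repartition(5)
  println(s"partitions after repartition(5): ${repartitioned.rdd.getNumPartitions}")

  // Partition count after the sort, which is what the writer sees
  val sorted = repartitioned.sort(sortColumn)
  println(s"partitions after sort: ${sorted.rdd.getNumPartitions}")
}
```

You would call it as `describeJob(spark, dataset, "long_repeated_string_in_this_column")`. If the last number is not 5, that would point at the shuffle introduced by `sort` rather than at the Parquet writer, but please treat that as something to check, not a diagnosis.
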
Regards,
Gourav Sengupta

On Wed, Feb 24, 2021 at 3:43 PM Ivan Petrov <capacyt...@gmail.com> wrote:

> Hi, I'm trying to control the size and/or count of spark output.
>
> Here is my code. I expect to get 5 files but I get dozens of small files.
> Why?
>
> dataset
>   .repartition(5)
>   .sort("long_repeated_string_in_this_column") // should be better compressed with snappy
>   .write
>   .parquet(outputPath)
>