Hi Ivan,

Sorry, but it always helps to know the version of Spark you are using, the
environment it runs in, the format you are writing your files to, and any
other details you can share.
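
As a rough sketch, assuming a spark-shell or notebook session where a
SparkSession can be obtained with getOrCreate(), something like the following
would print most of those details. The choice of config key is an assumption
(it is one that often matters for output file counts), not something taken
from the original message.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: run inside spark-shell, a notebook, or a submitted job,
// so getOrCreate() returns the session that is already active.
val spark = SparkSession.builder().getOrCreate()

// Spark version and where the job is running.
println(s"Spark version: ${spark.version}")
println(s"Master: ${spark.sparkContext.master}")

// Number of partitions produced by shuffles in DataFrame/Dataset operations.
println(s"spark.sql.shuffle.partitions: ${spark.conf.get("spark.sql.shuffle.partitions")}")
```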


Regards,
Gourav Sengupta

On Wed, Feb 24, 2021 at 3:43 PM Ivan Petrov <capacyt...@gmail.com> wrote:

> Hi, I'm trying to control the size and/or count of Spark output files.
>
> Here is my code. I expect to get 5 files, but I get dozens of small files.
> Why?
>
> dataset
>   .repartition(5)
>   // should be better compressed with snappy
>   .sort("long_repeated_string_in_this_column")
>   .write
>   .parquet(outputPath)
>
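
One possible explanation for the quoted code, offered as an assumption since
the Spark version and configuration are not known: a global sort() after
repartition(5) triggers its own range-partitioning shuffle, whose partition
count comes from spark.sql.shuffle.partitions rather than from the earlier
repartition call, so the 5 partitions do not survive to the write. A minimal
sketch of one alternative is to sort inside each partition instead. The local
session, the sample data, and the output path below are hypothetical stand-ins
so the snippet can be pasted into spark-shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical local session, only for trying the snippet out.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("file-count-sketch")
  .getOrCreate()

// Hypothetical stand-in for the dataset in the question:
// 1000 rows with 10 distinct, repeated string values.
val dataset = spark.range(1000)
  .select((col("id") % 10).cast("string").as("long_repeated_string_in_this_column"))

// Hypothetical output location.
val outputPath = "/tmp/file-count-sketch"

dataset
  .repartition(5)
  // Sorts rows within each of the 5 partitions; no additional shuffle.
  .sortWithinPartitions("long_repeated_string_in_this_column")
  .write
  .mode("overwrite")
  .parquet(outputPath)
```

With this variant the write should produce at most one part file per
partition. The data is no longer globally ordered across files, which may or
may not matter for the compression goal mentioned in the question.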
