Re: How to control count / size of output files for

Gourav Sengupta Thu, 25 Feb 2021 07:33:17 -0800

Hi Ivan,

Sorry, but it always helps to know which version of Spark you are using,
the environment it runs in, the format you are writing your files to, and
any other details you can share.



Regards,
Gourav Sengupta

On Wed, Feb 24, 2021 at 3:43 PM Ivan Petrov <capacyt...@gmail.com> wrote:

> Hi, I'm trying to control the size and/or count of Spark output files.
>
> Here is my code. I expect to get 5 files, but I get dozens of small files.
> Why?
>
> dataset
>   .repartition(5)
>   // should be better compressed with snappy
>   .sort("long_repeated_string_in_this_column")
>   .write
>   .parquet(outputPath)
>
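
One likely cause, without knowing the Spark version or configuration: sort()
is a global sort, so it adds its own range-partitioning shuffle after
repartition(5). The number of write tasks, and so the number of files, then
follows spark.sql.shuffle.partitions (200 by default) rather than the 5
requested. A minimal sketch of one way around this, assuming a sort inside
each partition is enough for compression; the sample data, the local session
and the output path are hypothetical stand-ins, not taken from the question:

```scala
import org.apache.spark.sql.SparkSession

object ControlOutputFiles {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would use its cluster session.
    val spark = SparkSession.builder()
      .appName("control-output-files")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for `dataset` and `outputPath` from the question.
    val sortColumn = "long_repeated_string_in_this_column"
    val dataset = Seq("aaa", "bbb", "aaa", "ccc", "bbb", "aaa").toDF(sortColumn)
    val outputPath = "/tmp/control-output-files-demo"

    val sorted = dataset
      .repartition(5)                   // one shuffle into 5 partitions
      .sortWithinPartitions(sortColumn) // sorts each partition in place, no extra shuffle

    // Number of partitions that will be handed to the writer.
    println(s"partitions before write: ${sorted.rdd.getNumPartitions}")

    // Each non-empty partition is written as one part file.
    sorted.write.mode("overwrite").parquet(outputPath)

    spark.stop()
  }
}
```

If equal values should also end up in the same file, replacing repartition(5)
with repartitionByRange(5, col(sortColumn)) before sortWithinPartitions may
help, at the cost of possibly uneven file sizes when the column is skewed.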
