The relevant config is spark.sql.shuffle.partitions. Note that once a query
has started, this number is fixed for it; the config only takes effect for
queries started from an empty checkpoint.
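A minimal sketch of setting it before a query first starts. The schema,
paths, and the 30-minute session window here are assumptions for
illustration, not your actual job:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder()
      .appName("sessionization")
      // Must be set before the query's first start; after that the value
      // is tied to the checkpoint and changing the config has no effect.
      .config("spark.sql.shuffle.partitions", "400")
      .getOrCreate()

    import spark.implicits._

    // Assumed input schema
    val eventSchema = new StructType()
      .add("sessionId", StringType)
      .add("eventTime", TimestampType)

    val events = spark.readStream
      .schema(eventSchema)
      .json("/data/events")            // hypothetical input path

    // The groupBy below introduces a shuffle whose partition count comes
    // from spark.sql.shuffle.partitions, not spark.default.parallelism.
    val sessions = events
      .withWatermark("eventTime", "30 minutes")
      .groupBy($"sessionId", window($"eventTime", "30 minutes"))
      .count()

    sessions.writeStream
      .format("parquet")
      .option("path", "/data/sessions")                      // hypothetical output path
      .option("checkpointLocation", "/checkpoints/sessions") // empty dir => config applies
      .outputMode("append")
      .start()

If the query has already run once, point checkpointLocation at a fresh
directory for the new partition count to apply.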
On Wed, Nov 8, 2017 at 7:34 AM, Teemu Heikkilä wrote:
> I have a Spark Structured Streaming job and I'm crunching through a few
> terabytes of data.
> I'm using the file stream reader and it works flawlessly; I can adjust its
> partitioning with spark.default.parallelism.
> However, I'm doing sessionization on the data after loading it, and I'm
> currently lo…