Hi,
Both jobs use spark.dynamicAllocation.enabled, so there's no need to change
the number of executors. There are 702 executors in the Dataproc cluster, so
this is not the problem.
As for the number of partitions, I didn't change it and it's still 400. While
writing this now, I am realising that I ha
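(For reference, a minimal PySpark sketch of the two settings discussed above;
the property names are standard Spark configs, but the app name and values
here are placeholders rather than the actual job.)

```python
from pyspark.sql import SparkSession

# Sketch only: placeholder values illustrating the settings discussed above.
spark = (
    SparkSession.builder
    .appName("placeholder-job")
    # Let Spark scale the executor count up and down on its own; on YARN this
    # normally also needs the external shuffle service.
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")
    # Number of partitions used for DataFrame/SQL shuffles (the 400 mentioned
    # in the thread).
    .config("spark.sql.shuffle.partitions", "400")
    .getOrCreate()
)
```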
Hi, Han.
Thanks for trying out push-based shuffle.
Please make sure you configure both the Spark client-side and the server-side
configurations.
The client-side configuration looks good, and from the error message it looks
like you are missing the server-side configurations.
Please refe
+CC zhouye...@gmail.com
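(For context, a rough sketch of the two sides of the push-based shuffle setup
being referred to, based on the Spark 3.2+ documentation; the values are
illustrative, not Han's actual configuration.)

```python
from pyspark.sql import SparkSession

# Client-side (application) settings for push-based shuffle, as a sketch.
spark = (
    SparkSession.builder
    .config("spark.shuffle.push.enabled", "true")
    # Push-based shuffle relies on the external shuffle service on YARN.
    .config("spark.shuffle.service.enabled", "true")
    .getOrCreate()
)

# Server side: this part cannot be set from the application. It belongs in the
# configuration of the YARN external shuffle service on every NodeManager,
# for example:
#   spark.shuffle.push.server.mergedShuffleFileManagerImpl=
#       org.apache.spark.network.shuffle.RemoteBlockPushResolver
# A missing server-side setting like this is the kind of gap the error
# message typically points to.
```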
On Mon, May 23, 2022 at 7:11 AM Han Altae-Tran wrote:
> Hi,
>
> First of all, I am very thankful for all of the amazing work that goes
> into this project! It has opened up so many doors for me! I am a long
> time Spark user, and was very excited to start working with th
Hello All,
I have a Structured Streaming job on GCP Dataproc, and I'm trying to pass
multiple packages (Kafka, MongoDB) to the Dataproc submit command, but that
is not working.
Command that works (when I add a single dependency, e.g. Kafka):
```
gcloud dataproc jobs submit pyspark main.py \
--
```
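In case it helps while the full command gets sorted out, one way to pull in
more than one package is to pass spark.jars.packages as a single
comma-separated string; below is a hedged sketch of doing that from inside
the PySpark script itself (the package coordinates are illustrative and must
match your cluster's Spark/Scala versions):

```python
from pyspark.sql import SparkSession

# Illustrative coordinates only; substitute versions matching the cluster.
packages = ",".join([
    "org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2",
    "org.mongodb.spark:mongo-spark-connector_2.12:3.0.1",
])

spark = (
    SparkSession.builder
    # spark.jars.packages expects one comma-separated string and only takes
    # effect if it is set before the SparkSession/SparkContext is created.
    .config("spark.jars.packages", packages)
    .getOrCreate()
)
```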
Hi Ori,
A single task for the final step can result from various scenarios, such as an
aggregate operation that produces only one value (e.g. count) or a key-based
aggregate with only one key. There could be other scenarios as well. However,
that would be the case on both EMR and Dataproc if
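(As a concrete illustration of the first scenario, with made-up data: a global
aggregate collapses everything into a single value, so the last stage runs as
one task no matter how many partitions the input had.)

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical input spread across many partitions.
df = spark.range(0, 10_000_000, numPartitions=400)

# Global aggregate: the final exchange has a single partition, so the last
# stage is one task regardless of cluster size (EMR or Dataproc alike).
df.agg(F.count("*").alias("rows")).show()

# Keyed aggregate where the data happens to contain only one key: all rows
# land in one shuffle partition, and with adaptive execution the empty
# partitions are typically coalesced away, so again one task does the work.
df.withColumn("k", F.lit("only-key")).groupBy("k").count().show()
```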
Hi,
in the spirit of not fitting the solution to the problem, would it not be
better to first create a producer for your job and use a broker like Kafka
or Kinesis or Pulsar?
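(Purely as an illustration of what "a producer" could mean here, using the
kafka-python client; the broker address and topic below are placeholders, not
a recommendation for the actual setup.)

```python
import json

from kafka import KafkaProducer  # pip install kafka-python

# Placeholder broker address and topic, for illustration only.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# The upstream job publishes events to the broker; downstream consumers
# (e.g. a Structured Streaming query) then read from the topic instead of
# the job polling a shared store.
producer.send("events", {"id": 1, "payload": "example"})
producer.flush()
```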
Regards,
Gourav Sengupta
On Sat, May 21, 2022 at 3:46 PM Rohit Pant wrote:
> Hi all,
>
> I am trying to implement a cu