Sorry for the noise, folks! I understand that reducing the number of
partitions works around the issue (at the scale I'm working at, anyway) --
as I mentioned in my initial email -- and I understand the root cause. I'm
not looking for advice on how to resolve my issue. I'm just pointing out
that th
Are you using Spark 2.3 or above?
See the documentation:
https://spark.apache.org/docs/latest/running-on-kubernetes.html
It looks like you do not need:
--conf spark.kubernetes.driver.podTemplateFile='/spark-pod-template.yaml' \
--conf spark.kubernetes.executor.podTemplateFile='/spark-pod-template.
I don't know if it all works, but some work was done to make the cluster
manager pluggable; see SPARK-13904.
Tom
On Wednesday, November 6, 2019, 07:22:59 PM CST, Klaus Ma wrote:
Any suggestions?
- Klaus
On Mon, Nov 4, 2019 at 5:04 PM Klaus Ma wrote:
Hi team,
AFAIK, we built k8s/yarn/mes
Basically, the driver tracks the partitions and sends that information over
to the executors. To do that it serializes and compresses the map, but
because it's so big the result goes over 2GiB, which is Java's limit on the
maximum size of a byte array, so the whole thing fails.
The size of data doesn't matter here m
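To put rough numbers on that: the following is only a back-of-the-envelope
sketch you can paste into the Scala REPL or spark-shell; the helper name and
the per-pair byte cost are assumptions for illustration, not Spark's actual
encoding.

  // JVM arrays are Int-indexed, so one serialized byte[] tops out just under 2 GiB.
  val maxByteArray: Long = Int.MaxValue.toLong         // 2147483647 bytes

  // The structure the driver serializes grows with mapTasks x reduceTasks.
  def pairs(mapTasks: Long, reduceTasks: Long): Long = mapTasks * reduceTasks

  val before = pairs(200000L, 200000L)                 // 4.0e10 map->reduce pairs
  val after  = pairs(100000L, 100000L)                 // 1.0e10 pairs, 4x smaller

  // Even at a small fraction of a byte per pair after compression (assumption),
  // 4e10 pairs can push the serialized blob past maxByteArray; halving both
  // sides shrinks the structure by 4x, which lines up with the workaround
  // described elsewhere in this thread.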
The file system is HDFS. The executors have 2 cores and 14GB RAM each. But I
don't think either of these relates to the problem -- this is a memory
allocation issue on the driver side, and it happens in an intermediate stage
that has no HDFS read/write.
On Fri, Nov 8, 2019 at 10:01 AM Spico Florin wrote:
> Hi!
> Wha
Hi!
What file system are you using: EMRFS or HDFS?
Also, what memory are you using for the reducer?
On Thu, Nov 7, 2019 at 8:37 PM abeboparebop wrote:
> I ran into the same issue processing 20TB of data, with 200k tasks on both
> the map and reduce sides. Reducing to 100k tasks each resolved the issue.
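For anyone skimming the archive later: the change described above is just
lowering the shuffle parallelism. A minimal sketch with illustrative values,
assuming a DataFrame/SQL job run from spark-shell (an RDD job would instead
pass a smaller numPartitions to the wide transformation):

  // Use 100k reduce-side tasks instead of 200k for subsequent shuffles.
  spark.conf.set("spark.sql.shuffle.partitions", "100000")

  // Or repartition explicitly before the heavy shuffle stage
  // ("df" stands in for whatever Dataset is being shuffled):
  // val fewer = df.repartition(100000)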