Hi, we are seeing this error: Job aborted due to stage failure: Task 0 in stage 1.0 failed 8...Reason: Container from a bad node: container_xxx on host: dev-yyy Exit status: 134
This post suggests it has to do with blacklisted nodes: https://stackoverflow.com/questions/65889696/spark-exit-status-134-what-does-it-mean but in the Spark UI all executors show blacklisted=0. Also, many other jobs are running happily on that same cluster, so I don't believe the nodes are "corrupted". A rough sketch of how the relevant settings can be double-checked outside the UI is below. Thanks for any input!
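In case it helps, this is roughly how one can confirm the retry/exclusion settings on the running session from a PySpark shell (a minimal sketch; the `spark.excludeOnFailure.enabled` name assumes Spark 3.1+, while older versions use `spark.blacklist.enabled`):

```python
# Minimal sketch: inspect the retry/exclusion configs on the active session.
# Assumption: spark.excludeOnFailure.enabled is the Spark 3.1+ name for the
# feature previously called spark.blacklist.enabled.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

for key in (
    "spark.task.maxFailures",          # task attempts allowed before the stage fails
    "spark.excludeOnFailure.enabled",  # executor/node exclusion (Spark 3.1+)
    "spark.blacklist.enabled",         # same feature, pre-3.1 name
):
    print(key, "=", spark.conf.get(key, "<not set>"))
```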