We are trying to benchmark TPC-H (scale factor 1) on a 13-node Raspberry Pi 3B+ cluster (1 master, 12 workers). Each node has 1 GB of RAM and a quad-core processor and runs Ubuntu Server 18.04. The cluster uses the Spark standalone scheduler, with the *.tbl files generated by TPC-H's dbgen tool stored in HDFS.
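To put the 1 GB per node in perspective, here is a rough back-of-the-envelope sketch of how much of an executor heap Spark's unified memory manager leaves for execution and storage. It assumes the documented defaults (300 MB reserved, spark.memory.fraction = 0.6). The heap sizes in the loop are hypothetical examples, not the settings used on this cluster, and the function name is made up for illustration.

```python
# Rough estimate of Spark's unified (execution + storage) memory per executor.
# Assumes the default unified memory manager: 300 MB is reserved, and
# spark.memory.fraction (default 0.6) of the remainder is shared by
# execution and storage. Heap sizes below are hypothetical examples.

RESERVED_MB = 300       # fixed reserved memory in the unified memory manager
MEMORY_FRACTION = 0.6   # default value of spark.memory.fraction


def unified_memory_mb(heap_mb, fraction=MEMORY_FRACTION):
    """Return the approximate MB available to execution and storage."""
    return max(0.0, (heap_mb - RESERVED_MB) * fraction)


if __name__ == "__main__":
    for heap_mb in (512, 640, 768):  # hypothetical spark.executor.memory values
        usable = unified_memory_mb(heap_mb)
        print(f"heap {heap_mb} MB: about {usable:.1f} MB for execution + storage")
```

The heap also has to share the node's 1 GB with the operating system, the HDFS DataNode, and the Spark worker daemon, so the usable figure on a Pi is likely tighter than this estimate suggests.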
We are experiencing several failures when running queries. Jobs fail unpredictably, usually with one or more nodes shown as “DEAD/LOST” in the web UI. It appears that one or more nodes hang during query execution and become unreachable or time out. We have included our configuration parameters as well as the driver program below. Any recommendations would be greatly appreciated.

-------------------------------------------
-------------------------------------------
Driver:
-------------------------------------------