To answer my own question... I didn't realize that I was responsible for
telling Spark how much parallelism I wanted for my job. I figured that
Spark and YARN would work it out between themselves.
Adding --executor-memory 3G --num-executors 24 to my spark-submit command
took the query time
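
For anyone who finds this later, here is a rough sketch of how those two flags fit into a spark-submit invocation. Only --executor-memory 3G and --num-executors 24 come from my actual command; the jar name, the main class, and the build_spark_submit helper are hypothetical placeholders for illustration.

```python
import shlex


def build_spark_submit(num_executors, executor_memory, main_class, app_jar):
    """Assemble a spark-submit argument list with explicit executor sizing."""
    return [
        "spark-submit",
        "--num-executors", str(num_executors),  # how many executors to request from YARN
        "--executor-memory", executor_memory,   # heap size for each executor
        "--class", main_class,                  # hypothetical main class
        app_jar,                                # hypothetical application jar
    ]


cmd = build_spark_submit(24, "3G", "com.example.MyQuery", "my-query.jar")
print(shlex.join(cmd))

# Executor heap requested across the cluster, before any YARN overhead.
total_heap_gb = 24 * 3
print(f"total executor heap requested: {total_heap_gb} GB")
```

The point is that the parallelism has to be requested explicitly. The right numbers depend on how many nodes you have and how much memory YARN is allowed to hand out on each, so treat 24 and 3G as values that suited my cluster, not as a recommendation.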
Hi,
I'm trying to execute Spark SQL queries on top of the AccumuloInputFormat.
Not sure if I should be asking on the Spark list or the Accumulo list, but
I'll try here. The problem is that the workload to process SQL queries
doesn't seem to be distributed across my cluster very well.
My Spark SQL