Re: Spark + AccumuloInputFormat

2014-09-10 Thread Russ Weeks
To answer my own question... I didn't realize that I was responsible for telling Spark how much parallelism I wanted for my job. I had assumed that Spark and YARN would work it out between themselves. Adding --executor-memory 3G --num-executors 24 to my spark-submit command took the query time
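
For anyone finding this thread later, here is a rough sketch of where those two flags sit in a spark-submit invocation, written as a small Python helper that only assembles the argument list. The two flag values come from the message above; the jar name, the main class, and the "yarn" master value are placeholders I made up for illustration (older Spark releases spell the master as yarn-client or yarn-cluster), so adjust them for your own job and Spark version.

```python
import shlex


def build_spark_submit(app_jar, main_class, num_executors=24,
                       executor_memory="3G", master="yarn"):
    """Assemble a spark-submit argument list with explicit executor sizing.

    --num-executors and --executor-memory are the flags mentioned in the
    thread; everything else here is a hypothetical placeholder.
    """
    return [
        "spark-submit",
        "--master", master,                      # assumed: job runs on YARN
        "--class", main_class,                   # hypothetical main class
        "--num-executors", str(num_executors),   # executors to request from YARN
        "--executor-memory", executor_memory,    # heap size per executor
        app_jar,                                 # hypothetical application jar
    ]


# Hypothetical jar and class names, for illustration only.
cmd = build_spark_submit("my-query-job.jar", "com.example.AccumuloQuery")

# Print a shell-safe rendering that can be pasted into a terminal.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can also be handed to subprocess.run(cmd) directly if you would rather launch the job from a script than paste the command.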

Spark + AccumuloInputFormat

2014-09-09 Thread Russ Weeks
Hi, I'm trying to execute Spark SQL queries on top of the AccumuloInputFormat. Not sure if I should be asking on the Spark list or the Accumulo list, but I'll try here. The problem is that the workload to process SQL queries doesn't seem to be distributed across my cluster very well. My Spark SQL