Have you tried Spark 1.6's multi-session mode for the Hive Thrift Server? It's turned on by default in 1.6:
https://github.com/apache/spark/blob/0d42292f6a2dbe626e8f6a50e6c61dd79533f235/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L194

You can also try increasing the async thread pool size for the Hive Thrift Server:
https://github.com/apache/spark/blob/d4a5e6f719079639ffd38470f4d8d1f6fde3228d/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala#L49

Curious if either of these works for you.

On Tue, Dec 15, 2015 at 8:27 PM, Rajesh Balamohan <rajesh.balamo...@gmail.com> wrote:

> Hi,
>
> I am currently using Spark 1.5.2 and have been able to run benchmarks in
> Spark (SQL specifically) in single-user mode. For benchmarking with
> multiple users, I have tried the following approaches, but each has its
> own disadvantage:
>
> 1. Start the Thrift server in Spark.
>    - Execute queries via JDBC from JMeter. (Disadvantage: it is not
>      possible to execute custom code to load tables as DataFrames.)
> 2. Start a custom Thrift server in Spark. The custom Thrift server would
>    create a HiveContext and could load all relevant tables as temp tables
>    (as DataFrames), then start the Thrift server via
>    "HiveThriftServer2.startWithContext(hiveContext);".
>    - Execute queries in JMeter via JDBC. (Disadvantage: it can simulate
>      only a single user. When multiple threads submit queries, they are
>      executed serially.)
>    - Even increasing the number of executors does not solve this problem.
>      With more executors, the response times of small queries tend to be
>      higher across multiple runs (perhaps consecutive executions happen
>      on different executors where the data wasn't cached).
> 3. Create multiple SparkContexts when JMeter initializes the benchmark.
>    This is more like a pool of SparkContexts, and every user can make use
>    of a different SparkContext.
>    - This leads to SPARK-2243
>      <https://issues.apache.org/jira/browse/SPARK-2243>, and
>      "spark.driver.allowMultipleContexts=true" is not helpful in this
>      case.
> 4. Another option could be to launch multiple spark-shells to simulate
>    multiple users with dynamic resource allocation enabled. I haven't
>    tried this yet.
>
> Are there any standard approaches for benchmarking with multiple users in
> Spark? Any pointers on this would be helpful.
>
> ~Rajesh.B

-- 
*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com
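P.S. A minimal sketch of wiring both suggestions into the launch command (the property names below are what the linked 1.6 sources read, but treat them and the pool size of 200 as assumptions to verify against your build):

```shell
# Sketch: start the Thrift server with multi-session mode explicitly on
# (the Spark 1.6 default) and a larger HiveServer2 async execution pool
# so concurrent JDBC clients aren't serialized behind a small thread pool.
./sbin/start-thriftserver.sh \
  --conf spark.sql.hive.thriftServer.singleSession=false \
  --hiveconf hive.server2.async.exec.threads=200
```

One caveat for your startWithContext setup: in multi-session mode each JDBC connection gets its own session state, so temp tables registered in one session may not be visible to others; setting singleSession=true shares them, at the cost of isolation.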