Have you tried Spark 1.6's multi-session mode for the Hive Thrift Server? It's turned on by default in 1.6:
https://github.com/apache/spark/blob/0d42292f6a2dbe626e8f6a50e6c61dd79533f235/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L194

You can also try increasing the async thread pool size for the Hive Thrift Server:
https://github.com/apache/spark/blob/d4a5e6f719079639ffd38470f4d8d1f6fde3228d/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala#L49

Curious if either of these works for you.

On Tue, Dec 15, 2015 at 8:27 PM, Rajesh Balamohan <rajesh.balamo...@gmail.com> wrote:

> Hi,
>
> I am currently using Spark 1.5.2 and have been able to run benchmarks in
> Spark (SQL specifically) in single-user mode. For benchmarking with
> multiple users, I have tried the following approaches, but each has its
> own disadvantage:
>
> 1. Start the Thrift server in Spark.
>    - Execute queries via JDBC from JMeter. (Disadvantage: it is not
>      possible to execute custom code to load tables as DataFrames.)
> 2. Start a custom Thrift server in Spark. The custom Thrift server would
>    create a HiveContext and could load all relevant tables as temp tables
>    (as DataFrames), then start the Thrift server via
>    "HiveThriftServer2.startWithContext(hiveContext);".
>    - Execute queries in JMeter via JDBC. (Disadvantage: it can simulate
>      only a single user. When multiple threads submit queries, they are
>      executed serially.)
>    - Even increasing the number of executors does not solve this problem.
>      With more executors, the response times of small queries tend to be
>      higher across multiple runs (perhaps consecutive executions happen
>      on different executors where the data wasn't cached).
> 3. Create multiple SparkContexts when JMeter initializes the benchmark.
>    This is more like a pool of SparkContexts, and every user can make use
>    of a different SparkContext.
>    - This leads to SPARK-2243
>      <https://issues.apache.org/jira/browse/SPARK-2243>, and
>      "spark.driver.allowMultipleContexts=true" is not helpful in this
>      case.
> 4. Another option could be to launch multiple spark-shells to simulate
>    multiple users with dynamic resource allocation enabled. I haven't
>    tried this yet.
>
> Are there any standard approaches for benchmarking with multiple users in
> Spark? Any pointers on this would be helpful.
>
> ~Rajesh.B

-- 
*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com
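P.S. A minimal sketch of wiring both suggestions into the launch command (the property names below are what the linked 1.6 sources read, but treat them and the pool size of 200 as assumptions to verify against your build):

```shell
# Sketch: start the Thrift server with multi-session mode explicitly on
# (the Spark 1.6 default) and a larger HiveServer2 async execution pool
# so concurrent JDBC clients aren't serialized behind a small thread pool.
./sbin/start-thriftserver.sh \
  --conf spark.sql.hive.thriftServer.singleSession=false \
  --hiveconf hive.server2.async.exec.threads=200
```

One caveat for your startWithContext setup: in multi-session mode each JDBC connection gets its own session state, so temp tables registered in one session may not be visible to others; setting singleSession=true shares them, at the cost of isolation.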