Hi,

I think what you need is a long-running Spark cluster to which you can submit jobs dynamically.

For SQL, you can start Spark's Thrift JDBC/ODBC server (its port of HiveServer2): https://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine
This starts a long-running Spark application with a fixed configuration (executors, cores, etc.) and lets Spark act more like a regular database. You can then open jdbc:hive2:// JDBC connections from your app and run SQL queries/DDL.
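For illustration, a minimal JDBC client sketch is below. It assumes the Thrift server was started with ./sbin/start-thriftserver.sh and listens on its default port 10000 on localhost; the user name and the table some_table are placeholders, and the Hive JDBC driver must be on the client's classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThriftServerClient {
    public static void main(String[] args) throws Exception {
        // Assumes the Spark Thrift server is listening on localhost:10000 (its default)
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "spark", "");
             Statement stmt = conn.createStatement();
             // "some_table" is a placeholder for a table registered in the server
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM some_table")) {
            while (rs.next()) {
                System.out.println("rows: " + rs.getLong(1));
            }
        }
    }
}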

For other components (or even SQL), you can start a Spark jobserver: https://github.com/spark-jobserver/spark-jobserver
This will again start a long-running Spark cluster. It also lets you create new SparkContexts on the fly, though that should not be done from a web app; an admin should configure them separately if required. You implement your job as a SparkJob/SparkSessionJob that is handed a pre-created SparkContext/SparkSession, and it takes parameters that your implementation can read dynamically. You register your classes, packaged in jars, beforehand. Then you can invoke those jobs from your application through the REST API, passing the required parameters, much like a remote procedure call.
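To illustrate just the calling side, here is a sketch that submits a job over jobserver's REST API using Java 11's HttpClient. The app name, job class, and config keys are hypothetical placeholders; it assumes jobserver runs on its default port 8090 and that the jar containing com.example.MyFilterJob was uploaded beforehand.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JobServerClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // POST /jobs runs a previously registered job class; parameters go in the
        // body as a Typesafe Config string and are read inside the job
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8090/jobs"
                        + "?appName=myapp&classPath=com.example.MyFilterJob"))
                .POST(HttpRequest.BodyPublishers.ofString(
                        "filter.column = species\nfilter.value = setosa"))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON containing the job id/status
    }
}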

Or you can try SnappyData, which provides both of these (and much more) out of the box.

Regards,
Sumedh Wale
SnappyData (http://www.snappydata.io)

On 02/11/18 11:22, 崔苗 (Data & AI Product Development Department) wrote:

Then what about Spark SQL and Spark MLlib? We use them most of the time.

Please read about Spark Streaming or Spark Structured Streaming. Your web application can easily communicate with a long-running job through some API, and you won't have the overhead of starting a new Spark job, which is pretty heavy.
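For example, a minimal Structured Streaming sketch of such a long-running job is below. The socket source on localhost:9999 is just an illustrative assumption to keep it self-contained; a real web application would more likely publish requests to a queue such as Kafka.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class StreamingRequestsExample {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("long running streaming job")
                .getOrCreate();

        // Assumption: requests arrive as lines of text on a local socket
        Dataset<Row> requests = spark.readStream()
                .format("socket")
                .option("host", "localhost")
                .option("port", 9999)
                .load();

        // Echo each micro-batch to the console; a real job would apply the
        // requested transformation and write the result somewhere durable
        StreamingQuery query = requests.writeStream()
                .format("console")
                .outputMode("append")
                .start();
        query.awaitTermination();
    }
}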

On Thu, Nov 1, 2018 at 23:01 崔苗 (Data & AI Product Development Department) <0049003...@znv.com> wrote:

Hi,
we want to execute Spark code without submitting an application.jar, like this code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkTest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .master("local[*]")
                .appName("spark test")
                .getOrCreate();

        // Read a headerless CSV into a Dataset and inspect it
        Dataset<Row> testData = spark.read().csv(".\\src\\main\\java\\Resources\\no_schema_iris.scv");
        testData.printSchema();
        testData.show();
    }
}

The above code works well in IDEA, without generating a jar file and submitting it, but if we replace master("local[*]") with master("yarn") it doesn't work. So is there a way to use a cluster SparkSession like a local SparkSession? We need to execute Spark code dynamically in a web server according to the request; for example, a filter request will call dataset.filter(), so there is no application.jar to submit.
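For example, a sketch of the kind of dynamic filtering we mean ("_c0" is Spark's default name for the first column of a headerless CSV, and the request parameters here are hypothetical):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class DynamicFilter {
    // "column" and "value" would be taken from the incoming web request
    static Dataset<Row> applyFilter(Dataset<Row> data, String column, String value) {
        return data.filter(col(column).equalTo(value));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("dynamic filter")
                .getOrCreate();
        Dataset<Row> testData = spark.read().csv(".\\src\\main\\java\\Resources\\no_schema_iris.scv");
        applyFilter(testData, "_c0", "5.1").show(); // hypothetical request parameters
    }
}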
 
--
Daniel de Oliveira Mantovani
Perl Evangelist/Data Hacker
+1 786 459 1341