Hi,

I think what you need is a long-running Spark cluster to which you can submit jobs dynamically.

For SQL, you can start Spark's Thrift JDBC/ODBC server (its port of HiveServer2): https://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine
This starts a long-running Spark application with a fixed configuration (executors, cores, etc.) and lets Spark act more like a regular database. You can then open jdbc:hive2:// JDBC connections from your app and run SQL queries/DDL.
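For illustration, a minimal JDBC client sketch is below. It assumes the Thrift server was started with ./sbin/start-thriftserver.sh and listens on its default port 10000 on localhost; the user name and the table some_table are placeholders, and the Hive JDBC driver must be on the client's classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThriftServerClient {
    public static void main(String[] args) throws Exception {
        // Assumes the Spark Thrift server is listening on localhost:10000 (its default)
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "spark", "");
             Statement stmt = conn.createStatement();
             // "some_table" is a placeholder for a table registered in the server
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM some_table")) {
            while (rs.next()) {
                System.out.println("rows: " + rs.getLong(1));
            }
        }
    }
}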

For other components (or even SQL), you can start a Spark jobserver: https://github.com/spark-jobserver/spark-jobserver
This will again start a long-running Spark cluster. It also lets you create new SparkContexts on the fly, though that should not be done from a web app; an admin should configure them separately if required. You implement your job as a SparkJob/SparkSessionJob that is handed a pre-created SparkContext/SparkSession, and it takes parameters that your implementation can read dynamically. You register your classes, packaged in jars, beforehand. Then you can invoke those jobs from your application through the REST API, passing the required parameters, much like a remote procedure call.
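To illustrate just the calling side, here is a sketch that submits a job over jobserver's REST API using Java 11's HttpClient. The app name, job class, and config keys are hypothetical placeholders; it assumes jobserver runs on its default port 8090 and that the jar containing com.example.MyFilterJob was uploaded beforehand.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JobServerClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // POST /jobs runs a previously registered job class; parameters go in the
        // body as a Typesafe Config string and are read inside the job
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8090/jobs"
                        + "?appName=myapp&classPath=com.example.MyFilterJob"))
                .POST(HttpRequest.BodyPublishers.ofString(
                        "filter.column = species\nfilter.value = setosa"))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON containing the job id/status
    }
}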

Or you can try SnappyData, which provides both of these (and much more) out of the box.

Regards,
Sumedh Wale
SnappyData (http://www.snappydata.io)

On 02/11/18 11:22, 崔苗 (Data & AI Product Development Department) wrote:

Then what about Spark SQL and Spark MLlib? We use them most of the time.

Please read about Spark Streaming or Spark Structured Streaming. Your web application can easily communicate with a long-running job through some API, and you won't have the overhead of starting a new Spark job, which is pretty heavy.
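For example, a minimal Structured Streaming sketch of such a long-running job is below. The socket source on localhost:9999 is just an illustrative assumption to keep it self-contained; a real web application would more likely publish requests to a queue such as Kafka.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class StreamingRequestsExample {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("long running streaming job")
                .getOrCreate();

        // Assumption: requests arrive as lines of text on a local socket
        Dataset<Row> requests = spark.readStream()
                .format("socket")
                .option("host", "localhost")
                .option("port", 9999)
                .load();

        // Echo each micro-batch to the console; a real job would apply the
        // requested transformation and write the result somewhere durable
        StreamingQuery query = requests.writeStream()
                .format("console")
                .outputMode("append")
                .start();
        query.awaitTermination();
    }
}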

On Thu, Nov 1, 2018 at 23:01 崔苗 (Data & AI Product Development Department) <0049003...@znv.com> wrote:

Hi,
we want to execute Spark code without submitting an application.jar, like this code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkTest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .master("local[*]")
                .appName("spark test")
                .getOrCreate();

        // Read a headerless CSV into a Dataset and inspect it
        Dataset<Row> testData = spark.read().csv(".\\src\\main\\java\\Resources\\no_schema_iris.scv");
        testData.printSchema();
        testData.show();
    }
}

The above code works well in IDEA, without generating a jar file and submitting it, but if we replace master("local[*]") with master("yarn") it doesn't work. So is there a way to use a cluster SparkSession like a local SparkSession? We need to execute Spark code dynamically in a web server according to the request; for example, a filter request will call dataset.filter(), so there is no application.jar to submit.
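For example, a sketch of the kind of dynamic filtering we mean ("_c0" is Spark's default name for the first column of a headerless CSV, and the request parameters here are hypothetical):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class DynamicFilter {
    // "column" and "value" would be taken from the incoming web request
    static Dataset<Row> applyFilter(Dataset<Row> data, String column, String value) {
        return data.filter(col(column).equalTo(value));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("dynamic filter")
                .getOrCreate();
        Dataset<Row> testData = spark.read().csv(".\\src\\main\\java\\Resources\\no_schema_iris.scv");
        applyFilter(testData, "_c0", "5.1").show(); // hypothetical request parameters
    }
}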
 
--
Daniel de Oliveira Mantovani
Perl Evangelist/Data Hacker
+1 786 459 1341