Re: Plan on Structured Streaming in next major/minor release?

2018-11-02 Thread Jungtaek Lim
My 2 cents: "micro-batch" is how Spark happens to execute streams, not a
semantic we are committing to. Semantically and ideally, the same SQL query
should produce the same result in batch and in streaming, except for late
events, once the operations in the query are supported.
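
For illustration, a minimal Java sketch of that semantic, assuming an existing
SparkSession named spark and a made-up path and schema (not taken from the
thread): the same logical query runs unchanged over a batch read and a
streaming read, and should give matching results apart from late events.

    // Assumes: import org.apache.spark.sql.*; import org.apache.spark.sql.types.*;
    // Schema and path are placeholders.
    StructType schema = new StructType()
            .add("userId", DataTypes.StringType)
            .add("value", DataTypes.LongType);

    // Batch: static Dataset, aggregate once.
    Dataset<Row> batchCounts = spark.read().schema(schema).json("/data/events")
            .groupBy("userId").count();

    // Streaming: the identical query over a streaming source.
    Dataset<Row> streamCounts = spark.readStream().schema(schema).json("/data/events")
            .groupBy("userId").count();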

On Fri, Nov 2, 2018 at 3:54 PM, kant kodali wrote:

> If I can add one thing to this list, I would say stateless aggregations
> using raw SQL.
>
> For example: as I read micro-batches from Kafka, I want to count each
> micro-batch and emit that count using raw SQL. (No count aggregation
> across batches.)
>
>
>
> On Tue, Oct 30, 2018 at 4:55 PM Jungtaek Lim wrote:
>
>> OK, thanks for clarifying. I guess it is one of the major features in the
>> streaming area and would be nice to add, but I also agree it would require
>> a huge investigation.
>>
>> On Wed, Oct 31, 2018 at 8:06 AM, Michael Armbrust wrote:
>>
>>>> Agree. Just curious, could you explain what you mean by "negation"?
>>>> Does it mean applying retraction on aggregated results?
>>>
>>> Yeah, exactly. Our current streaming aggregation assumes that the input
>>> is in append mode, and multiple aggregations break this.
>>>
>>
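
As an aside for readers, not something proposed in the thread: one way to get a
stateless, per-micro-batch count with raw SQL is the foreachBatch sink added in
Spark 2.4. A minimal Java sketch, assuming an existing SparkSession named spark
and placeholder Kafka broker, topic, and checkpoint names:

    // Assumes: import org.apache.spark.sql.*; import org.apache.spark.sql.streaming.StreamingQuery;
    // Each micro-batch is registered and counted on its own; no state is kept across batches.
    Dataset<Row> stream = spark.readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")   // placeholder
            .option("subscribe", "events")                      // placeholder
            .load();

    StreamingQuery query = stream.writeStream()
            .foreachBatch((Dataset<Row> batchDF, Long batchId) -> {
                batchDF.createOrReplaceTempView("current_batch");
                batchDF.sparkSession()
                       .sql("SELECT count(*) AS cnt FROM current_batch")
                       .show();
            })
            .option("checkpointLocation", "/tmp/checkpoints/raw-sql-count")  // placeholder
            .start();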


start spark code without application.jar and submit

2018-11-02 Thread 数据与人工智能产品开发部
Hi, we want to execute Spark code without submitting an application.jar, like this code:

    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession
                .builder()
                .master("local[*]")
                .appName("spark test")
                .getOrCreate();

        Dataset<Row> testData = spark.read().csv(".\\src\\main\\java\\Resources\\no_schema_iris.scv");
        testData.printSchema();
        testData.show();
    }

The code above works fine when run from the IDE (IDEA); there is no need to generate a jar file and submit it. But if we replace master("local[*]") with master("yarn"), it does not work. Is there a way to use a cluster SparkSession the same way as a local SparkSession? We need to dynamically execute Spark code in a web server according to each request (for example, a filter request will call dataset.filter()), so there is no application.jar to submit.
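
Not an answer from the list, but a sketch of the direction being asked about:
building the session in-process with master("yarn") runs in client deploy mode,
and it generally needs the Hadoop/YARN client configuration visible to the web
server (HADOOP_CONF_DIR or YARN_CONF_DIR) plus the Spark jars reachable by the
cluster, e.g. via spark.yarn.jars. Paths and host names below are placeholders,
not verified settings:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class YarnClientExample {
        public static void main(String[] args) {
            // Assumes HADOOP_CONF_DIR / YARN_CONF_DIR point at the cluster config
            // and that the Spark jars were uploaded to HDFS (placeholder path).
            SparkSession spark = SparkSession.builder()
                    .master("yarn")   // client deploy mode when created in-process
                    .appName("spark test on yarn")
                    .config("spark.yarn.jars",
                            "hdfs://hadoop-master:9000/spark/jars/*.jar")
                    .getOrCreate();

            // Placeholder input path; a cluster job should read from HDFS,
            // not from a local Windows path.
            Dataset<Row> testData = spark.read()
                    .csv("hdfs://hadoop-master:9000/data/no_schema_iris.csv");
            testData.printSchema();
            testData.show();

            spark.stop();
        }
    }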


spark history server no application found

2018-11-02 Thread 数据与人工智能产品开发部
Hi,
I successfully started the Spark history server, but there are no applications to display.

This is the config:

    spark.eventLog.enabled            true
    spark.eventLog.dir                hdfs://hadoop-master:9000/user/spark/historylog
    spark.history.fs.logDirectory     hdfs://hadoop-master:9000/user/spark/historylog
    spark.eventLog.compress           true
    spark.yarn.historyServer.address  hadoop-slave1:18080

I start the history server as user hadoop, so it should have the right to read the application logs.
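
One assumption worth checking, not something confirmed in the thread: the
history server only lists applications that themselves wrote event logs to
spark.eventLog.dir, and a run normally appears once the application has
finished. A minimal Java sketch of an application configured that way; the
master and paths are placeholders:

    import org.apache.spark.sql.SparkSession;

    public class EventLogExample {
        public static void main(String[] args) {
            // These keys would usually come from spark-defaults.conf or
            // spark-submit --conf; they are set inline here only for the sketch.
            SparkSession spark = SparkSession.builder()
                    .master("local[*]")   // placeholder
                    .appName("event log example")
                    .config("spark.eventLog.enabled", "true")
                    .config("spark.eventLog.dir",
                            "hdfs://hadoop-master:9000/user/spark/historylog")
                    .getOrCreate();

            spark.range(10).count();   // do some work so the log has content
            spark.stop();              // the event log is finalized on shutdown
        }
    }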


Continuous task retry support

2018-11-02 Thread Basil Hariri
Hi all,

I found that task retries are currently not supported in continuous processing
mode. Is there another way to recover from continuous task failures currently?
If not, are there plans to support this in a future release?
Thanks,
Basil
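
For context rather than an answer: a continuous-mode query is declared with
Trigger.Continuous plus a checkpoint location, and as of Spark 2.4 recovery
generally means restarting the whole query from that checkpoint rather than
retrying individual tasks. A Java sketch with placeholder broker, topic, and
path names:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;
    import org.apache.spark.sql.streaming.Trigger;

    public class ContinuousExample {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .master("local[*]")   // placeholder
                    .appName("continuous example")
                    .getOrCreate();

            Dataset<Row> in = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker:9092")  // placeholder
                    .option("subscribe", "input")                      // placeholder
                    .load();

            // Only map-like operations are supported in continuous mode.
            StreamingQuery query = in.selectExpr("CAST(value AS STRING) AS value")
                    .writeStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker:9092")  // placeholder
                    .option("topic", "output")                         // placeholder
                    // Restarting with the same checkpoint resumes the query.
                    .option("checkpointLocation", "/tmp/checkpoints/continuous")
                    .trigger(Trigger.Continuous("1 second"))
                    .start();

            query.awaitTermination();
        }
    }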