The currently available output modes are Complete and Append. Complete mode is
for stateful processing (aggregations), and Append mode is for stateless
processing (i.e., map/filter). See:
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-modes

Dataset#writeStream produces a DataStreamWriter, which lets you start a query.
This seems consistent with Spark’s previous behaviour of only executing upon an
“action”, and the queries, I guess, are what “jobs” used to be.
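
To make the distinction concrete, here is a minimal sketch using the built-in
socket source from the programming guide (host/port are placeholders, and the
two queries are alternatives, not meant to run together against one socket):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("output-modes")
  .getOrCreate()
import spark.implicits._

// Each line read from the socket becomes a row with a single "value" column.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// Stateless (map/filter) -> Append mode.
val appendQuery = lines.as[String]
  .filter(_.nonEmpty)
  .writeStream
  .outputMode("append")
  .format("console")
  .start()

// Stateful (streaming aggregation) -> Complete mode.
val wordCounts = lines.as[String]
  .flatMap(_.split(" "))
  .groupBy("value")
  .count()
val completeQuery = wordCounts.writeStream
  .outputMode("complete")
  .format("console")
  .start()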


Thanks,
Amit

From: Ayoub Benali <benali.ayoub.i...@gmail.com>
Date: Tuesday, August 2, 2016 at 11:59 AM
To: user <user@spark.apache.org>
Cc: Jacek Laskowski <ja...@japila.pl>, Amit Sela <amitsel...@gmail.com>,
Michael Armbrust <mich...@databricks.com>
Subject: Re: spark 2.0 readStream from a REST API

Why is writeStream needed to consume the data?

When I tried it, I got this exception:

INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
org.apache.spark.sql.AnalysisException: Complete output mode not supported when there are no streaming aggregations on streaming DataFrames/Datasets;
  at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:173)
  at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForStreaming(UnsupportedOperationChecker.scala:65)
  at org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:236)
  at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:287)
  at .<init>(<console>:59)
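
This is the mode rule described above: the query has no streaming aggregation,
so Complete mode is rejected. A minimal sketch of the append-mode variant that
passes the check (the format name "rest" is a placeholder for the custom REST
source provider, not a real built-in source):

// "rest" is a placeholder for the custom REST source provider.
val restData = spark.readStream
  .format("rest")
  .load()

// Stateless pipeline, no aggregation: Append is the valid output mode.
val query = restData.writeStream
  .outputMode("append")
  .format("console")
  .start()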



2016-08-01 18:44 GMT+02:00 Amit Sela <amitsel...@gmail.com>:
I think you're missing:

val query = wordCounts.writeStream
  .outputMode("complete")
  .format("console")
  .start()
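
And, if this runs as a standalone application rather than in the spark-shell,
you'd also block so the driver doesn't exit before any results arrive (a usage
sketch, assuming the query above):

query.awaitTermination()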

Did it help?

On Mon, Aug 1, 2016 at 2:44 PM Jacek Laskowski <ja...@japila.pl> wrote:
On Mon, Aug 1, 2016 at 11:01 AM, Ayoub Benali
<benali.ayoub.i...@gmail.com> wrote:

> the problem now is that when I consume the dataframe, for example with
> count, I get the stack trace below.

Mind sharing the entire pipeline?

> I followed the implementation of TextSocketSourceProvider to implement my
> data source, and the Text Socket source is used in the official
> documentation here.

Right. Completely forgot about the provider. Thanks for reminding me about it!

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

