Thanks Michael, it really was a great demo. I figured I needed to add a trigger to display the results, but Buraz from Databricks mentioned here <https://forums.databricks.com/questions/10925/structured-streaming-in-real-time.html#comment-10929> that the display functionality won't be available until potentially the next release of Databricks (2.1-db3).
I'll take your points into account and try to duplicate it. Apologies if this isn't the forum for the question; I'm happy to take the discussion offline, though I genuinely believe the mailing list users might find it very interesting. :)

On Thu, Feb 16, 2017 at 8:30 PM, Michael Armbrust <mich...@databricks.com> wrote:

> Thanks for your interest in Apache Spark Structured Streaming!
>
> There is nothing secret in that demo, though I did make some configuration
> changes in order to get the timing right (gotta have some dramatic effect
> :) ). Also, I think the visualizations based on metrics output by the
> StreamingQueryListener
> <https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/streaming/StreamingQueryListener.html>
> are still being rolled out, but should be available everywhere soon.
>
> First, I set two options to make sure that files were read one at a time,
> thus allowing us to see incremental results:
>
> spark.readStream
>   .option("maxFilesPerTrigger", "1")
>   .option("latestFirst", "true")
>   ...
>
> There is more detail on how these options work in this post
> <https://databricks.com/blog/2017/01/19/real-time-streaming-etl-structured-streaming-apache-spark-2-1.html>.
>
> Regarding continually updating the result of a streaming query using
> display(df) for streaming DataFrames (i.e. ones created with
> spark.readStream): that has worked in Databricks since Spark 2.1. The
> longer-form example we published requires you to rerun the count to see it
> change at the end of the notebook because that is not a streaming query.
> Instead, it is a batch query over data that has been written out by another
> stream. I'd like to add the ability to run a streaming query over data
> that has been written out by the FileSink (tracked as SPARK-19633
> <https://issues.apache.org/jira/browse/SPARK-19633>).
> In the demo, I started two different streaming queries:
>
> - one that reads from json / kafka => writes to parquet
> - one that reads from json / kafka => writes to the memory sink
>   <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks>
>   / pushes the latest answer to the JS running in a browser using the
>   StreamingQueryListener
>   <https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/streaming/StreamingQueryListener.html>.
>
> This is packaged up nicely in display(), but there is nothing stopping
> you from building something similar with vanilla Apache Spark.
>
> Michael
>
> On Wed, Feb 15, 2017 at 11:34 AM, Sam Elamin <hussam.ela...@gmail.com> wrote:
>
>> Hey folks
>>
>> This one is mainly aimed at the Databricks folks. I have been trying to
>> replicate the CloudTrail demo
>> <https://www.youtube.com/watch?v=IJmFTXvUZgY> Michael did at Spark
>> Summit. The code for it can be found here
>> <https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/8599738367597028/2070341989008532/3601578643761083/latest.html>
>>
>> My question is: how did you get the results to be displayed and updated
>> continuously in real time?
>>
>> I am also using Databricks to duplicate it, but I noticed the code link
>> mentions:
>>
>> "If you count the number of rows in the table, you should find the
>> value increasing over time. Run the following every few minutes."
>>
>> This leads me to believe that the version of Databricks that Michael was
>> using for the demo is still not released, or at least the functionality to
>> display those changes in real time isn't.
>>
>> Is this the case? Or am I completely wrong?
>>
>> Can I display the results of a structured streaming query in real time
>> using the Databricks "display" function?
>>
>> Regards
>> Sam
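Putting the pieces from Michael's reply together, here is a minimal sketch of the "vanilla Apache Spark" version of the demo pattern: one query writing durably to parquet, one keeping the latest aggregate in the memory sink, plus a StreamingQueryListener standing in for the push-to-browser part of display(). The input/checkpoint paths, the grouping column, and the query name are my own placeholders, not taken from the actual demo notebook.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

val spark = SparkSession.builder.appName("streaming-demo-sketch").getOrCreate()

// Read files one at a time, newest first, so results update incrementally.
// File-source streams need an explicit schema; here it is inferred once
// from a static sample directory (hypothetical path).
val events = spark.readStream
  .option("maxFilesPerTrigger", "1")   // process a single file per micro-batch
  .option("latestFirst", "true")       // start with the most recent files
  .schema(spark.read.json("/tmp/events-sample").schema)
  .json("/tmp/events")

// Query 1: durable append-mode output to parquet (checkpoint is required).
val toParquet = events.writeStream
  .format("parquet")
  .option("path", "/tmp/events-parquet")
  .option("checkpointLocation", "/tmp/checkpoints/events-parquet")
  .start()

// Query 2: latest aggregate kept in the in-memory sink, queryable as a table.
val counts = events.groupBy("eventName").count()
val toMemory = counts.writeStream
  .format("memory")
  .queryName("event_counts")           // registers an in-memory table
  .outputMode("complete")
  .start()

// A listener sees each batch's progress; display() wraps this idea to push
// updates to the browser. Here it just prints the progress JSON.
spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(e: QueryStartedEvent): Unit = ()
  override def onQueryProgress(e: QueryProgressEvent): Unit =
    println(e.progress.json)           // e.g. forward to a websocket instead
  override def onQueryTerminated(e: QueryTerminatedEvent): Unit = ()
})

// Re-running this batch query shows the continually updating answer:
spark.sql("SELECT * FROM event_counts ORDER BY count DESC").show()
```

This mirrors why the published notebook asks you to rerun the count: the final `spark.sql(...)` is a batch query over the sink's current contents, not itself a streaming query.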