Thanks Michael, it really was a great demo. I figured I needed to add a trigger to display the results, but Buraz from Databricks mentioned here <https://forums.databricks.com/questions/10925/structured-streaming-in-real-time.html#comment-10929> that the display functionality won't be available until potentially the next release of Databricks (2.1-db3).
I'll take your points into account and try to duplicate it. Apologies if this isn't the forum for the question; I'm happy to take the discussion offline, though I genuinely believe the mailing list users might find it very interesting. :)

On Thu, Feb 16, 2017 at 8:30 PM, Michael Armbrust <mich...@databricks.com> wrote:

> Thanks for your interest in Apache Spark Structured Streaming!
>
> There is nothing secret in that demo, though I did make some configuration
> changes in order to get the timing right (gotta have some dramatic effect
> :) ). Also, I think the visualizations based on metrics output by the
> StreamingQueryListener
> <https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/streaming/StreamingQueryListener.html>
> are still being rolled out, but should be available everywhere soon.
>
> First, I set two options to make sure that files were read one at a time,
> thus allowing us to see incremental results:
>
> spark.readStream
>   .option("maxFilesPerTrigger", "1")
>   .option("latestFirst", "true")
>   ...
>
> There is more detail on how these options work in this post
> <https://databricks.com/blog/2017/01/19/real-time-streaming-etl-structured-streaming-apache-spark-2-1.html>.
>
> Regarding continually updating the result of a streaming query using
> display(df) for streaming DataFrames (i.e. ones created with
> spark.readStream): that has worked in Databricks since Spark 2.1. The
> longer-form example we published requires you to rerun the count to see it
> change at the end of the notebook because that is not a streaming query.
> Instead, it is a batch query over data that has been written out by another
> stream. I'd like to add the ability to run a streaming query over data
> that has been written out by the FileSink (tracked as SPARK-19633
> <https://issues.apache.org/jira/browse/SPARK-19633>).
> In the demo, I started two different streaming queries:
>
> - one that reads from json / kafka => writes to parquet
> - one that reads from json / kafka => writes to the memory sink
>   <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks>
>   / pushes the latest answer to the JS running in a browser using the
>   StreamingQueryListener
>   <https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/streaming/StreamingQueryListener.html>.
>
> This is packaged up nicely in display(), but there is nothing stopping
> you from building something similar with vanilla Apache Spark.
>
> Michael
>
> On Wed, Feb 15, 2017 at 11:34 AM, Sam Elamin <hussam.ela...@gmail.com> wrote:
>
>> Hey folks
>>
>> This one is mainly aimed at the Databricks folks. I have been trying to
>> replicate the CloudTrail demo
>> <https://www.youtube.com/watch?v=IJmFTXvUZgY> Michael did at Spark
>> Summit. The code for it can be found here
>> <https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/8599738367597028/2070341989008532/3601578643761083/latest.html>
>>
>> My question is: how did you get the results to be displayed and updated
>> continuously in real time?
>>
>> I am also using Databricks to duplicate it, but I noticed the code link
>> mentions:
>>
>> "If you count the number of rows in the table, you should find the
>> value increasing over time. Run the following every few minutes."
>>
>> This leads me to believe that the version of Databricks that Michael was
>> using for the demo is still not released, or at least the functionality to
>> display those changes in real time isn't.
>>
>> Is this the case? Or am I completely wrong?
>>
>> Can I display the results of a structured streaming query in real time
>> using the Databricks "display" function?
>>
>> Regards
>> Sam
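Putting the pieces from Michael's reply together, here is a minimal sketch of the "vanilla Apache Spark" version of the demo pattern: one query writing durably to parquet, one keeping the latest aggregate in the memory sink, plus a StreamingQueryListener standing in for the push-to-browser part of display(). The input/checkpoint paths, the grouping column, and the query name are my own placeholders, not taken from the actual demo notebook.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

val spark = SparkSession.builder.appName("streaming-demo-sketch").getOrCreate()

// Read files one at a time, newest first, so results update incrementally.
// File-source streams need an explicit schema; here it is inferred once
// from a static sample directory (hypothetical path).
val events = spark.readStream
  .option("maxFilesPerTrigger", "1")   // process a single file per micro-batch
  .option("latestFirst", "true")       // start with the most recent files
  .schema(spark.read.json("/tmp/events-sample").schema)
  .json("/tmp/events")

// Query 1: durable append-mode output to parquet (checkpoint is required).
val toParquet = events.writeStream
  .format("parquet")
  .option("path", "/tmp/events-parquet")
  .option("checkpointLocation", "/tmp/checkpoints/events-parquet")
  .start()

// Query 2: latest aggregate kept in the in-memory sink, queryable as a table.
val counts = events.groupBy("eventName").count()
val toMemory = counts.writeStream
  .format("memory")
  .queryName("event_counts")           // registers an in-memory table
  .outputMode("complete")
  .start()

// A listener sees each batch's progress; display() wraps this idea to push
// updates to the browser. Here it just prints the progress JSON.
spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(e: QueryStartedEvent): Unit = ()
  override def onQueryProgress(e: QueryProgressEvent): Unit =
    println(e.progress.json)           // e.g. forward to a websocket instead
  override def onQueryTerminated(e: QueryTerminatedEvent): Unit = ()
})

// Re-running this batch query shows the continually updating answer:
spark.sql("SELECT * FROM event_counts ORDER BY count DESC").show()
```

This mirrors why the published notebook asks you to rerun the count: the final `spark.sql(...)` is a batch query over the sink's current contents, not itself a streaming query.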