Question of spark streaming

utkarsh rathor Fri, 27 Jul 2018 05:15:02 -0700

I am following the book *Spark: The Definitive Guide*. The following
code is *executed locally using spark-shell*.


Procedure: Started the spark-shell without any other options

val static = spark.read.json("/part-00079-tid-730451297822678341-1dda7027-2071-4d73-a0e2-7fb6a91e1d1f-0-c000.json")

val dataSchema = static.schema

val streaming = spark.readStream.schema(dataSchema).option("maxFilesPerTrigger", 1).json("/part-00079-tid-730451297822678341-1dda7027-2071-4d73-a0e2-7fb6a91e1d1f-0-c000.json")

val activityCounts = streaming.groupBy("gt").count()

val activityQuery = activityCounts.writeStream.queryName("activity_counts").format("memory").outputMode("complete").start()

activityQuery.awaitTermination()

The book says that *"After this code is executed, the streaming computation
will have started in the background" ... "Now that this stream is running,
we can experiment with the result by querying"*

*MY OBSERVATION:*

When this code is executed, it does not free the shell for me to type in
commands such as *spark.streams.active*.

Hence I cannot query this stream.

*My research*

I tried to open a new spark-shell, but querying in that shell does not
return any results. Are the streams started in this shell accessible
from another instance of the shell?

I want the table in memory so that I can query it using the following command:

for (i <- 1 to 5) {
  spark.sql("SELECT * FROM activity_counts").show()
  Thread.sleep(1000)
}
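
For context on the behaviour described above: in Structured Streaming, start() returns a StreamingQuery handle right away and the query runs on background threads, while awaitTermination() with no arguments blocks the calling thread until the query stops or fails. The memory sink's table is registered in the SparkSession that started the query, so a separate spark-shell process does not see it. The following is a minimal sketch, not a definitive fix. It assumes it is pasted into a single spark-shell session (where `spark` is predefined), reuses the same input file and query name as above, and introduces `path` as a helper name that is not in the original code:

```scala
// Assumption: run inside spark-shell, where `spark` (a SparkSession) already exists.
// `path` is a helper name introduced here; the file is the one used above.
val path = "/part-00079-tid-730451297822678341-1dda7027-2071-4d73-a0e2-7fb6a91e1d1f-0-c000.json"

// Infer the schema once from a static read, as in the original code.
val dataSchema = spark.read.json(path).schema

// Trailing dots keep the chained call valid when typed line by line in the REPL.
val activityQuery = spark.readStream.
  schema(dataSchema).
  option("maxFilesPerTrigger", 1).
  json(path).
  groupBy("gt").
  count().
  writeStream.
  queryName("activity_counts").
  format("memory").
  outputMode("complete").
  start()

// No awaitTermination() here: start() has already returned,
// so the prompt stays free for further commands.
spark.streams.active.foreach(q => println(q.name))

// Poll the in-memory table from the same session that started the query.
for (i <- 1 to 5) {
  spark.sql("SELECT * FROM activity_counts").show()
  Thread.sleep(1000)
}

// Optional: block for at most 5 seconds instead of indefinitely.
activityQuery.awaitTermination(5000)

// Stop the background query when finished.
activityQuery.stop()
```

In a standalone application, as opposed to the shell, a blocking awaitTermination() call is normally still needed so that the driver does not exit while the query is running.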
