The primary role of a sink is to store output tuples. Consider groupByKey and
map/flatMapGroupsWithState instead.
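Roughly, the idea is something like the following sketch (the Event and
RunningTotal types are illustrative; "events" is assumed to be a Dataset[Event]
with spark.implicits._ in scope for the encoders):

import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

// Illustrative types for the sketch only.
case class Event(key: String, value: Long)
case class RunningTotal(key: String, total: Long)

val totals = events
  .groupByKey(_.key)
  .mapGroupsWithState(GroupStateTimeout.NoTimeout()) {
    (key: String, rows: Iterator[Event], state: GroupState[RunningTotal]) =>
      // Fold the new rows for this key into whatever state we kept so far.
      val previous = state.getOption.map(_.total).getOrElse(0L)
      val updated  = RunningTotal(key, previous + rows.map(_.value).sum)
      state.update(updated)
      updated
  }

The state lives in the query's state store across micro-batches, so the sink
only has to write the results out.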
-Chris
From: karthikjay
Sent: Friday, April 20, 2018 4:49:49 PM
To: user@spark.apache.org
Subject: [Structured Streaming] [Kafka] How to repartition the data and distribute the processing
Hello.
I am wondering if there is any new update on the Spark upgrade to Scala 2.12
(https://issues.apache.org/jira/browse/SPARK-14220), especially given that
Scala 2.13 is close to release. I ask because there has been no recent update
on the JIRA and the related tickets. Maybe someone is actively working on it?
Hi, I am trying to get the application id after I use SparkSubmit.main for a
YARN submission. I am able to make it asynchronous using the
spark.yarn.submit.waitAppCompletion=false configuration option, but I can't
figure out how to get the application id for this job. I read both
SparkSubmit.s
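A possible alternative (sketch only; the jar path and main class below are
placeholders) is the org.apache.spark.launcher API, which exposes the
application id through SparkAppHandle:

import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

// Launch asynchronously and poll the handle for the YARN application id.
val handle: SparkAppHandle = new SparkLauncher()
  .setAppResource("/path/to/your-app.jar")   // placeholder
  .setMainClass("com.example.YourMain")      // placeholder
  .setMaster("yarn")
  .setDeployMode("cluster")
  .startApplication()

// getAppId returns null until the application has actually been submitted.
while (handle.getAppId == null && !handle.getState.isFinal) {
  Thread.sleep(500)
}
println(s"Application id: ${handle.getAppId}")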
Any help is appreciated. Please find the full question at the link:
https://stackoverflow.com/questions/49951022/spark-structured-streaming-with-kafka-how-to-repartition-the-data-and-distribu
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
--
What is the right way to programmatically restart a structured streaming
query that has terminated due to an exception? Example code or a reference
would be appreciated.
Could it be done from within the onQueryTerminated() event handler of the
StreamingQueryListener class?
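One possible pattern (sketch only; buildAndStartQuery below stands in for
whatever constructs and starts the query) is to drive restarts from the main
thread rather than from the listener, since callbacks such as
onQueryTerminated are delivered asynchronously:

import org.apache.spark.sql.streaming.StreamingQuery

def buildAndStartQuery(): StreamingQuery = ???   // placeholder: build and start your query

var stopped = false
while (!stopped) {
  val query = buildAndStartQuery()
  try {
    query.awaitTermination()
    stopped = true                               // query finished without an exception
  } catch {
    case e: Exception =>
      // log and loop around to start a fresh query
      println(s"Query terminated with ${e.getMessage}; restarting")
  }
}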
Priyank
When reading a Parquet file created from a pandas DataFrame with an unnamed
index, Spark creates a column named “__index_level_0__”, since Spark
DataFrames do not support row indexing. This looks like a bug to me: as a
Spark user I would expect unnamed index columns to be dropped on read.
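A possible workaround on the Spark side (sketch; the path is a placeholder and
an existing SparkSession named spark is assumed) is to drop the synthetic
column right after the read; drop() is a no-op when the column is absent:

val df = spark.read
  .parquet("/path/to/pandas_output.parquet")   // placeholder path
  .drop("__index_level_0__")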
I have the following code to read data from a Kafka topic using Structured
Streaming. The topic has 3 partitions:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("TestPartition")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._
val dataFrame = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker (original value was cut off)
  .option("subscribe", "test-topic")                   // placeholder topic (original value was cut off)
  .load()
unsubscribe
--
Regards,
Varma Dantuluri
newb question...
say memory per node is 16 GB across 6 nodes (for a total of 96 GB for the
cluster).
Is 16 GB the maximum amount of memory that can be allocated to the driver
(since it is, after all, 16 GB per node)?
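For context: the driver is a single JVM running on one node, so its memory is
bounded by what that one node can offer (minus OS and resource-manager
overhead), not by the cluster total. An illustrative spark-submit, where the
numbers are placeholders rather than tuned recommendations:

spark-submit \
  --driver-memory 12g \
  --executor-memory 12g \
  --num-executors 5 \
  your-app.jar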
Thanks
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/