The primary role of a sink is to store output tuples. Consider groupByKey and
map/flatMapGroupsWithState instead.
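Roughly, the idea is something like the following sketch (the Event and
RunningTotal types are illustrative; "events" is assumed to be a Dataset[Event]
with spark.implicits._ in scope for the encoders):

import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

// Illustrative types for the sketch only.
case class Event(key: String, value: Long)
case class RunningTotal(key: String, total: Long)

val totals = events
  .groupByKey(_.key)
  .mapGroupsWithState(GroupStateTimeout.NoTimeout()) {
    (key: String, rows: Iterator[Event], state: GroupState[RunningTotal]) =>
      // Fold the new rows for this key into whatever state we kept so far.
      val previous = state.getOption.map(_.total).getOrElse(0L)
      val updated  = RunningTotal(key, previous + rows.map(_.value).sum)
      state.update(updated)
      updated
  }

The state lives in the query's state store across micro-batches, so the sink
only has to write the results out.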
-Chris
From: karthikjay
Sent: Friday, April 20, 2018 4:49:49 PM
To: user@spark.apache.org
Subject: [Structured Streaming] [Kafka] How to repartition the data and distribute the processing
Hello.
I am wondering if there is any new update on the Spark upgrade to Scala 2.12
(https://issues.apache.org/jira/browse/SPARK-14220), especially given that
Scala 2.13 is close to release. I ask because there has been no recent update
on the JIRA and the related tickets. Maybe someone is actively working on it?
Hi, I am trying to get the application id after I use SparkSubmit.main for a
YARN submission. I am able to make it asynchronous using the
spark.yarn.submit.waitAppCompletion=false configuration option, but I can't
figure out how to get the application id for this job. I read both
SparkSubmit.s
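A possible alternative (sketch only; the jar path and main class below are
placeholders) is the org.apache.spark.launcher API, which exposes the
application id through SparkAppHandle:

import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

// Launch asynchronously and poll the handle for the YARN application id.
val handle: SparkAppHandle = new SparkLauncher()
  .setAppResource("/path/to/your-app.jar")   // placeholder
  .setMainClass("com.example.YourMain")      // placeholder
  .setMaster("yarn")
  .setDeployMode("cluster")
  .startApplication()

// getAppId returns null until the application has actually been submitted.
while (handle.getAppId == null && !handle.getState.isFinal) {
  Thread.sleep(500)
}
println(s"Application id: ${handle.getAppId}")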
Any help is appreciated. Please find the full question at the link:
https://stackoverflow.com/questions/49951022/spark-structured-streaming-with-kafka-how-to-repartition-the-data-and-distribu
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
--
What is the right way to programmatically restart a structured streaming
query that has terminated due to an exception? Example code or a reference
would be appreciated.
Could it be done from within the onQueryTerminated() event handler of the
StreamingQueryListener class?
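One possible pattern (sketch only; buildAndStartQuery below stands in for
whatever constructs and starts the query) is to drive restarts from the main
thread rather than from the listener, since callbacks such as
onQueryTerminated are delivered asynchronously:

import org.apache.spark.sql.streaming.StreamingQuery

def buildAndStartQuery(): StreamingQuery = ???   // placeholder: build and start your query

var stopped = false
while (!stopped) {
  val query = buildAndStartQuery()
  try {
    query.awaitTermination()
    stopped = true                               // query finished without an exception
  } catch {
    case e: Exception =>
      // log and loop around to start a fresh query
      println(s"Query terminated with ${e.getMessage}; restarting")
  }
}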
Priyank
When reading a Parquet file created from a pandas DataFrame with an unnamed
index, Spark creates a column named “__index_level_0__”, since Spark
DataFrames do not support row indexing. This looks like a bug to me: as a
Spark user I would expect unnamed index columns to be dropped on read.
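A possible workaround on the Spark side (sketch; the path is a placeholder and
an existing SparkSession named spark is assumed) is to drop the synthetic
column right after the read; drop() is a no-op when the column is absent:

val df = spark.read
  .parquet("/path/to/pandas_output.parquet")   // placeholder path
  .drop("__index_level_0__")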
I have the following code to read data from a Kafka topic using Structured
Streaming. The topic has 3 partitions:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("TestPartition")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._
val dataFrame = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker (original value was cut off)
  .option("subscribe", "test-topic")                   // placeholder topic (original value was cut off)
  .load()
unsubscribe
--
Regards,
Varma Dantuluri
newb question...
say memory per node is 16 GB across 6 nodes (for a total of 96 GB for the
cluster).
Is 16 GB the maximum amount of memory that can be allocated to the driver
(since it is, after all, 16 GB per node)?
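For context: the driver is a single JVM running on one node, so its memory is
bounded by what that one node can offer (minus OS and resource-manager
overhead), not by the cluster total. An illustrative spark-submit, where the
numbers are placeholders rather than tuned recommendations:

spark-submit \
  --driver-memory 12g \
  --executor-memory 12g \
  --num-executors 5 \
  your-app.jar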
Thanks
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/