Hi, I have a Spark job that produces duplicates when one or more tasks from the repartition stage fail.
Here is the simplified code:
sparkContext.setCheckpointDir("hdfs://path-to-checkpoint-dir")
val inputRDDs: List[RDD[String]] = List.empty // an RDD per input dir
val updatedRDDs = inputRDDs.map{ in
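(A minimal sketch of where this appears to be going, since the snippet is cut off; the update step and partition count are my assumptions, not the original code. Checkpointing the repartitioned RDD and materializing it means a retried task re-reads the checkpointed blocks instead of recomputing a non-deterministic shuffle.)

// hypothetical continuation, for illustration only
val updatedRDDs = inputRDDs.map { rdd =>
  val repartitioned = rdd.repartition(100)   // assumed partition count
  repartitioned.checkpoint()                 // cut the lineage at the shuffle boundary
  repartitioned.count()                      // first action after checkpoint() writes the checkpoint
  repartitioned
}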
Shubham,
DataSourceV2 passes Spark's internal representation to your source and
expects Spark's internal representation back from the source. That's why
you consume and produce InternalRow: "internal" indicates that Spark
doesn't need to convert the values.
Spark's internal representation for a d
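(For illustration only, not from the original message: a sketch of building a row in Spark's internal form for a two-column (String, Date) schema. Strings are carried as UTF8String and dates as an Int of days since the Unix epoch, so Spark never has to convert the values. The column values here are made up.)

import java.sql.Date
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.unsafe.types.UTF8String

val row: InternalRow = InternalRow(
  UTF8String.fromString("some-name"),                      // StringType -> UTF8String
  DateTimeUtils.fromJavaDate(Date.valueOf("2019-02-05"))   // DateType   -> Int days since epoch
)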
To be more explicit, the easiest thing to do in the short term is to use
your own instance of KafkaConsumer to get the offsets for the
timestamps you're interested in, using offsetsForTimes, and use those
for the start / end offsets. See
https://kafka.apache.org/10/javadoc/?org/apache/kafka/clients/c
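(A rough sketch of that approach; the bootstrap servers, topic, partition, and timestamp below are placeholders, and the group id is arbitrary.)

import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

val props = new Properties()
props.put("bootstrap.servers", "broker:9092")
props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
props.put("group.id", "offset-lookup")

val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
val tp = new TopicPartition("my-topic", 0)
val ts: java.lang.Long = 1546387200000L   // epoch millis you care about

// earliest offset whose timestamp is >= ts (null if no such record)
val startOffset = consumer.offsetsForTimes(Collections.singletonMap(tp, ts)).get(tp).offset()
consumer.close()

// then feed startOffset in as the source's start offset, e.g. the JSON form
// {"my-topic":{"0":<startOffset>}} for the Kafka source's startingOffsets option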
That article is pretty old. If you click through to the JIRA mentioned
in it, https://issues.apache.org/jira/browse/SPARK-18580, you'll see
it's been resolved.
On Wed, Jan 2, 2019 at 12:42 AM JF Chen wrote:
>
> yes, 10 is a very low value for testing initial rate.
> And from this article
> https:
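(If it helps, these are the rate-limit settings the thread appears to be discussing; the values below are placeholders, not recommendations.)

val conf = new org.apache.spark.SparkConf()
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.backpressure.initialRate", "10")      // rate cap for the first batch(es) before backpressure kicks in
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")   // max records/sec read from each Kafka partition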
Hi All,
I am using a custom DataSourceV2 implementation (Spark version 2.3.2).
Here is how I am trying to pass in a date type from the spark shell.
scala> val df = sc.parallelize(Seq("2019-02-05")).toDF("datetype").withColumn("datetype", col("datetype").cast("date"))
scala> df.write.format("com
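(On the source side the date column already arrives in internal form, an Int of days since 1970-01-01. A sketch of a writer converting it back, assuming the date is column 0 of the row; this is illustrative, not your implementation.)

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.util.DateTimeUtils

def write(record: InternalRow): Unit = {
  val days = record.getInt(0)                     // DateType is stored as Int days since epoch
  val javaDate = DateTimeUtils.toJavaDate(days)   // back to java.sql.Date if the sink needs it
  // ... hand javaDate to the underlying sink ...
}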