date:20201229

Re: Correctness bug on Shuffle+Repartition scenario

2020-12-29 Thread Sean Owen

I don't think this addresses my comment at all. Please try correctly implementing equals and hashCode for your key class first. On Tue, Dec 29, 2020 at 8:31 PM Shiao-An Yuan wrote: > Hi Sean, > > Sorry, I didn't describe it clearly. The column "pkey" is like a "Primary > Key" and I do "reduce by

About

2020-12-29 Thread LInda hackkanan

check it out https://backbutton.co.uk/about.html Regards not Linda - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Correctness bug on Shuffle+Repartition scenario

2020-12-29 Thread Shiao-An Yuan

Hi Sean, Sorry, I didn't describe it clearly. The column "pkey" is like a "Primary Key" and I do "reduce by key" on this column, so the "amount of rows" should always equal to the "cardinality of pkey". When I said data get duplicated & lost, I mean duplicated "pkey" exists in the output file (aft

[Spark Streaming] Why is ZooKeeper LeaderElection Agent not being called by Spark Master?

2020-12-29 Thread Saloni Mehta

Hello, Request you to please help me out on the below queries: I have 2 spark masters and 3 zookeepers deployed on my system on separate virtual machines. The services come up online in the below sequence: 1. zookeeper-1 2. sparkmaster-1 3. sparkmaster-2 4. zookeeper-2 5. zookeepe

Re: Correctness bug on Shuffle+Repartition scenario

2020-12-29 Thread Sean Owen

Total guess here, but your key is a case class. It does define hashCode and equals for you, but, you have an array as one of the members. Array equality is by reference, so, two arrays of the same elements are not equal. You may have to define hashCode and equals manually to make them correct. On

Correctness bug on Shuffle+Repartition scenario

2020-12-29 Thread Shiao-An Yuan

Hi folks, We recently identified a data correctness issue in our pipeline. The data processing flow is as follows: 1. read the current snapshot (provide empty if it doesn't exist yet) 2. read unprocessed new data 3. union them and do a `reduceByKey` operation 4. output a new version of the snapsh

Re: Correctness bug on Shuffle+Repartition scenario

About

Re: Correctness bug on Shuffle+Repartition scenario

[Spark Streaming] Why is ZooKeeper LeaderElection Agent not being called by Spark Master?

Re: Correctness bug on Shuffle+Repartition scenario

Correctness bug on Shuffle+Repartition scenario

6 matches

Site Navigation

Mail list logo

Footer information