I don't think this addresses my comment at all. Please try correctly
implementing equals and hashCode for your key class first.
On Tue, Dec 29, 2020 at 8:31 PM Shiao-An Yuan wrote:
> Hi Sean,
>
> Sorry, I didn't describe it clearly. The column "pkey" is like a "Primary
> Key" and I do "reduce by key" on this column, so the number of rows
> should always equal the cardinality of "pkey".
Hi Sean,
Sorry, I didn't describe it clearly. The column "pkey" is like a "Primary
Key": I do "reduce by key" on this column, so the number of output rows
should always equal the cardinality of "pkey".
When I said data got duplicated and lost, I mean that duplicated "pkey"
values exist in the output file (aft
Hello,
Could you please help me with the queries below?
I have 2 Spark masters and 3 ZooKeepers deployed on separate virtual
machines. The services come online in the following sequence:
1. zookeeper-1
2. sparkmaster-1
3. sparkmaster-2
4. zookeeper-2
5. zookeeper-3
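For reference, Spark standalone HA is enabled by pointing both masters at the ZooKeeper ensemble in `spark-env.sh`; the hostnames and ports below are placeholders, not from the original message:

```shell
# spark-env.sh on BOTH masters -- zk1/zk2/zk3 are placeholder hostnames.
# With recoveryMode=ZOOKEEPER the masters elect a leader through the
# ensemble, and the standby master takes over if the leader dies.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
```

Note that leader election only needs a ZooKeeper quorum (2 of 3 nodes), so the exact startup order of the individual services should not matter once a quorum is up.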
Total guess here, but your key is a case class. It does define hashCode and
equals for you, but you have an array as one of the members. Array
equality is by reference, so two arrays with the same elements are not
equal. You may have to define hashCode and equals manually to make them
correct.
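This can be reproduced in a few lines of plain Java (the thread's key is a Scala case class, but array semantics are the same on the JVM); the `Key` class and its fields are hypothetical stand-ins:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical key class with an array member, like the case class above.
final class Key {
    final String id;
    final int[] tags;

    Key(String id, int[] tags) { this.id = id; this.tags = tags; }

    // Compare array *contents*, not references (the default for arrays).
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Key)) return false;
        Key k = (Key) o;
        return id.equals(k.id) && Arrays.equals(tags, k.tags);
    }

    // hashCode must agree with equals, so hash the contents too.
    @Override public int hashCode() {
        return 31 * id.hashCode() + Arrays.hashCode(tags);
    }
}

public class KeyDemo {
    public static void main(String[] args) {
        int[] a = {1, 2};
        int[] b = {1, 2};

        // Arrays themselves compare by reference:
        System.out.println(a.equals(b));         // false
        System.out.println(Arrays.equals(a, b)); // true

        // With content-based equals/hashCode, logically identical keys
        // collapse into a single map entry instead of two:
        Map<Key, Integer> counts = new HashMap<>();
        counts.merge(new Key("x", a), 1, Integer::sum);
        counts.merge(new Key("x", b), 1, Integer::sum);
        System.out.println(counts.size());       // 1
    }
}
```

Without the overrides, the two `Key("x", …)` entries would hash differently and both would survive, which is exactly the "duplicated pkey" symptom described.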
Hi folks,
We recently identified a data correctness issue in our pipeline.
The data processing flow is as follows:
1. read the current snapshot (use an empty one if it doesn't exist yet)
2. read the unprocessed new data
3. union them and do a `reduceByKey` operation
4. output a new version of the snapshot
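Outside Spark, the four steps above can be sketched with a plain map-based reduce; modeling `pkey` as a string and summing as the reduce function are assumptions for illustration, not details from the thread:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A plain-Java model of the snapshot pipeline: the map plays the role of
// the keyed dataset, and merge() plays the role of Spark's reduceByKey.
public class SnapshotDemo {
    static Map<String, Long> nextSnapshot(Map<String, Long> snapshot,
                                          List<Map.Entry<String, Long>> newData) {
        // steps 1-2: start from the current snapshot (empty map if none yet)
        Map<String, Long> next = new HashMap<>(snapshot);
        // step 3: union + reduce by key -> one output row per distinct pkey
        for (Map.Entry<String, Long> e : newData) {
            next.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return next; // step 4: the new snapshot version
    }

    public static void main(String[] args) {
        Map<String, Long> snapshot = Map.of("a", 10L, "b", 5L);
        List<Map.Entry<String, Long>> fresh = List.of(
                Map.entry("a", 1L), Map.entry("c", 2L));
        Map<String, Long> next = nextSnapshot(snapshot, fresh);
        // row count equals the cardinality of pkey: {a, b, c} -> 3 rows
        System.out.println(next.size()); // 3
    }
}
```

The invariant the thread relies on, output rows == distinct `pkey` values, holds here by construction; it only breaks if key equality itself is broken, as discussed above.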