Hello,
I have answered it on Stack Overflow.
Best Regards,
Attila
On Wed, May 12, 2021 at 4:57 PM Chris Thomas wrote:
> Hi,
>
> I am pretty confident I have observed Spark configured with the Shuffle
> Service continuing to fetch shuffle files on a node in the event of
> executor failure,
Hi,
In the case where the left and right hand side share a common parent, like:
from pyspark.sql import functions as F
from pyspark.sql.window import Window
# row_number() requires a window spec; ordering by a constant works,
# but pulls all rows into a single partition
df = spark.read.someDataframe().withColumn('rownum', F.row_number().over(Window.orderBy(F.lit(1))))
df1 = df.withColumn('c1', expensive_udf1('foo')).select('c1', 'rownum')
df2 = df.withColumn('c2', expensive_udf2('bar')).select('c2', 'rownum')
df_join = df1.join(df2, 'rownum')
Have you managed to sort out this problem, and the reason this solution is not working?
Bottom line: your temperature data arrives in a stream every two seconds, and you want an average of the temperature over the past 300 seconds' worth of data; in other words, your window length is 300 seconds?
You also w
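If so, a minimal Structured Streaming sketch of that sliding-window average (the source name readings and the column names ts and temperature are assumptions, as is the watermark choice):
import org.apache.spark.sql.functions._
// 300-second window sliding every 2 seconds; the watermark bounds
// how much state Spark keeps around for late-arriving readings
val avgTemp = readings
  .withWatermark("ts", "300 seconds")
  .groupBy(window(col("ts"), "300 seconds", "2 seconds"))
  .agg(avg("temperature").as("avg_temperature"))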
Yeah, I don't think that's going to work: you aren't guaranteed to get 1, 2, 3, etc. from monotonically_increasing_id(). I think row_number() might be what you need to generate a join ID.
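For example, with the df1/df2 from this thread (a sketch; ordering by a constant literal is only sensible for small data, since it forces everything into a single partition):
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
// row_number() needs a window spec; a constant ordering gives
// sequential 1, 2, 3, ... ids on both sides
val w = Window.orderBy(lit(1))
val df1n = df1.withColumn("rownum", row_number().over(w))
val df2n = df2.withColumn("rownum", row_number().over(w))
df1n.join(df2n, "rownum").drop("rownum").show()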
RDD has a .zip method, but (unless I'm forgetting!) DataFrame does not. You could .zip the two RDDs you get from the DataFrames and manually convert the Rows back into a DataFrame.
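A sketch of that route, assuming both sides have the same number of partitions and the same number of rows per partition (which .zip requires), and that the column names don't clash:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType
// pair up rows position-by-position and concatenate them
val zipped = df1.rdd.zip(df2.rdd).map { case (r1, r2) =>
  Row.fromSeq(r1.toSeq ++ r2.toSeq)
}
val schema = StructType(df1.schema.fields ++ df2.schema.fields)
val merged = spark.createDataFrame(zipped, schema)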
Thanks, Raghvendra.
Will the ids for corresponding rows always be the same? Since monotonically_increasing_id() returns a number based on the partition ID and the row number within the partition, will it be the same for corresponding rows? Also, is it guaranteed that the two dataframes will be divided into partitions in the same way?
You can add an extra id column and perform an inner join.
// ids are unique and increasing but not consecutive: the partition id is
// encoded in the upper bits, so the ids only line up across the two
// dataframes if both are partitioned identically
val df1_with_id = df1.withColumn("id", monotonically_increasing_id())
val df2_with_id = df2.withColumn("id", monotonically_increasing_id())
df1_with_id.join(df2_with_id, Seq("id"), "inner").drop("id").show()
+---------+---------+
|amount_6m|amount_9m|
+---------+---------+
|      100|      500|
|      200|      600|
|      300|      700|
|      400|      800|
|      500|      900|
+---------+---------+
Hi,
I am pretty confident I have observed Spark, when configured with the Shuffle Service, continuing to fetch shuffle files on a node after an executor failure, rather than recomputing them as happens without the Shuffle Service.
Can anyone confirm this?
(I have an SO question
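For reference, the setup being described is the external shuffle service turned on, roughly (a minimal sketch; pairing it with dynamic allocation is an assumption about the deployment):
spark.shuffle.service.enabled=true
spark.dynamicAllocation.enabled=true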
Hi All,
I have two dataframes
df1
amount_6m
100
200
300
400
500
And a second dataframe df2 below
amount_9m
500
600
700
800
900
The number of rows is the same in both dataframes.
Can I merge the two dataframes to achieve the df3 below?
df3
amount_6m | amount_9m
100       | 500
200       | 600
300       | 700
400       | 800
500       | 900