It seems to be caused by this issue:
https://issues.apache.org/jira/browse/SPARK-10925
I added a checkpoint, as suggested, to break the lineage and it worked.
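
For readers following along, here is a minimal sketch of what adding a checkpoint might look like. The thread does not say which DataFrame was checkpointed, so checkpointing the aggregated one is an assumption, as are the sample data, the `value` and `max_value` column names, and the checkpoint directory. It assumes Spark 2.1 or later, where `Dataset.checkpoint()` is available.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("checkpoint-sketch")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // checkpoint() needs a checkpoint directory; this path is hypothetical.
    spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

    // Made-up sample data; only the S_ID column name comes from the thread.
    val df = Seq((1, 10), (1, 20), (2, 30)).toDF("S_ID", "value")

    // checkpoint() is eager by default: it materializes the rows and returns
    // a Dataset with a truncated logical plan (assumed placement, see above).
    val dfAgg = df.groupBy("S_ID")
      .agg(max("value").as("max_value"))
      .checkpoint()

    // Join the original rows with the checkpointed aggregate.
    val dfRes = df.join(dfAgg, Seq("S_ID"), "left_outer")
    dfRes.show()

    spark.stop()
  }
}
```

Checkpointing writes the data to disk, so it costs an extra job; whether that trade-off is acceptable depends on the size of the aggregated DataFrame.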
Best regards,
2017-07-04 17:26 GMT+02:00 Bernard Jesop :
Thanks Didac,
My bad, the code I posted was incomplete. It should have been:
dfAgg = df.groupBy("S_ID").agg(...)
I want to access the aggregated values (of dfAgg) for each row of 'df',
which is why I do a left outer join.
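
A rough sketch of that pattern, under stated assumptions: the sample data, the `value` column, and the `count`/`avg` aggregations are made up, since the original `agg(...)` arguments are elided. The join uses the `join(right, usingColumns: Seq[String], joinType: String)` overload, which is one real option and not necessarily the one used in the original code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count}

object AggJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("agg-join-sketch")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data; only the S_ID column name comes from the thread.
    val df = Seq((1, 10.0), (1, 20.0), (2, 30.0)).toDF("S_ID", "value")

    // One row per S_ID with hypothetical aggregates.
    val dfAgg = df.groupBy("S_ID").agg(
      count("value").as("n_rows"),
      avg("value").as("avg_value")
    )

    // Left outer join on the shared key: every row of df is kept and gains
    // the aggregated columns of its group. Joining on a Seq of column names
    // keeps a single S_ID column in the result.
    val dfRes = df.join(dfAgg, Seq("S_ID"), "left_outer")
    dfRes.show()

    spark.stop()
  }
}
```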
Also, regarding the second parameter, I am using this signature of join
With the left join, you are joining two tables.
In your case, df is the left table, dfAgg is the right table.
The second parameter should be the joining condition, right?
For instance:
dfRes = df.join(dfAgg, $"userName" === $"name", "left_outer")
having a field in df called userName, and another i