This seems like a suboptimal situation for a join. How can Spark know in
advance that all the ids are present and that the tables have the same number
of rows? I suppose you could just sort the two frames by id and concatenate
them column-wise, but I'm not sure what join optimization is available here.
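To make the sort-and-concatenate idea concrete, here is a minimal pure-Scala sketch (plain collections, not the Spark API, and hypothetical data) of sorting both sides by the key and zipping them. This is essentially what a sort-merge join does per partition, which is the strategy Spark typically picks for an equi-join on large frames:

```scala
object SortZipSketch {
  // Hypothetical data: A = (unique_id, field1), B = (unique_id, field2),
  // same row count, same set of ids -- the situation described below.
  val a = Seq((3, "a3"), (1, "a1"), (2, "a2"))
  val b = Seq((2, "b2"), (3, "b3"), (1, "b1"))

  // Sorting both sides by the key makes row positions line up,
  // so a simple zip reproduces the equi-join result.
  def sortZip(
      a: Seq[(Int, String)],
      b: Seq[(Int, String)]): Seq[(Int, String, String)] = {
    val as = a.sortBy(_._1)
    val bs = b.sortBy(_._1)
    as.zip(bs).map { case ((id, f1), (_, f2)) => (id, f1, f2) }
  }

  val joined = sortZip(a, b)
}
```

The catch, of course, is that Spark cannot assume the ids line up one-to-one the way this sketch does; a real join still has to match keys rather than trust positions.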
On Fri,
I have 2 dataframes, let's call them A and B:
A is made up of [unique_id, field1]
B is made up of [unique_id, field2]
They have the exact same number of rows, and every id in A is also present
in B.
If I execute a join like this: A.join(B,
Seq("unique_id")).select($"unique_id", $"field1")