Re: Any way to make catalyst optimise away join

2019-11-29 Thread Jerry Vinokurov
This seems like a suboptimal situation for a join. How can Spark know in advance that all the fields are present and that the tables have the same number of rows? I suppose you could just sort the two frames by id and concatenate them, but I'm not sure what join optimization is available here.
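Spark has no direct column-wise concatenation of two DataFrames, so "sort and concatenate" would have to go through something like a positional index. Below is a rough, hedged sketch of that idea; the helper names (withPosition, sortAndConcat, _pos) are mine, not from the thread, and the positional join itself still shuffles, so this is an illustration rather than a guaranteed win over the original join.

```scala
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Attach a positional index column (_pos) to a frame after sorting it by its id column.
def withPosition(df: DataFrame, idCol: String): DataFrame = {
  val sorted = df.sort(idCol)
  val schema = StructType(sorted.schema.fields :+ StructField("_pos", LongType, nullable = false))
  val indexed = sorted.rdd.zipWithIndex.map { case (row, idx) => Row.fromSeq(row.toSeq :+ idx) }
  df.sparkSession.createDataFrame(indexed, schema)
}

// "Concatenate" the two frames column-wise by aligning rows on the positional index.
// This only makes sense under the question's assumptions (identical ids, identical
// row counts); the join on _pos still involves a shuffle.
def sortAndConcat(a: DataFrame, b: DataFrame, idCol: String = "unique_id"): DataFrame =
  withPosition(a, idCol)
    .join(withPosition(b, idCol).drop(idCol), Seq("_pos"))
    .drop("_pos")
```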

Any way to make catalyst optimise away join

2019-11-29 Thread jelmer
I have 2 dataframes, let's call them A and B. A is made up of [unique_id, field1] and B is made up of [unique_id, field2]. They have exactly the same number of rows, and every id in A is also present in B. If I execute a join like this: A.join(B, Seq("unique_id")).select($"unique_id", $"field1")
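For anyone wanting to reproduce this locally, here is a minimal, self-contained sketch of the setup described above. The sample data and the local[*] master are placeholders; explain(true) prints the analyzed, optimized, and physical plans, which shows whether Catalyst keeps the join when only columns from A are selected.

```scala
import org.apache.spark.sql.SparkSession

object JoinEliminationRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-elimination-repro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Two frames with the shape described in the question: same ids, one extra column each.
    val A = Seq((1L, "a1"), (2L, "a2"), (3L, "a3")).toDF("unique_id", "field1")
    val B = Seq((1L, "b1"), (2L, "b2"), (3L, "b3")).toDF("unique_id", "field2")

    // The join from the question: only columns from A are used afterwards.
    val joined = A.join(B, Seq("unique_id")).select($"unique_id", $"field1")

    // Inspect the plans to see whether the join survives optimization.
    joined.explain(true)

    spark.stop()
  }
}
```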