Have you tried partitioning both tables by the join field and running the join in segments, filtering both tables down to the same segment of data at a time?
Example:

import org.apache.spark.sql.functions.col

val tableA = dfA.repartition(col("joinField")).filter(firstSegment)
val tableB = dfB.repartition(col("joinField")).filter(firstSegment)
tableA.join(tableB, "joinField") ...

where firstSegment is a predicate selecting the same slice of join keys on both sides. (A fuller sketch follows the quoted thread below.)

On May 2, 2017 8:30 PM, "KhajaAsmath Mohammed" <mdkhajaasm...@gmail.com> wrote:

> Table 1 (192 GB) is partitioned by year and month ... the 192 GB of data
> is for one month, i.e., April.
>
> Table 2: 92 GB, not partitioned.
>
> I have to perform a join on these tables now.
>
> On Tue, May 2, 2017 at 1:27 PM, Angel Francisco Orta <
> angel.francisco.o...@gmail.com> wrote:
>
>> Hello,
>>
>> Are the tables partitioned?
>> If yes, what is the partition field?
>>
>> Thanks
>>
>> On May 2, 2017 8:22 PM, "KhajaAsmath Mohammed" <
>> mdkhajaasm...@gmail.com> wrote:
>>
>> Hi,
>>
>> I am trying to join two big tables in Spark, and the job has been
>> running for quite a long time without producing any results.
>>
>> Table 1: 192 GB
>> Table 2: 92 GB
>>
>> Does anyone have a better approach to get the results faster?
>>
>> Thanks,
>> Asmath
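To make that concrete, here is a rough, untested sketch of the segment-by-segment join, assuming Spark 2.0+ (for functions.hash). dfA, dfB, "joinField", numSegments, and the segmentedJoin helper are placeholders for your own tables and tuning, and bucketing on a hash of the join key is just one way to carve the key space into segments:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, hash, lit, pmod}

def segmentedJoin(dfA: DataFrame, dfB: DataFrame,
                  joinField: String, numSegments: Int): DataFrame = {
  // Both sides bucket rows by the same hash of the join key, so a given
  // key always lands in the same segment in both tables.
  def segment(df: DataFrame, i: Int): DataFrame =
    df.filter(pmod(hash(col(joinField)), lit(numSegments)) === i)

  // Join one segment at a time and union the partial results; each
  // per-segment join shuffles only a fraction of the data.
  (0 until numSegments)
    .map(i => segment(dfA, i).join(segment(dfB, i), joinField))
    .reduce(_ union _)
}

The trade-off is that each input gets scanned once per segment, so caching dfA and dfB (or persisting them pre-bucketed) may be worth it.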