Hi Angel,

I am trying the code below, but I don't see any partitioning on the dataframe.
val iftaGPSLocation_df = sqlContext.sql(iftaGPSLocQry)

import sqlContext._
import sqlContext.implicits._

datapoint_prq_df.join(geoCacheLoc_df)

val tableA = DfA.partitionby("joinField").filter("firstSegment")

Columns I have are Lat3, Lon3, VIN, and Time. Lat3 and Lon3 are my join
columns on both dataframes, and the rest are select columns (see the
corrected sketch at the end of this thread).

Thanks,
Asmath

On Tue, May 2, 2017 at 1:38 PM, Angel Francisco Orta <
angel.francisco.o...@gmail.com> wrote:

> Have you tried making the partition by the join's field and running it in
> segments, filtering both tables to the same segment of data?
>
> Example:
>
> val tableA = DfA.partitionby("joinField").filter("firstSegment")
> val tableB = DfB.partitionby("joinField").filter("firstSegment")
>
> tableA.join(tableB) ...
>
> On May 2, 2017, 8:30 PM, "KhajaAsmath Mohammed" <mdkhajaasm...@gmail.com>
> wrote:
>
>> Table 1 (192 GB) is partitioned by year and month; the 192 GB of data is
>> for one month, i.e. for April.
>>
>> Table 2: 92 GB, not partitioned.
>>
>> I have to perform a join on these tables now.
>>
>> On Tue, May 2, 2017 at 1:27 PM, Angel Francisco Orta <
>> angel.francisco.o...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Are the tables partitioned?
>>> If yes, what is the partition field?
>>>
>>> Thanks
>>>
>>> On May 2, 2017, 8:22 PM, "KhajaAsmath Mohammed" <
>>> mdkhajaasm...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I am trying to join two big tables in Spark and the job has been running
>>> for quite a long time without any results.
>>>
>>> Table 1: 192 GB
>>> Table 2: 92 GB
>>>
>>> Does anyone have a better solution to get the results faster?
>>>
>>> Thanks,
>>> Asmath
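
For reference, a minimal runnable sketch of the approach Angel describes,
assuming Spark 1.6+ and the Lat3/Lon3 join keys from the thread. DataFrame
has no partitionby method (partitionBy exists only on DataFrameWriter),
which is likely why no partitioning is visible; repartition on the join
columns is the closest real API. The geo_cache_loc table name and the
segment predicate are placeholders, not names from the thread:

    import org.apache.spark.sql.functions.col

    // inputs from the thread; geo_cache_loc is a placeholder table name
    // standing in for the source of geoCacheLoc_df
    val dfA = sqlContext.sql(iftaGPSLocQry)
    val dfB = sqlContext.table("geo_cache_loc")

    // co-partition both sides on the join keys so matching keys land in
    // the same partitions (repartition, not partitionby, is the real API)
    val tableA = dfA.repartition(col("Lat3"), col("Lon3"))
    val tableB = dfB.repartition(col("Lat3"), col("Lon3"))

    // optionally process one segment of the key space at a time, as Angel
    // suggests; "Lat3 >= 0" is a placeholder predicate for the first segment
    val segA = tableA.filter("Lat3 >= 0")
    val segB = tableB.filter("Lat3 >= 0")

    // join on both keys; VIN and Time are assumed to exist on one side only
    val joined = segA.join(segB, Seq("Lat3", "Lon3"))
      .select("VIN", "Time", "Lat3", "Lon3")

    // joined.rdd.partitions.length and joined.explain() can be used to
    // verify the partition count and the planned shuffle
Filtering both tables to matching segments before the join bounds the
shuffle size of each run; for inputs of this size, a plain repartitioned
join is still worth trying first before splitting into segments.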