Sorry, I had a typo. I meant repartition by the join field, e.g. repartition($"fieldofjoin").
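Why repartitioning on the join field helps can be shown with a small plain-Scala sketch. It is not Spark code. Spark's repartition uses its own hash function, and HashPartitionSketch, partitionOf and partitionBy are hypothetical names used only for illustration. The sketch shows one property: if both tables are split with the same key and the same number of partitions, rows with equal join keys always get the same partition index.

```scala
// Plain-Scala sketch (not the Spark API) of hash partitioning by a join field.
object HashPartitionSketch {
  // Map a join key to a partition index in [0, numPartitions).
  // Math.floorMod keeps the index non-negative when the hash is negative.
  def partitionOf[K](key: K, numPartitions: Int): Int =
    Math.floorMod(key.##, numPartitions)

  // Group rows into partitions according to the hash of their join key.
  def partitionBy[R, K](rows: Seq[R], numPartitions: Int)(key: R => K): Map[Int, Seq[R]] =
    rows.groupBy(r => partitionOf(key(r), numPartitions))

  def main(args: Array[String]): Unit = {
    // Hypothetical rows of the form (joinField, payload).
    val tableA = Seq(("k1", 1), ("k2", 2), ("k3", 3))
    val tableB = Seq(("k1", "x"), ("k3", "y"), ("k4", "z"))
    val n = 4
    val partsA = partitionBy(tableA, n)(_._1)
    val partsB = partitionBy(tableB, n)(_._1)
    // Both sides use the same key and the same partition count, so any key
    // present in both tables lands in the same partition index on each side.
    for (i <- 0 until n)
      println(s"partition $i: A=${partsA.getOrElse(i, Nil)} B=${partsB.getOrElse(i, Nil)}")
  }
}
```

Because matching keys share an index, partition i of one table only ever needs to be compared with partition i of the other.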
On 2 May 2017, 9:44 PM, "KhajaAsmath Mohammed" <[email protected]>
wrote:
Hi Angel,
I am trying the code below, but I don't see any partitioning on the DataFrame.
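val iftaGPSLocation_df = sqlContext.sql(iftaGPSLocQry)
import sqlContext._
import sqlContext.implicits._
// Join on the key columns explicitly; without a join condition this becomes a cross join
datapoint_prq_df.join(geoCacheLoc_df, Seq("Lat3", "Lon3"))
val tableA = DfA.repartition($"joinField").filter("firstSegment")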
The columns I have are Lat3, Lon3, VIN and Time. Lat3 and Lon3 are the join
columns in both DataFrames, and the rest are selected columns.
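Angel's suggestion further down the thread can be sketched for these columns in plain Scala: partition both tables by the join field and join them segment by segment. This is a simplified, single-machine illustration, not Spark code, and it rests on several assumptions. Lat3 and Lon3 are taken to be Doubles. The second table's "location" column is hypothetical. SegmentedJoinSketch and its functions are illustrative names, not Spark APIs.

```scala
// Plain-Scala sketch (not Spark code) of a segmented equi-join on (Lat3, Lon3).
object SegmentedJoinSketch {
  // Hypothetical row shapes; the thread only names the columns Lat3, Lon3, VIN and Time.
  final case class GpsPoint(lat3: Double, lon3: Double, vin: String, time: Long)
  final case class CachedLoc(lat3: Double, lon3: Double, location: String)
  final case class Joined(vin: String, time: Long, location: String)

  // The segment id is derived from the join key, so equal keys always share a segment.
  def segmentOf(lat3: Double, lon3: Double, segments: Int): Int =
    Math.floorMod((lat3, lon3).##, segments)

  // Join a single segment: keep only that segment's rows on both sides,
  // index one side by key, then probe the index with the other side.
  def joinSegment(points: Seq[GpsPoint], locs: Seq[CachedLoc],
                  seg: Int, segments: Int): Seq[Joined] = {
    val locsByKey: Map[(Double, Double), Seq[CachedLoc]] = locs
      .filter(l => segmentOf(l.lat3, l.lon3, segments) == seg)
      .groupBy(l => (l.lat3, l.lon3))
    points
      .filter(p => segmentOf(p.lat3, p.lon3, segments) == seg)
      .flatMap { p =>
        locsByKey
          .getOrElse((p.lat3, p.lon3), Seq.empty[CachedLoc])
          .map(l => Joined(p.vin, p.time, l.location))
      }
  }

  // Run the segments one after another and concatenate the partial results.
  def segmentedJoin(points: Seq[GpsPoint], locs: Seq[CachedLoc], segments: Int): Seq[Joined] =
    (0 until segments).flatMap(s => joinSegment(points, locs, s, segments))

  // Unsegmented inner join, used only to check that segmenting loses no rows.
  def fullJoin(points: Seq[GpsPoint], locs: Seq[CachedLoc]): Seq[Joined] =
    for {
      p <- points
      l <- locs
      if p.lat3 == l.lat3 && p.lon3 == l.lon3
    } yield Joined(p.vin, p.time, l.location)

  def main(args: Array[String]): Unit = {
    val points = Seq(
      GpsPoint(41.881, -87.623, "VIN1", 1000L),
      GpsPoint(41.881, -87.623, "VIN2", 1005L),
      GpsPoint(40.713, -74.006, "VIN3", 1010L),
      GpsPoint(34.052, -118.244, "VIN4", 1020L)
    )
    val locs = Seq(
      CachedLoc(41.881, -87.623, "Chicago"),
      CachedLoc(40.713, -74.006, "New York")
    )
    segmentedJoin(points, locs, segments = 3).foreach(println)
  }
}
```

The segment id depends only on the join key, so a matching pair can never be split across two segments. Concatenating the per-segment results therefore gives the same rows as one full join, while each step only has to process one segment of each table.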
Thanks,
Asmath
On Tue, May 2, 2017 at 1:38 PM, Angel Francisco Orta <
[email protected]> wrote:
> Have you tried partitioning by the join field and running the join in
> segments, filtering both tables to the same segment of data?
>
> Example:
>
> val tableA = DfA.repartition($"joinField").filter("firstSegment")
> val tableB = DfB.repartition($"joinField").filter("firstSegment")
>
> tableA.join(tableB, "joinField")...
>
> On 2 May 2017, 8:30 PM, "KhajaAsmath Mohammed" <[email protected]>
> wrote:
>
>> Table 1 (192 GB) is partitioned by year and month; the 192 GB is the data
>> for a single month, April.
>>
>> Table 2 (92 GB) is not partitioned.
>>
>> I now need to join these two tables.
>>
>>
>>
>> On Tue, May 2, 2017 at 1:27 PM, Angel Francisco Orta <
>> [email protected]> wrote:
>>
>>> Hello,
>>>
>>> Are the tables partitioned?
>>> If so, what is the partition field?
>>>
>>> Thanks
>>>
>>>
>>> On 2 May 2017, 8:22 PM, "KhajaAsmath Mohammed" <
>>> [email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I am trying to join two large tables in Spark, and the job has been
>>> running for a long time without producing any results.
>>>
>>> Table 1: 192 GB
>>> Table 2: 92 GB
>>>
>>> Does anyone have a better approach to get the results faster?
>>>
>>> Thanks,
>>> Asmath
>>>
>>>
>>>
>>