Re: Joins in Spark

Angel Francisco Orta Tue, 02 May 2017 13:00:06 -0700

Sorry, I had a typo. I meant repartition by the join field, e.g.
DfA.repartition(col("fieldofjoin")), not partitionby.
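
A minimal sketch of that suggestion in Scala, assuming Spark 2.x or later. The object name SegmentedJoin, both helper names, and the segment filter string are hypothetical and only for illustration; the join columns Lat3 and Lon3 are the ones mentioned in the thread. Whether repartitioning actually shortens the job depends on the data and the cluster, so treat this as something to try rather than a guaranteed fix.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical helper object, not part of Spark.
object SegmentedJoin {

  // Repartition both sides on the join columns, then do an inner equi-join.
  // DataFrame.repartition takes Column expressions, hence the map to col(...).
  def repartitionedJoin(left: DataFrame,
                        right: DataFrame,
                        joinCols: Seq[String]): DataFrame = {
    val keys = joinCols.map(col)
    val l = left.repartition(keys: _*)
    val r = right.repartition(keys: _*)
    // Joining on a Seq of column names keeps each join column once in the output.
    l.join(r, joinCols)
  }

  // Join a single segment: apply the same filter to both sides first.
  // segmentFilter is a SQL expression over columns that exist in both tables,
  // e.g. a range on one of the join columns (an assumption for this sketch).
  def joinSegment(left: DataFrame,
                  right: DataFrame,
                  joinCols: Seq[String],
                  segmentFilter: String): DataFrame =
    repartitionedJoin(left.filter(segmentFilter), right.filter(segmentFilter), joinCols)
}
```

With the DataFrames from this thread, a call might look like SegmentedJoin.joinSegment(datapoint_prq_df, geoCacheLoc_df, Seq("Lat3", "Lon3"), "Lat3 >= 0 AND Lat3 < 10"), repeated for each range and the results unioned. The range shown is made up; the segments would need to cover the real values of the join column without overlapping.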

On 2 May 2017 at 9:44 p.m., "KhajaAsmath Mohammed" <[email protected]>
wrote:


Hi Angel,

I am trying the code below, but I don't see partitionby on the DataFrame.

      val iftaGPSLocation_df = sqlContext.sql(iftaGPSLocQry)
      import sqlContext._
      import sqlContext.implicits._
      datapoint_prq_df.join(geoCacheLoc_df)

val tableA = DfA.partitionby("joinField").filter("firstSegment")

The columns I have are Lat3, Lon3, VIN, and Time. Lat3 and Lon3 are the join
columns on both DataFrames, and the rest are select columns.

Thanks,
Asmath



On Tue, May 2, 2017 at 1:38 PM, Angel Francisco Orta <
[email protected]> wrote:

> Have you tried partitioning by the join field and running the join in
> segments, filtering both tables to the same segment of data?
>
> Example:
>
> val tableA = DfA.partitionby("joinField").filter("firstSegment")
> val tableB = DfB.partitionby("joinField").filter("firstSegment")
>
> tableA.join(tableB)....
>
> On 2 May 2017 at 8:30 p.m., "KhajaAsmath Mohammed" <[email protected]>
> wrote:
>
>> Table 1 (192 GB) is partitioned by year and month; the 192 GB is the data
>> for one month, i.e. April.
>>
>> Table 2 (92 GB) is not partitioned.
>>
>> I have to perform a join on these tables now.
>>
>>
>>
>> On Tue, May 2, 2017 at 1:27 PM, Angel Francisco Orta <
>> [email protected]> wrote:
>>
>>> Hello,
>>>
>>> Are the tables partitioned?
>>> If yes, what is the partition field?
>>>
>>> Thanks
>>>
>>>
>>> On 2 May 2017 at 8:22 p.m., "KhajaAsmath Mohammed" <
>>> [email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I am trying to join two big tables in Spark, and the job has been running
>>> for quite a long time without producing any results.
>>>
>>> Table 1: 192GB
>>> Table 2: 92 GB
>>>
>>> Does anyone have a better solution to get the results faster?
>>>
>>> Thanks,
>>> Asmath
>>>
>>>
>>>
>>
