Have you tried partitioning both tables by the join field and running the
join in segments, filtering both tables down to the same segment of data at
a time?

Example:

// requires: import org.apache.spark.sql.functions.col
val tableA = DfA.repartition(col("joinField")).filter("joinField % 10 = 0")  // first of 10 segments
val tableB = DfB.repartition(col("joinField")).filter("joinField % 10 = 0")

tableA.join(tableB, Seq("joinField")) ...
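For concreteness, here is a minimal sketch of the whole segmented join,
assuming a numeric joinField and pmod-based bucketing (the bucketing scheme
and the segmentedJoin helper are my illustration, not code from this thread).
Each per-segment join touches only a fraction of each table, and the results
are unioned back together at the end:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, pmod}

def segmentedJoin(dfA: DataFrame, dfB: DataFrame, numSegments: Int): DataFrame = {
  val perSegment = (0 until numSegments).map { s =>
    // Slice both tables on exactly the same boundaries so that
    // matching keys always land in the same segment.
    val a = dfA.filter(pmod(col("joinField"), lit(numSegments)) === s)
    val b = dfB.filter(pmod(col("joinField"), lit(numSegments)) === s)
    a.join(b, Seq("joinField"))
  }
  // Stitch the per-segment results back into one DataFrame.
  perSegment.reduce(_ union _)
}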

On May 2, 2017 8:30 p.m., "KhajaAsmath Mohammed" <mdkhajaasm...@gmail.com>
wrote:

> Table 1 (192 GB) is partitioned by year and month; all 192 GB of data is
> for a single month, i.e., April.
>
> Table 2: 92 GB, not partitioned.
>
> I have to perform a join on these tables now.
>
>
>
> On Tue, May 2, 2017 at 1:27 PM, Angel Francisco Orta <
> angel.francisco.o...@gmail.com> wrote:
>
>> Hello,
>>
>> Are the tables partitioned?
>> If yes, what is the partition field?
>>
>> Thanks
>>
>>
>> On May 2, 2017 8:22 p.m., "KhajaAsmath Mohammed" <
>> mdkhajaasm...@gmail.com> wrote:
>>
>> Hi,
>>
>> I am trying to join two big tables in Spark, and the job runs for a
>> long time without producing any results.
>>
>> Table 1: 192 GB
>> Table 2: 92 GB
>>
>> Does anyone have a better solution to get the results faster?
>>
>> Thanks,
>> Asmath
>>
>>
>>
>
