Hi all,
Does anyone have any idea about this?
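
For anyone who wants to look at the same thing, here is a minimal sketch of the checks in question. It assumes PySpark, although our job runs from a jar. The helper names are hypothetical, and estimated_size_in_bytes goes through the private `_jdf` handle, so it may not work on every Spark version. As far as I can tell from the Spark source, the 8 GB limit in the error is a fixed constant (8L << 30 bytes) that is checked against the size of the relation built in memory, not the size of the files on S3. Also, cache() is lazy, so the Storage tab stays empty until an action has run on the cached DataFrame.

```python
# Sketch only: the helper names below are hypothetical, not Spark APIs.

# Fixed limit that Spark's BroadcastExchangeExec appears to enforce (8L << 30).
MAX_BROADCAST_TABLE_BYTES = 8 << 30


def to_gib(size_in_bytes):
    """Convert a byte count to GiB for easier reading."""
    return size_in_bytes / float(1 << 30)


def exceeds_broadcast_limit(size_in_bytes):
    """Return True if an in-memory size is at or above the 8 GiB limit."""
    return size_in_bytes >= MAX_BROADCAST_TABLE_BYTES


def estimated_size_in_bytes(df):
    """Return the optimizer's size estimate for a DataFrame.

    Assumption: this uses the private `_jdf` handle and may break between
    Spark versions. df.explain(mode="cost") prints the same statistic
    through the public API.
    """
    stats = df._jdf.queryExecution().optimizedPlan().stats()
    # sizeInBytes is a Scala BigInt, so convert it via its string form.
    return int(stats.sizeInBytes().toString())


def materialize_cache(df):
    """Cache a DataFrame and run an action so the cache is actually filled."""
    cached = df.cache()
    cached.count()  # cache() alone is lazy; count() forces materialization
    return cached
```

With a hypothetical lookup_df, this would be used as size = estimated_size_in_bytes(materialize_cache(lookup_df)), followed by exceeds_broadcast_limit(size). For a plain file scan the estimate mostly reflects the file size, so it is probably only meaningful after the cache has been materialized. The "data size" metric on the BroadcastExchange node in the SQL tab of the Spark UI should show what the broadcast itself measured.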

Thanks

On Tue, 23 Jul, 2024, 12:54 pm Sudharshan V, <sudharshanv2...@gmail.com>
wrote:

> We removed the explicit broadcast for that particular table, and the job
> took longer because the join type changed from broadcast hash join (BHJ)
> to sort-merge join (SMJ).
>
> I would like to understand how to find out what went wrong with the
> broadcast.
> How can I find the size of the table in Spark memory?
>
> I tried caching the table, hoping to see its size in the Storage tab of
> the Spark UI on EMR.
>
> But I see no data there.
>
> Thanks
>
> On Tue, 23 Jul, 2024, 12:48 pm Sudharshan V, <sudharshanv2...@gmail.com>
> wrote:
>
>> Hi all, apologies for the delayed response.
>>
>> We are using Spark version 3.4.1 in the jar, with the EMR 6.11 runtime.
>>
>> We have always kept auto broadcast disabled and broadcast the smaller
>> tables explicitly.
>>
>> This worked fine historically and has only now started failing.
>>
>> The data sizes I mentioned were taken from S3.
>>
>> Thanks,
>> Sudharshan
>>
>> On Wed, 17 Jul, 2024, 1:53 am Meena Rajani, <meenakraj...@gmail.com>
>> wrote:
>>
>>> Can you try disabling broadcast join and see what happens?
>>>
>>> On Mon, Jul 8, 2024 at 12:03 PM Sudharshan V <sudharshanv2...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We have been facing a weird issue lately.
>>>> In our production codebase, we have an explicit broadcast for a small
>>>> table.
>>>> It is just a lookup table that is around 1 GB in size in S3 and has
>>>> only a few million records and 5 columns.
>>>>
>>>> The ETL was running fine, but with no change to the codebase or the
>>>> infrastructure, we are now getting broadcast failures. Even weirder, the
>>>> older data was 1.4 GB, while the data for the new run is just 900 MB.
>>>>
>>>> Below is the error message:
>>>> Cannot broadcast table that is larger than 8 GB : 8GB.
>>>>
>>>> I find it extremely weird, considering that the data size is well
>>>> under the threshold.
>>>>
>>>> Is there any other way to find out what the issue could be and how we
>>>> can rectify it?
>>>>
>>>> Could the data characteristics be an issue?
>>>>
>>>> Any help would be immensely appreciated.
>>>>
>>>> Thanks
>>>>
>>>
