Hi all, apologies for the delayed response.

We are using Spark version 3.4.1 (bundled in our jar) on the EMR 6.11 runtime.

We have auto broadcast permanently disabled and broadcast the smaller
tables explicitly.
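For reference, the setup looks roughly like this (a minimal PySpark sketch; the paths, table names, and join key are placeholders, not our actual ones):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

# Disable automatic broadcast joins entirely (-1 turns the threshold off).
spark = (
    SparkSession.builder
    .appName("explicit-broadcast-example")
    .config("spark.sql.autoBroadcastJoinThreshold", "-1")
    .getOrCreate()
)

# Hypothetical inputs: a large fact table and a small lookup table on S3.
facts = spark.read.parquet("s3://bucket/facts/")
lookup = spark.read.parquet("s3://bucket/lookup/")

# Explicitly hint that the small lookup table should be broadcast.
joined = facts.join(broadcast(lookup), "key")
```

The `broadcast()` hint forces a broadcast join for that one table even though the automatic threshold is disabled.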

It had been working fine historically and has only now started failing.

The data sizes I mentioned were taken from S3.

Thanks,
Sudharshan

On Wed, 17 Jul, 2024, 1:53 am Meena Rajani, <meenakraj...@gmail.com> wrote:

> Can you try disabling broadcast join and see what happens?
>
> On Mon, Jul 8, 2024 at 12:03 PM Sudharshan V <sudharshanv2...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> We have been facing a weird issue lately.
>> In our production codebase, we have an explicit broadcast for a small
>> table. It is just a lookup table, around 1 GB in size in S3, with only a
>> few million records and 5 columns.
>>
>> The ETL was running fine, but with no change to the codebase or the
>> infrastructure, we are now getting broadcast failures. Even stranger, the
>> older data size was 1.4 GB, while the new run's is just 900 MB.
>>
>> Below is the error message
>> Cannot broadcast table that is larger than 8 GB : 8GB.
>>
>> I find this extremely weird, considering that the data size is well
>> under the thresholds.
>>
>> Are there any other ways to find what the issue could be, and how can we
>> rectify it?
>>
>> Could the data characteristics be an issue?
>>
>> Any help would be immensely appreciated.
>>
>> Thanks
>>
>