Hi all, do we have any ideas on this? Thanks
On Tue, 23 Jul 2024, 12:54 pm Sudharshan V, <sudharshanv2...@gmail.com> wrote:

> We removed the explicit broadcast for that particular table, and the job took longer since the join type changed from BHJ (broadcast hash join) to SMJ (sort-merge join).
>
> I want to understand how to find out what went wrong with the broadcast.
> How do I find the size of the table in Spark's memory?
>
> I tried caching the table, hoping to see its size in the Storage tab of the Spark UI on EMR, but I see no data there.
>
> Thanks
>
> On Tue, 23 Jul 2024, 12:48 pm Sudharshan V, <sudharshanv2...@gmail.com> wrote:
>
>> Hi all, apologies for the delayed response.
>>
>> We are using Spark 3.4.1 in the jar and the EMR 6.11 runtime.
>>
>> We have always disabled auto-broadcast, and we broadcast the smaller tables with an explicit broadcast hint.
>>
>> This worked fine historically; it is only failing now.
>>
>> The data sizes I mentioned were taken from S3.
>>
>> Thanks,
>> Sudharshan
>>
>> On Wed, 17 Jul 2024, 1:53 am Meena Rajani, <meenakraj...@gmail.com> wrote:
>>
>>> Can you try disabling broadcast join and see what happens?
>>>
>>> On Mon, Jul 8, 2024 at 12:03 PM Sudharshan V <sudharshanv2...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We have been facing a weird issue lately.
>>>> In our production codebase, we have an explicit broadcast for a small table.
>>>> It is just a lookup table, around 1 GB in size on S3, with a few million records and 5 columns.
>>>>
>>>> The ETL was running fine, but with no change to the codebase or the infrastructure, we are now getting broadcast failures. Even weirder, the older size of the data was 1.4 GB while the new run is just 900 MB.
>>>>
>>>> Below is the error message:
>>>> Cannot broadcast table that is larger than 8 GB : 8GB.
>>>>
>>>> I find it extremely weird, considering that the data size is well under the thresholds.
>>>>
>>>> Are there any other ways to find what the issue could be, and how can we rectify it?
>>>>
>>>> Could the data characteristics be an issue?
>>>>
>>>> Any help would be immensely appreciated.
>>>>
>>>> Thanks
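One thing worth checking: the 8 GB limit is a hard cap that Spark applies to its estimate of the *deserialized, in-memory* broadcast relation, not to the compressed file size on S3. Columnar formats like Parquet compress well, so a 900 MB file can expand far past 8 GB once built into a broadcast hash table. Below is a minimal pure-Python sketch of that arithmetic; the compression ratio is a made-up illustration, not a measurement from this job.

```python
# Sketch of Spark's hard broadcast-size check. The 8 GiB cap matches the
# error message in the thread; the compression ratio is a hypothetical
# assumption to show how a small file on S3 can still exceed the cap.

HARD_LIMIT_BYTES = 8 * 1024**3  # 8 GiB hard cap on a broadcast relation


def can_broadcast(estimated_in_memory_bytes: int) -> bool:
    """True if the built broadcast relation fits under the hard cap."""
    return estimated_in_memory_bytes < HARD_LIMIT_BYTES


# On-disk size (Parquet on S3) vs. deserialized in-memory size:
# dictionary/run-length encoding on low-cardinality columns can easily
# compress 10-20x, so the ratio below is purely illustrative.
on_disk_bytes = 900 * 1024**2   # ~900 MB, as reported from S3
compression_ratio = 12          # hypothetical
in_memory_bytes = on_disk_bytes * compression_ratio

print(can_broadcast(in_memory_bytes))  # ~10.5 GiB in memory -> False
```

To see the size Spark itself estimates, you can run `ANALYZE TABLE <table> COMPUTE STATISTICS` and then inspect the plan with `EXPLAIN COST`, which prints `sizeInBytes` per relation. On the empty Storage tab: `cache()` is lazy, so nothing appears in the UI until an action (e.g. `df.count()`) has actually materialized the cached data.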