It would help if you mentioned the Spark version and the problematic piece of code.
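In the meantime, something along these lines, run against the actual job, would capture the details that matter: the Spark version, the size estimate the planner uses for the lookup table, and the physical plan. The paths, table names and the join column "key" below are placeholders, not your code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().getOrCreate()
println(spark.version)  // Spark version to include in your reply

// Placeholder paths -- substitute the real lookup and fact tables
val lookupDF = spark.read.parquet("s3://bucket/lookup/")
val factDF   = spark.read.parquet("s3://bucket/fact/")

// Size estimate the optimiser works with for the lookup table; this is the
// decompressed, in-memory estimate, not the compressed size of the files on S3
println(lookupDF.queryExecution.optimizedPlan.stats.sizeInBytes)

// The explicit broadcast join as described, with "key" as a placeholder join column
val joined = factDF.join(broadcast(lookupDF), Seq("key"))
joined.explain(true)  // physical plan showing the broadcast exchange

If that in-memory estimate is far above the ~900 MB you see on S3, the gap is worth looking at first.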
HTH

Mich Talebzadeh,
Technologist | Architect | Data Engineer | Generative AI | FinCrime
PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College London <https://en.wikipedia.org/wiki/Imperial_College_London>
London, United Kingdom

view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice: "one test result is worth one-thousand expert opinions" (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>).


On Tue, 16 Jul 2024 at 08:51, Sudharshan V <sudharshanv2...@gmail.com> wrote:

>
> On Mon, 8 Jul, 2024, 7:53 pm Sudharshan V, <sudharshanv2...@gmail.com> wrote:
>
>> Hi all,
>>
>> We have been facing a weird issue lately.
>> In our production code base, we have an explicit broadcast for a small
>> table. It is just a lookup table, around 1 GB in size in S3, with a few
>> million records and 5 columns.
>>
>> The ETL was running fine, but with no change to the codebase or the
>> infrastructure, we are now getting broadcast failures. Even weirder, the
>> size of the data in the older run was 1.4 GB while in the new run it is
>> just 900 MB.
>>
>> Below is the error message:
>> Cannot broadcast table that is larger than 8 GB : 8 GB.
>>
>> I find it extremely weird considering that the data size is well under
>> the threshold.
>>
>> Are there any other ways to find what the issue could be, and how can we
>> rectify it?
>>
>> Could the data characteristics be an issue?
>>
>> Any help would be immensely appreciated.
>>
>> Thanks
>
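As a stopgap, and only if dropping the hint is acceptable for your runtimes, you could let the planner fall back to a sort-merge join rather than hitting the 8 GB broadcast hard limit your error refers to. A rough sketch, again with placeholder names and a single-column join key:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Optionally disable automatic broadcasting as well, so the planner cannot
// decide to broadcast the lookup table on its own
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

val lookupDF = spark.read.parquet("s3://bucket/lookup/")   // placeholder path
val factDF   = spark.read.parquet("s3://bucket/fact/")     // placeholder path

// No broadcast() hint here -- expect SortMergeJoin in the physical plan
val joined = factDF.join(lookupDF, Seq("key"))
joined.explain()

That only buys time, of course; the interesting question remains why a table that is 900 MB on S3 ends up over 8 GB once built for broadcast.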