Hi All, I am getting an Out Of Memory error (GC overhead limit exceeded) while reading a Hive table from Spark, like:
spark.sql("SELECT * FROM some.table where date='2019-05-14' LIMIT > 10").show() So when I run above command in spark-shell then it starts processing *1780 tasks* where it goes OOM at a specific partition. 1. Table partition(*date='2019-05-14'*) is having *4000* files on HDFS so ideally 4000 partitions should be created inside Spark Dataframe if I am not wrong. I analyzed the table actually it is having total *1780* partitions(means 1780 dates folder). 2. I checked the size of files in Table partition(*date='2019-05-14'*), max file size is *1.1 GB* and I have given *7GB* to each executor so if I am right above then it should not throw OOM. 3. And when I have put the* LIMIT 10* then does spark-hive reads all files? Thanks -- Shivam Sharma Indian Institute Of Information Technology, Design and Manufacturing Jabalpur Email:- 28shivamsha...@gmail.com LinkedIn:-*https://www.linkedin.com/in/28shivamsharma <https://www.linkedin.com/in/28shivamsharma>*