Hi,

What is the size of one JSON document?

There is also the scan of your JSON that Spark performs to infer the schema; the overhead can be huge. Two solutions: define a schema and pass it directly during the load, or ask Spark to analyse only a small sample of the JSON file (the samplingRatio read option, if I remember correctly). A rough sketch of both is below.
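To make that concrete, here is a minimal sketch in Scala. The field names and HDFS paths are made up for illustration; samplingRatio is the JSON reader option that controls what fraction of the input is scanned during schema inference.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("large-json").getOrCreate()

// Option 1: skip inference completely by supplying the schema up front.
// Field names here are hypothetical -- replace with your real layout.
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType),
  StructField("payload", StringType)
))

val withSchema = spark.read
  .schema(schema)
  .json("hdfs:///data/in/events.json")

// Option 2: infer the schema from a sample instead of the whole file.
val sampled = spark.read
  .option("samplingRatio", 0.01)  // inspect roughly 1% of the input
  .json("hdfs:///data/in/events.json")

// Persist for the next transformation, e.g. as Parquet on HDFS.
withSchema.write.mode("overwrite").parquet("hdfs:///data/staging/events")

With an explicit schema the inference scan is skipped entirely, which is usually what you want at 50 GB; the sampling option still reads part of the data but avoids a full pass over the file.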
Regards,

On Thu, Jun 18, 2020 at 3:12 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:

> Hi Spark Users,
>
> I have a 50 GB JSON file that I would like to read and persist to HDFS so
> it can be used in the next transformation. I am trying to read it as
> spark.read.json(path), but this gives an out-of-memory error on the
> driver. Obviously, I can't afford 50 GB of driver memory. In general,
> what is the best practice for reading a large JSON file like this?
>
> Thanks

--
M'BAREK Med Nihed,
Fedora Ambassador, TUNISIA, Northern Africa
http://www.nihed.com
<http://tn.linkedin.com/in/nihed>