It's an interesting problem. What is the structure of the file? One big array? One hash with many key-value pairs?
Stephan

On Thu, Jun 18, 2020 at 6:12 AM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
> Hi Spark Users,
>
> I have a 50 GB JSON file that I would like to read and persist to HDFS so it
> can be used in the next transformation. I am trying spark.read.json(path),
> but this gives an out-of-memory error on the driver. Obviously, I can't
> afford 50 GB of driver memory. In general, what is the best practice for
> reading a large JSON file like 50 GB?
>
> Thanks

--
Stephan Wehner, Ph.D.
The Buckmaster Institute, Inc.
2150 Adanac Street
Vancouver BC V5L 2E7 Canada
Cell (604) 767-7415
Fax (888) 808-4655
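The structure question matters because Spark can only split a JSON input across executors when it is newline-delimited ("JSON Lines", one object per line, the default for spark.read.json). A single 50 GB document (one big array or one big hash) requires the multiLine option, which cannot be split and is a common cause of this kind of memory error. A minimal sketch of the JSON Lines idea, using only the Python standard library (the sample records and file name are made up for illustration):

```python
import json
import os
import tempfile

# Sample records standing in for the real data.
records = [{"id": i, "value": i * 2} for i in range(5)]

# Write JSON Lines: one complete, self-contained JSON object per line.
# This is the layout spark.read.json(path) expects by default, and it
# lets each executor parse its own slice of the file independently.
path = os.path.join(tempfile.mkdtemp(), "data.jsonl")
with open(path, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Each line parses on its own -- no reader ever needs the whole file
# in memory, which is what makes the format splittable.
with open(path) as f:
    parsed = [json.loads(line) for line in f]

print(parsed == records)
```

If the file is one large array instead, a one-time streaming conversion to JSON Lines (outside Spark, or with a streaming JSON parser) is usually cheaper than trying to read it with multiLine=True, since the converted file can then be read and persisted fully in parallel.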