It is dynamically generated and written to an S3 bucket, not historical data, so I guess it doesn't have jsonlines format.
On Thu, Jun 18, 2020 at 9:16 AM Jörn Franke <jornfra...@gmail.com> wrote:

> Depends on the data types you use.
>
> Do you have it in jsonlines format? Then the amount of memory plays much
> less of a role.
>
> Otherwise, if it is one large object or array, I would not recommend it.
>
> > On 18.06.2020 at 15:12, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
> >
> > Hi Spark Users,
> >
> > I have a 50 GB JSON file that I would like to read and persist at HDFS
> > so it can be taken into the next transformation. I am trying to read it
> > as spark.read.json(path), but this gives an out-of-memory error on the
> > driver. Obviously, I can't afford to have 50 GB of driver memory. In
> > general, what is the best practice for reading a large JSON file like
> > 50 GB?
> >
> > Thanks
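[Editor's note: a minimal sketch (Scala) of the jsonlines approach Jörn
describes. With one JSON object per line the input is splittable across
executors, so nothing has to fit in driver memory, and an explicit schema
skips the inference scan. The S3 path, HDFS path, and field names below are
hypothetical placeholders, not from the thread.]

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object ReadLargeJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-large-json")
      .getOrCreate()

    // Explicit schema: avoids the schema-inference pass, which would
    // otherwise scan the whole 50 GB once just to guess the types.
    val schema = StructType(Seq(
      StructField("id", LongType),        // hypothetical field
      StructField("payload", StringType)  // hypothetical field
    ))

    // JSON Lines (one object per line) is splittable, so the read is
    // distributed across executors; the driver only plans the job.
    val df = spark.read
      .schema(schema)
      .json("s3a://my-bucket/events/")    // hypothetical S3 path

    // Persist to HDFS in a columnar format for the next transformation.
    df.write
      .mode("overwrite")
      .parquet("hdfs:///staging/events")  // hypothetical HDFS path

    spark.stop()
  }
}

If the file really is one large JSON object or array, as the reply above
suggests, spark.read.option("multiLine", true).json(path) can parse it, but
Spark then treats each file as a single unsplittable record, which brings
the memory problem back; pre-splitting it into JSON Lines with a streaming
parser first is usually the safer route.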