OK, I found these slides by Yin Huai (http://spark-summit.org/wp-content/uploads/2014/07/Easy-json-Data-Manipulation-Yin-Huai.pdf).
Reading a JSON file looks pretty simple:

    sqlContext.jsonFile("data.json")   <---- Is this already available in the master branch?

But my question about using a combination of resources (memory processing and disk processing) still remains.

Thanks!

On Fri, Jul 4, 2014 at 9:49 AM, Abel Coronado Iruegas <
acoronadoirue...@gmail.com> wrote:

> Hi everybody,
>
> Can someone tell me whether it is possible to read and filter a 60 GB file
> of tweets (JSON docs) in a standalone Spark deployment running on a single
> machine with 40 GB RAM and 8 cores?
>
> I mean, is it possible to configure Spark to work with some amount of
> memory (say 20 GB) and do the rest of the processing on disk, avoiding
> OutOfMemory exceptions?
>
> Regards,
>
> Abel
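For concreteness, here is a minimal sketch of the pipeline I have in mind, assuming I'm in spark-shell (so sc is in scope), that jsonFile has landed in the build I'm running, and that each tweet has a "text" field; the path, table name, and filter condition are placeholders:

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.storage.StorageLevel

    val sqlContext = new SQLContext(sc)

    // jsonFile infers the schema by scanning the input.
    val tweets = sqlContext.jsonFile("tweets.json")
    tweets.registerAsTable("tweets")

    // Keep only the tweets of interest; the result is a SchemaRDD,
    // which is an RDD[Row], so the usual RDD operations apply.
    val filtered = sqlContext.sql(
      "SELECT text FROM tweets WHERE text LIKE '%spark%'")

    // MEMORY_AND_DISK spills partitions that don't fit in RAM to disk
    // instead of failing with OutOfMemoryError.
    filtered.persist(StorageLevel.MEMORY_AND_DISK)
    filtered.saveAsTextFile("filtered_tweets")

My understanding is that a plain filter-and-save pipeline like this streams through the input partition by partition, so the whole 60 GB never has to fit in memory at once; the MEMORY_AND_DISK persistence only matters if the filtered result is cached for reuse. But I'd appreciate confirmation of that.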