Re: Spark memory optimization

2014-07-07 Thread Igor Pernek
ot;, "8") > > > > > > > On Fri, Jul 4, 2014 at 8:52 AM, Mayur Rustagi wrote: >> >> I would go with Spark only if you are certain that you are going to scale out in the near future. >> You can change the default storage of RDD to DISK_ONLY, that migh

Spark memory optimization

2014-07-04 Thread Igor Pernek
Hi all! I have a folder with 150 GB of txt files (around 700 files, each about 200 MB on average). I'm using Scala to process the files and calculate some aggregate statistics at the end. I see two possible approaches to do that: - manually loop through all the files, do the calculations per file and merge the results ...
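
For the workload described (aggregate statistics over a folder of txt files), a minimal sketch that reads the whole folder into one RDD and aggregates in a single pass; the input path and the particular statistics are illustrative assumptions, not from the original post:

    import org.apache.spark.{SparkConf, SparkContext}

    object FolderStatsSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("folder-stats-sketch") // placeholder name
        val sc = new SparkContext(conf)

        // textFile accepts a directory, so all ~700 files become one RDD
        // whose partitions Spark distributes across the cluster.
        val lines = sc.textFile("/path/to/txt-files") // placeholder path

        // Example aggregate statistics computed in a single pass:
        // total line count and total character count.
        val (lineCount, charCount) = lines.aggregate((0L, 0L))(
          (acc, line) => (acc._1 + 1L, acc._2 + line.length),
          (a, b) => (a._1 + b._1, a._2 + b._2)
        )

        println(s"lines: $lineCount, chars: $charCount")
        sc.stop()
      }
    }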