> On Fri, Jul 4, 2014 at 8:52 AM, Mayur Rustagi wrote:
>>
>> I would go with Spark only if you are certain that you are going to
>> scale out in the near future.
>> You can change the default storage of RDD to DISK_ONLY, that might
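A minimal sketch of the suggestion above, assuming a local Spark setup; the app name, master setting, and input path are placeholders, not from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object DiskOnlySketch {
  def main(args: Array[String]): Unit = {
    // Placeholder configuration for illustration only.
    val conf = new SparkConf().setAppName("disk-only-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // "/path/to/txt/files" stands in for the folder of txt files.
    val lines = sc.textFile("/path/to/txt/files/*.txt")

    // Persist partitions to disk only, instead of the default in-memory storage.
    lines.persist(StorageLevel.DISK_ONLY)

    println(lines.count())
    sc.stop()
  }
}
```

Note that `persist` only marks the RDD; the data is actually materialized on disk the first time an action (such as `count`) runs.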
Hi all!
I have a folder with 150 GB of txt files (around 700 files, on average
200 MB each).
I'm using Scala to process the files and compute some aggregate
statistics at the end. I see two possible approaches to do that: - manually
loop through all the files, do the calculations per file and me
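The first approach (loop over the files, compute per-file results, then merge) could look something like this in plain Scala; the `Stats` fields (line and character counts) are illustrative stand-ins for whatever aggregate statistics are actually needed:

```scala
import java.io.File
import scala.io.Source

object PerFileAggregate {
  // Example per-file statistics: line count and character count.
  case class Stats(lines: Long, chars: Long) {
    def merge(other: Stats): Stats = Stats(lines + other.lines, chars + other.chars)
  }

  // Compute the stats for one file, closing the source when done.
  def statsForFile(f: File): Stats = {
    val src = Source.fromFile(f)
    try src.getLines().foldLeft(Stats(0, 0)) { (s, line) =>
      Stats(s.lines + 1, s.chars + line.length)
    } finally src.close()
  }

  // Loop over all txt files in the folder and merge the per-file results.
  def aggregate(dir: File): Stats =
    dir.listFiles()
      .filter(_.getName.endsWith(".txt"))
      .map(statsForFile)
      .foldLeft(Stats(0, 0))(_ merge _)
}
```

Since each file is processed independently, this also parallelizes trivially (e.g. with `.par` on the file list) without needing a cluster.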