Understood, of course.
Did the data fit comfortably in memory, or did you experience memory
pressure? I've had to do a fair amount of tuning under memory pressure in
the past (0.7.x) and was hoping that handling of this scenario has improved
in later Spark versions.
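For concreteness, a minimal sketch of the kind of tuning this scenario has
typically involved; the config names are from the Spark 1.x static-memory
model, and the master, input path, values, and partition count below are
placeholders rather than recommendations.

// Illustrative only: these config names are from the Spark 1.x static-memory
// era, and the values, master, input path, and partition count are made up.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object MemoryPressureSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("memory-pressure-sketch")
      .setMaster("local[*]")                      // replace with your cluster master
      .set("spark.storage.memoryFraction", "0.4") // give the cache less, leave more headroom
      .set("spark.shuffle.spill", "true")         // let shuffles spill to disk

    val sc = new SparkContext(conf)

    // Hypothetical input path.
    val lines = sc.textFile("hdfs:///path/to/input")

    // Serialized, disk-backed caching keeps a working set that doesn't fit
    // comfortably in memory from evicting everything else or OOM-ing executors.
    val cached = lines.persist(StorageLevel.MEMORY_AND_DISK_SER)

    // More, smaller partitions shrink each task's memory footprint.
    println(cached.repartition(2000).count())

    sc.stop()
  }
}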
Reynold, just curious, did you guys run it on AWS?
- Henry
On Thu, Mar 20, 2014 at 11:08 AM, Reynold Xin wrote:
> Actually we just ran a job with 70TB+ compressed data on 28 worker nodes -
> I didn't count the size of the uncompressed data, but I am guessing it is
> somewhere between 200TB and 700TB.
I'm not really at liberty to discuss details of the job. It involves some
expensive aggregated statistics and took 10 hours to complete (mostly
bottlenecked by network & I/O).
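Purely as an illustration (not the actual job described in this thread): a
per-key count/mean/variance aggregation is the general shape of workload
whose shuffle dominates network and disk I/O. The tab-separated input format
and the HDFS paths below are hypothetical.

// Purely illustrative; not the actual job discussed in this thread.
// The "key<TAB>value" input format and the HDFS paths are hypothetical.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits (needed pre-1.3)

object AggregateStatsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("aggregate-stats-sketch")
      .setMaster("local[2]") // replace with your cluster master
    val sc = new SparkContext(conf)

    val records = sc.textFile("hdfs:///path/to/compressed/input")

    val stats = records
      .map { line =>
        val Array(key, value) = line.split("\t", 2)
        val v = value.toDouble
        (key, (1L, v, v * v)) // (count, sum, sum of squares)
      }
      // reduceByKey combines map-side, so only partial (count, sum, sumSq)
      // triples cross the network in the shuffle, which is still where a job
      // like this tends to spend most of its time (network and disk I/O).
      .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2, a._3 + b._3))
      .mapValues { case (n, sum, sumSq) =>
        val mean = sum / n
        (n, mean, sumSq / n - mean * mean) // count, mean, variance
      }

    stats.saveAsTextFile("hdfs:///path/to/output")
    sc.stop()
  }
}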
On Thu, Mar 20, 2014 at 11:12 AM, Surendranauth Hiraman <
suren.hira...@velos.io> wrote:
> Reynold,
>
> How complex was that job (I guess in terms of the number of transforms and
> actions), and how long did it take to process?
Reynold,
How complex was that job (I guess in terms of the number of transforms and
actions), and how long did it take to process?
-Suren
On Thu, Mar 20, 2014 at 2:08 PM, Reynold Xin wrote:
> Actually we just ran a job with 70TB+ compressed data on 28 worker nodes -
> I didn't count the size of the uncompressed data, but I am guessing it is
> somewhere between 200TB and 700TB.
Actually we just ran a job with 70TB+ compressed data on 28 worker nodes -
I didn't count the size of the uncompressed data, but I am guessing it is
somewhere between 200TB and 700TB.
On Thu, Mar 20, 2014 at 12:23 AM, Usman Ghani wrote:
> All,
> What is the largest input data set y'all have come across that has been
> successfully processed in production using Spark? Ballpark?
All,
What is the largest input data set y'all have come across that has been
successfully processed in production using Spark? Ballpark?