To: user@spark.apache.org
Subject: Re: Scalability of group by
Hi,

I can offer a few ideas to investigate regarding your issue. I've run into
resource issues doing shuffle operations with a much smaller dataset than 2B
rows. The data is going to be saved to disk by the BlockManager as part of
the shuffle.
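If the shuffle is what is hurting you, one knob worth trying is the number of
shuffle partitions. A minimal sketch, assuming a Spark 1.3 SQLContext (the app
name and partition count below are illustrative, not from the thread):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Illustrative setup; "GroupByScaling" and 2000 are assumptions.
    val conf = new SparkConf().setAppName("GroupByScaling")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // More (smaller) shuffle partitions lower the per-task memory
    // footprint during the group by, at the cost of more tasks.
    sqlContext.setConf("spark.sql.shuffle.partitions", "2000")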
To: user@spark.apache.org
Subject: Re: Scalability of group by
Hi,

Can you test on a smaller dataset to identify whether it is a cluster issue
or a scaling issue in Spark?
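For example, one quick way to do that is to sample the RDD and run the same
query on the sample. A minimal sketch, assuming `data` is the RDD[Row] from
the original post (the fraction and seed are arbitrary choices):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row

    // Take a 1% sample and feed it through the identical aggregation.
    // If the sampled run succeeds but the full run does not, the problem
    // is more likely scaling (shuffle size) than cluster configuration.
    def sampleRun(data: RDD[Row]): RDD[Row] =
      data.sample(withReplacement = false, fraction = 0.01, seed = 42L)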
On 28 Apr 2015 11:30, "Ulanov, Alexander" <alexander.ula...@hp.com> wrote:
Hi,

I am running a group by on a dataset of 2B rows of RDD[Row[id, time, value]]
in Spark 1.3 as follows:

"select id, time, first(value) from data group by id, time"
My cluster is 8 nodes with 16GB RAM and one worker per node. Each executor is
allocated 5GB of memory. However, all executors are running out of resources
during the shuffle.
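For reference, a minimal sketch of how this query is wired up in Spark 1.3,
assuming `sc` is an existing SparkContext and `data` the RDD[Row] above (the
LongType/DoubleType schema is my guess; the thread does not state the field
types):

    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.types._

    val sqlContext = new SQLContext(sc)

    // Field types are assumptions, not taken from the original post.
    val schema = StructType(Seq(
      StructField("id", LongType, nullable = false),
      StructField("time", LongType, nullable = false),
      StructField("value", DoubleType, nullable = true)))

    val df = sqlContext.createDataFrame(data, schema)
    df.registerTempTable("data")
    val result = sqlContext.sql(
      "select id, time, first(value) from data group by id, time")

With 2B rows grouped by (id, time), nearly every row is a distinct group, so
the shuffle carries close to the full dataset, which is consistent with the
resource pressure described above.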