Hi,

Have you seen this slide deck from Spark Summit 2016?
https://spark-summit.org/2016/events/top-5-mistakes-when-writing-spark-applications/
It is a good reference for your capacity planning.
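
For instance, going by your master UI numbers (7 workers with 28 cores and
56GB in total, i.e. 4 cores and 8GB per node), the usual rules of thumb from
that kind of capacity planning give a starting point like the one below. The
exact values are only a sketch to tune from: one executor per node, one core
per node left for the OS and daemons, memory headroom for the executor
overhead, and parallelism at about 2x the total executor cores. Note that
--num-executors is a YARN option; on standalone, cap the total with
--total-executor-cores instead.

--num-executors 7 \
--executor-cores 3 \
--executor-memory 5G \
--conf spark.default.parallelism=42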

// maropu

On Tue, Jul 12, 2016 at 2:31 PM, Yash Sharma <yash...@gmail.com> wrote:

> I would say use dynamic allocation rather than a fixed number of executors,
> and provide whatever executor memory you would like. Deciding on the values
> takes a couple of test runs and checking what works best for you.
>
> You could try something like -
>
> --driver-memory 1G \
> --executor-memory 2G \
> --executor-cores 2 \
> --conf spark.dynamicAllocation.enabled=true \
> --conf spark.dynamicAllocation.initialExecutors=8
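>
> Note that dynamic allocation also needs the external shuffle service
> turned on, and it is worth bounding the executor count; the bounds here
> are just an example -
>
> --conf spark.shuffle.service.enabled=true \
> --conf spark.dynamicAllocation.minExecutors=2 \
> --conf spark.dynamicAllocation.maxExecutors=14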
>
>
>
> On Tue, Jul 12, 2016 at 1:27 PM, Anuj Kumar <anujs...@gmail.com> wrote:
>
>> That configuration looks bad: only two cores are in use and the app is
>> using just 1GB. A few points-
>>
>> 1. Oversubscribe those CPUs to at least twice the number of physical
>> cores to start with, then tune down if the machines freeze (see the
>> sketch after this list)
>> 2. Allocate all of the CPU cores and memory to your running app (I assume
>> it is your test environment)
>> 3. Assuming you are running quad-core machines, if you define cores as 8
>> for your workers you will get 56 cores (CPU threads) across the 7 nodes
>> 4. It also depends on the source you are reading the data from. If you
>> are reading from HDFS, what are your block size and part count?
>> 5. You may also have to tune the timeouts and frame size based on the
>> dataset and the errors you are facing
>> We have run terasort with a couple of high-end worker machines, reading
>> and writing HDFS with 5-10 mount points allocated for HDFS and Spark
>> local storage. We have used multiple configurations, like 10W-10CPU-10GB
>> or 25W-6CPU-6GB (workers-cores-memory per worker) running on each of the
>> two machines, with 512MB HDFS blocks and 1000-2000 parts. With all of
>> these chatting at 10GbE, it worked well.
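>>
>> To put rough numbers on point 4: the map-side task count follows the
>> HDFS splits, so a 10g input at the default 128MB block size arrives as
>> about 80 partitions, while 512MB blocks would give only ~20, too few to
>> keep 28+ cores busy. The sort side is driven by the partition count you
>> pass in (or spark.default.parallelism), and 2-3x the total cores is a
>> common starting point, for example -
>>
>> --conf spark.default.parallelism=84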
>>
>> On Tue, Jul 12, 2016 at 3:39 AM, Kartik Mathur <kar...@bluedata.com>
>> wrote:
>>
>>> I am trying to run terasort in Spark on a 7-node cluster with only 10g
>>> of data, and the executors get lost with a GC overhead limit exceeded
>>> error.
>>>
>>> This is what my cluster looks like -
>>>
>>>
>>>    - *Alive Workers:* 7
>>>    - *Cores in use:* 28 Total, 2 Used
>>>    - *Memory in use:* 56.0 GB Total, 1024.0 MB Used
>>>    - *Applications:* 1 Running, 6 Completed
>>>    - *Drivers:* 0 Running, 0 Completed
>>>    - *Status:* ALIVE
>>>
>>> Each worker has 8 cores and 4GB memory.
>>>
>>> My question is: how do people running in production decide on these
>>> properties -
>>>
>>> 1) --num-executors
>>> 2) --executor-cores
>>> 3) --executor-memory
>>> 4) num of partitions
>>> 5) spark.default.parallelism
>>>
>>> Thanks,
>>> Kartik
>>>
>>>
>>>
>>
>


-- 
---
Takeshi Yamamuro
