Hi,

Have you seen this slide from Spark Summit 2016?
https://spark-summit.org/2016/events/top-5-mistakes-when-writing-spark-applications/

It is a good starting point for your capacity planning.

// maropu
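For reference, the usual rule of thumb in that deck (leave a core and some memory on each node for the OS and daemons, keep executors at a handful of cores each, and derive parallelism from the total cores) applied to the cluster Kartik describes below might come out roughly as follows. This is only an illustrative sketch based on the 8-core / 4 GB workers he mentions, not a recommendation; the master UI below reports different totals, so the real numbers need checking:

# Illustrative only -- assumes 7 standalone workers with 8 cores / 4 GB each.
# Reserving roughly one core and 1 GB per node for the OS and Hadoop daemons
# leaves about 7 cores and 3 GB per node, i.e. one modest executor per node.
# (--num-executors is a YARN flag; on standalone, cap the footprint with
# spark.cores.max instead.)
spark-submit \
  --driver-memory 1G \
  --executor-memory 3G \
  --executor-cores 4 \
  --conf spark.cores.max=28 \
  --conf spark.default.parallelism=84 \
  <your terasort jar and args>

Here spark.default.parallelism is set to roughly 2-3x the total executor cores, which is a common starting point before tuning.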
On Tue, Jul 12, 2016 at 2:31 PM, Yash Sharma <yash...@gmail.com> wrote:

> I would say use dynamic allocation rather than a fixed number of executors.
> Provide whatever executor memory you would like. Deciding the values requires
> a couple of test runs and checking what works best for you.
>
> You could try something like -
>
> --driver-memory 1G \
> --executor-memory 2G \
> --executor-cores 2 \
> --conf spark.dynamicAllocation.enabled=true \
> --conf spark.dynamicAllocation.initialExecutors=8 \
>
>
> On Tue, Jul 12, 2016 at 1:27 PM, Anuj Kumar <anujs...@gmail.com> wrote:
>
>> That configuration looks bad, with only two cores in use and 1 GB used by
>> the app. A few points:
>>
>> 1. Please oversubscribe those CPUs to at least twice the number of cores
>> you have to start with, and then tune if it freezes
>> 2. Allocate all of the CPU cores and memory to your running app (I assume
>> it is your test environment)
>> 3. Assuming you are running quad-core machines, if you define cores as 8
>> for your workers you will get 56 cores (CPU threads) across the cluster
>> 4. Also, it depends on the source you are reading the data from. If you
>> are reading from HDFS, what are your block size and part count?
>> 5. You may also have to tune the timeouts and frame size based on the
>> dataset and the errors you are facing
>>
>> We have run terasort with a couple of high-end worker machines, reading
>> and writing HDFS, with 5-10 mount points allocated for HDFS and Spark
>> local dirs. We have used multiple configurations, like 10W-10CPU-10GB and
>> 25W-6CPU-6GB running on each of the two machines, with 512 MB HDFS blocks
>> and 1000-2000 parts. All of these, chatting over 10GbE, worked well.
>>
>> On Tue, Jul 12, 2016 at 3:39 AM, Kartik Mathur <kar...@bluedata.com> wrote:
>>
>>> I am trying to run terasort in Spark on a 7-node cluster with only 10 GB
>>> of data, and executors get lost with a "GC overhead limit exceeded" error.
>>>
>>> This is what my cluster looks like:
>>>
>>> - *Alive Workers:* 7
>>> - *Cores in use:* 28 Total, 2 Used
>>> - *Memory in use:* 56.0 GB Total, 1024.0 MB Used
>>> - *Applications:* 1 Running, 6 Completed
>>> - *Drivers:* 0 Running, 0 Completed
>>> - *Status:* ALIVE
>>>
>>> Each worker has 8 cores and 4 GB memory.
>>>
>>> My question is: how do people running in production decide these
>>> properties -
>>>
>>> 1) --num-executors
>>> 2) --executor-cores
>>> 3) --executor-memory
>>> 4) number of partitions
>>> 5) spark.default.parallelism
>>>
>>> Thanks,
>>> Kartik
>>
>

--
---
Takeshi Yamamuro
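Pulling the suggestions in the thread together, a dynamic-allocation variant of the same submit might look roughly like the sketch below. The shuffle-service, max-executor, timeout, and parallelism values are illustrative assumptions for the same 7 x (8-core / 4 GB) cluster and still need the test runs Yash mentions:

# Sketch only -- dynamic allocation needs the external shuffle service
# running on each worker (spark.shuffle.service.enabled=true there as well).
spark-submit \
  --driver-memory 1G \
  --executor-memory 2G \
  --executor-cores 2 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.initialExecutors=8 \
  --conf spark.dynamicAllocation.maxExecutors=14 \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.default.parallelism=84 \
  --conf spark.network.timeout=300s \
  <your terasort jar and args>

On the partition-count question: for 10 GB of input, something in the region of 80-160 partitions (roughly 64-128 MB each) keeps individual tasks small, which makes a 2 GB executor less likely to hit the GC overhead limit than a handful of very large partitions would.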