To clarify … it's 64 HT cores per node, 16 nodes each with 128GB. Well, actually I have 48 nodes … but I'm trying to limit it so we have a comparison with Spark/MPI/MapReduce all at the same node count.
Thanks for the information.
--
Jonathan (Bill) Sparks
Software Architecture
Cray Inc.

On 6/19/15 9:44 AM, "Ufuk Celebi" <[email protected]> wrote:

> PS: I've read your last email as 64 HT cores per machine. If it was in total over the 16 nodes, you have to adjust my response accordingly. ;)
>
> On 19 Jun 2015, at 16:42, Fabian Hueske <[email protected]> wrote:
>
>> Hi Bill,
>>
>> no worry, questions are the purpose of this mailing list.
>>
>> The number of network buffers is a parameter that needs to be scaled with your setup. The reason for that is Flink's pipelined data transfer, which requires a certain number of network buffers to be available at the same time during processing.
>>
>> There is an FAQ entry that explains how to set this parameter according to your setup:
>> --> http://flink.apache.org/faq.html#i-get-an-error-message-saying-that-not-enough-buffers-are-available-how-do-i-fix-this
>>
>> The documentation for parallel execution can be found here:
>> http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution
>>
>> If you are working on the latest snapshot, you can also configure Flink to use batched data transfer instead of pipelined transfer. This is done via ExecutionConfig.setExecutionMode(), which you obtain by calling getConfig() on your ExecutionEnvironment.
>>
>> Best, Fabian
>>
>>
>> 2015-06-19 16:31 GMT+02:00 Maximilian Michels <[email protected]>:
>> Hi Bill,
>>
>> You're right. Simply increasing the task manager slots doesn't do anything. It is correct to set the parallelism to taskManagers*slots. Simply increase the number of network buffers in the flink-conf.yaml, e.g. to 4096. In the future, we will configure this setting dynamically.
>>
>> Let us know if your runtime decreases :)
>>
>> Cheers,
>> Max
>>
>> On Fri, Jun 19, 2015 at 4:24 PM, Bill Sparks <[email protected]> wrote:
>>
>> Sorry for the post again.
>> I guess I'm not understanding this…
>>
>> The question is how to scale up/increase the execution of a problem. What I'm trying to do is get the best out of the available processors for a given node count and compare this against Spark, using KMeans.
>>
>> For Spark, one method is to increase the executors and RDD partitions; for Flink, I can increase the number of task slots (taskmanager.numberOfTaskSlots). My empirical evidence suggests that just increasing the slots does not increase processing of the data. Is there something I'm missing? Much like re-partitioning your datasets in Spark, is there an equivalent option for Flink? What about the parallelism argument? The referring document seems to be broken…
>>
>> This seems to be a dead link:
>> https://github.com/apache/flink/blob/master/docs/setup/%7B%7Bsite.baseurl%7D%7D/apis/programming_guide.html#parallel-execution
>>
>> If I do increase the parallelism to be (taskManagers*slots), I hit the "Insufficient number of network buffers…"
>>
>> I have 16 nodes (64 HT cores) and have run TaskSlots of 1, 4, 8, 16, and still the execution time is always around 5-6 minutes, using the default parallelism.
>>
>> Regards,
>> Bill
>> --
>> Jonathan (Bill) Sparks
>> Software Architecture
>> Cray Inc.
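[Editor's note: the FAQ entry referenced in this thread gave a rule of thumb for sizing the network buffer pool, roughly slots-per-TaskManager² × #TaskManagers × 4. The sketch below checks that formula against the setup discussed here (16 TaskManagers); the formula is quoted from the Flink docs of that era and the helper name is my own.]

```python
# Rule of thumb from the Flink FAQ linked above (circa Flink 0.9):
#   buffers ~= slots_per_tm ** 2 * num_taskmanagers * 4
def required_network_buffers(slots_per_tm: int, num_taskmanagers: int) -> int:
    """Approximate buffer count needed so every slot can pipeline to every other slot."""
    return slots_per_tm ** 2 * num_taskmanagers * 4

# 16 nodes, one TaskManager each, at the slot counts Bill tried:
for slots in (1, 4, 8, 16, 64):
    print(slots, required_network_buffers(slots, 16))
```

Note that 4096 (Max's suggested value) covers 8 slots per TaskManager in this formula; running all 64 HT cores as slots would require far more buffers, which is consistent with the "Insufficient number of network buffers" error appearing once the parallelism is raised to taskManagers*slots.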

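[Editor's note: pulling the advice in this thread together, the relevant flink-conf.yaml entries would look something like the sketch below. Key names are as of the Flink version discussed here (taskmanager.numberOfTaskSlots is confirmed in the thread); the concrete values are illustrative for the 16-node setup, not a tested configuration.]

```yaml
# flink-conf.yaml — illustrative values for 16 nodes, 64 HT cores each
taskmanager.numberOfTaskSlots: 64          # slots per TaskManager
parallelism.default: 1024                  # taskManagers * slots = 16 * 64
taskmanager.network.numberOfBuffers: 4096  # raise further if "Insufficient number
                                           # of network buffers" persists
```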