Hi,

Your questions seem too abstract and theoretical to answer in general. The answer is: it depends on several factors, such as skew in the data, data volume, reliability requirements, the "fatness" of the servers, and whether the job performs lookups in other data sources. The papers you mentioned show only that the researchers achieved their results under concrete, specific conditions. Had they changed some parameters slightly (for example, increased the network throughput or changed the garbage collector options), the results could have been completely different.
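To make the Amdahl's law part of your question concrete, here is a minimal Python sketch of the textbook formula, speedup = 1 / ((1 - p) + p / n). The function names are hypothetical and the parallel fraction p = 0.95 is an assumed, purely illustrative value, not something measured on Flink or Spark. Note that the plain formula does not model communication overhead that grows with the number of nodes, so it only gives an upper bound and cannot show parallel slowdown.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def max_speedup(parallel_fraction: float) -> float:
    """Upper bound as the number of workers grows without limit: 1 / (1 - p)."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    p = 0.95  # assumed parallel fraction, for illustration only
    for n in (1, 8, 64, 512):
        print(f"{n:4d} workers -> speedup {amdahl_speedup(p, n):.2f}")
    print(f"limit -> {max_speedup(p):.2f}")
```

Even with this optimistic assumption the curve flattens quickly, which is why the measured serial fraction of a concrete job matters more than the framework's name.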
On Tuesday, January 3, 2017, Hanna Prinz <hanna_pr...@yahoo.de> wrote:

> Happy new year everyone :)
>
> I’m currently working on a paper about Flink. I already got some
> recommendations on general papers with details about Flink, which helped me
> a lot already. But now that I read them, *I’m further interested in the
> speedup capabilities provided by the Flink framework: how „far“ can it
> scale efficiently?*
>
> Amdahl's law states that a parallelization is only efficient as long as the
> non-parallelizable part of the processing (time for the communication
> between the nodes etc.) doesn’t „eat up“ the speed gains of parallelization
> (= parallel slowdown).
> Of course, the communication overhead is mostly caused by the
> implementation, but the framework's specific solution for the communication
> between the nodes has a reasonable effect as well.
>
> After studying these papers, it looks like, although Flink's performance is
> better in many cases, the possible speedup is equal to the possible speedup
> of Spark.
>
> 1. Spark versus Flink - Understanding Performance in Big Data Analytics
>    Frameworks | https://hal.inria.fr/hal-01347638/document
> 2. Big Data Analytics on Cray XC Series DataWarp using Hadoop, Spark and
>    Flink | https://cug.org/proceedings/cug2016_proceedings/includes/files/pap141.pdf
> 3. Thrill - High-Performance Algorithmic Distributed Batch Data Processing
>    with C++ | https://panthema.net/2016/0816-Thrill-High-Performance-Algorithmic-Distributed-Batch-Data-Processing-with-CPP/1608.05634v1.pdf
>
> Does someone have …
> … more information (or data) on speedup of Flink applications?
> … experience (or data) with Flink in an extremely parallelized environment?
> … detailed information on how the nodes communicate, especially when they
> are waiting for task results of one another?
>
> Thank you very much for your time & answers!
> Hanna