Hi all! I would like to kick off an effort to improve the documentation of the Flink architecture and internals. This also means making the streaming architecture more prominent in the docs.
Flink is quite a sophisticated stack, so we need to improve the presentation of how it works, to the extent necessary to use Flink (and to appreciate all the cool stuff that is happening). This should also come in handy for new contributors.

As a general umbrella, we first need to decide where and how to organize the documentation. I would propose to put the bulk of the documentation into the Wiki: create a dedicated section on Flink Internals, with sub-pages for each component / topic. To the docs, we add a general overview from which we link into the Wiki.

== These sections would go into the DOCS in the git repository ==

- Overview of Program, pre-flight phase (type extraction, optimizer), JobManager, TaskManager. Differences between streaming and batch. We can realize this through one very nice picture with a few lines of text.
- High level architecture stack, different program representations (API operators, common API DAG, optimizer DAG, parallel data flow (JobGraph / Execution Graph))
- (maybe) Parallelism and scheduling. This seems to be paramount for users to understand.
- Processes (JobManager, TaskManager, Webserver, WebClient, CLI client)

== These sections would go into the WIKI ==

- Project structure (maven projects, what is where, dependencies between projects)
- Component overview
  -> JobManager (InstanceManager, Scheduler, BLOB server, Library Cache, Archiving)
  -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library Cache)
  -> Involved Actor Systems / Actors / Messages
- Details about submitting a job (library upload, job graph submission, execution graph setup, scheduling trigger)
- Memory Management
- Optimizer internals
- Akka Setup specifics
- Netty and pluggable data exchange strategies
- Testing: Flink test clusters and unit test utilities
- Developer How-To: Setting up Eclipse, IntelliJ, Travis
- Step-by-step guide to add a new operator

I will go ahead and stub some sections in the Wiki.
As we discuss and agree/disagree with the outline, we can evolve the Wiki.

Greetings,
Stephan