Why do you want to split stuff between the docs in the repository and the wiki? I for one would always be too lazy to check a wiki when there is also documentation. Plus, this would add overhead in deciding what goes where and in keeping the two places in sync.
On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen <se...@apache.org> wrote:

> Ah, I totally forgot to add to the internals:
>
> - Fault tolerance in Batch mode
>
> - Fault Tolerance in Streaming Mode, with state handling
>
> On Mon, Mar 16, 2015 at 7:51 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> Hi all!
>>
>> I would like to kick off an effort to improve the documentation of the
>> Flink Architecture and internals. This also means making the streaming
>> architecture more prominent in the docs.
>>
>> Being quite a sophisticated stack, we need to improve the presentation of
>> how Flink works - to an extent necessary to use Flink (and to appreciate
>> all the cool stuff that is happening). This should also come in handy with
>> new contributors.
>>
>> As a general umbrella, we need to first decide where and how to organize
>> the documentation.
>>
>> I would propose to put the bulk of the documentation into the Wiki. Create
>> a dedicated section on Flink Internals and sub-pages for each component /
>> topic. To the docs, we add a general overview from which we link into the
>> Wiki.
>>
>>
>> == These sections would go into the DOCS in the git repository ==
>>
>> - Overview of Program, pre-flight phase (type extraction, optimizer),
>> JobManager, TaskManager. Differences between streaming and batch. We can
>> realize this through one very nice picture with few lines of text.
>>
>> - High level architecture stack, different program representations (API
>> operators, common API DAG, optimizer DAG, parallel data flow (JobGraph /
>> Execution Graph))
>>
>> - (maybe) Parallelism and scheduling. This seems to be paramount to
>> understand for users.
>>
>> - Processes (JobManager, TaskManager, Webserver, WebClient, CLI client)
>>
>>
>> == These sections would go into the WIKI ==
>>
>> - Project structure (maven projects, what is where, dependencies between
>> projects)
>>
>> - Component overview
>>
>> -> JobManager (InstanceManager, Scheduler, BLOB server, Library Cache,
>> Archiving)
>>
>> -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library Cache)
>>
>> -> Involved Actor Systems / Actors / Messages
>>
>> - Details about submitting a job (library upload, job graph submission,
>> execution graph setup, scheduling trigger)
>>
>> - Memory Management
>>
>> - Optimizer internals
>>
>> - Akka Setup specifics
>>
>> - Netty and pluggable data exchange strategies
>>
>> - Testing: Flink test clusters and unit test utilities
>>
>> - Developer How-To: Setting up Eclipse, IntelliJ, Travis
>>
>> - Step-by-step guide to add a new operator
>>
>>
>> I will go ahead and stub some sections in the Wiki.
>>
>> As we discuss and agree/disagree with the outline, we can evolve the Wiki.
>>
>> Greetings,
>> Stephan