I added a document for data exchange between tasks: https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
Feel free to edit. I plan to link the class names to the class files on GitHub.

On Tue, Mar 17, 2015 at 11:17 AM, Kostas Tzoumas <ktzou...@apache.org> wrote:

> +1 for the Wiki.
>
> When these have been stabilized we can move them to the docs if we decide
> to do so.
>
> On Mon, Mar 16, 2015 at 10:07 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> I have put my suggested version of an outline for the docs into the wiki.
>> Regardless of where the docs end up (wiki or repository), we can use the
>> wiki to outline the docs.
>>
>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
>>
>> Some pages contain a stub or outline, others are completely blank.
>>
>> Not a complete list. Additions are welcome.
>>
>> On Mon, Mar 16, 2015 at 10:04 PM, Stephan Ewen <se...@apache.org> wrote:
>>
>> > I think the Wiki has a much lower barrier of entry to fix docs,
>> > especially for external people. The docs, with the Jekyll setup, are
>> > rather tricky.
>> > I would very much like all kinds of people to contribute to the docs
>> > about the internals, not just the usual three suspects that have done
>> > this so far.
>> >
>> > Having a good landing page in the regular docs is exactly so that we do
>> > not lose all the people who do not look into a wiki. The overview pages
>> > for the internals need to be good and accessible and nicely link to the
>> > wiki to "forward" people there.
>> >
>> > The overhead of deciding what goes where should not be terribly large,
>> > in my opinion, since there is no really "wrong" place to put it.
>> >
>> > On Mon, Mar 16, 2015 at 9:58 PM, Aljoscha Krettek <aljos...@apache.org>
>> > wrote:
>> >
>> >> Why do you want to split stuff between the docs in the repository and
>> >> the wiki? I for one would always be too lazy to check stuff in a wiki
>> >> when there is also documentation. Plus, this would lead to
>> >> additional overhead in deciding what goes where and syncing between
>> >> the two places for documentation.
>> >>
>> >> On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen <se...@apache.org> wrote:
>> >> > Ah, I totally forgot to add to the internals:
>> >> >
>> >> > - Fault tolerance in Batch mode
>> >> >
>> >> > - Fault Tolerance in Streaming Mode, with state handling
>> >> >
>> >> > On Mon, Mar 16, 2015 at 7:51 PM, Stephan Ewen <se...@apache.org> wrote:
>> >> >
>> >> >> Hi all!
>> >> >>
>> >> >> I would like to kick off an effort to improve the documentation of the
>> >> >> Flink Architecture and internals. This also means making the streaming
>> >> >> architecture more prominent in the docs.
>> >> >>
>> >> >> Since Flink is quite a sophisticated stack, we need to improve the
>> >> >> presentation of how Flink works - to the extent necessary to use Flink
>> >> >> (and to appreciate all the cool stuff that is happening). This should
>> >> >> also come in handy with new contributors.
>> >> >>
>> >> >> As a general umbrella, we need to first decide where and how to
>> >> >> organize the documentation.
>> >> >>
>> >> >> I would propose to put the bulk of the documentation into the Wiki.
>> >> >> Create a dedicated section on Flink Internals and sub-pages for each
>> >> >> component / topic. To the docs, we add a general overview from which
>> >> >> we link into the Wiki.
>> >> >>
>> >> >>
>> >> >> == These sections would go into the DOCS in the git repository ==
>> >> >>
>> >> >> - Overview of Program, pre-flight phase (type extraction, optimizer),
>> >> >> JobManager, TaskManager. Differences between streaming and batch. We
>> >> >> can realize this through one very nice picture with few lines of text.
>> >> >>
>> >> >> - High level architecture stack, different program representations
>> >> >> (API operators, common API DAG, optimizer DAG, parallel data flow
>> >> >> (JobGraph / Execution Graph))
>> >> >>
>> >> >> - (maybe) Parallelism and scheduling.
>> >> >> This seems to be paramount to understand for users.
>> >> >>
>> >> >> - Processes (JobManager, TaskManager, Webserver, WebClient, CLI
>> >> >> client)
>> >> >>
>> >> >>
>> >> >> == These sections would go into the WIKI ==
>> >> >>
>> >> >> - Project structure (maven projects, what is where, dependencies
>> >> >> between projects)
>> >> >>
>> >> >> - Component overview
>> >> >>
>> >> >>   -> JobManager (InstanceManager, Scheduler, BLOB server, Library
>> >> >>      Cache, Archiving)
>> >> >>
>> >> >>   -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library
>> >> >>      Cache)
>> >> >>
>> >> >>   -> Involved Actor Systems / Actors / Messages
>> >> >>
>> >> >> - Details about submitting a job (library upload, job graph
>> >> >> submission, execution graph setup, scheduling trigger)
>> >> >>
>> >> >> - Memory Management
>> >> >>
>> >> >> - Optimizer internals
>> >> >>
>> >> >> - Akka Setup specifics
>> >> >>
>> >> >> - Netty and pluggable data exchange strategies
>> >> >>
>> >> >> - Testing: Flink test clusters and unit test utilities
>> >> >>
>> >> >> - Developer How-To: Setting up Eclipse, IntelliJ, Travis
>> >> >>
>> >> >> - Step-by-step guide to add a new operator
>> >> >>
>> >> >>
>> >> >> I will go ahead and stub some sections in the Wiki.
>> >> >>
>> >> >> As we discuss and agree/disagree with the outline, we can evolve the
>> >> >> Wiki.
>> >> >>
>> >> >> Greetings,
>> >> >> Stephan