Thanks. I will have a look later :-)

+1 for the Wiki. I think the low overhead makes it easier to contribute not only for newcomers, but for committers as well. :-)

On 20 Mar 2015, at 12:46, Kostas Tzoumas <ktzou...@apache.org> wrote:

> I added a document for data exchange between tasks:
> https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
>
> Feel free to edit. I plan to link the class names to the class files in
> GitHub.
>
> On Tue, Mar 17, 2015 at 11:17 AM, Kostas Tzoumas <ktzou...@apache.org>
> wrote:
>
>> +1 for the Wiki.
>>
>> When these have been stabilized we can move them to the docs if we decide
>> to do so.
>>
>> On Mon, Mar 16, 2015 at 10:07 PM, Stephan Ewen <se...@apache.org> wrote:
>>
>>> I have put my suggested version of an outline for the docs into the wiki.
>>> Regardless of where the docs end up (wiki or repository), we can use the
>>> wiki to outline the docs.
>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
>>>
>>> Some pages contain a stub or outline, others are completely blank.
>>>
>>> Not a complete list. Additions are welcome.
>>>
>>> On Mon, Mar 16, 2015 at 10:04 PM, Stephan Ewen <se...@apache.org> wrote:
>>>
>>>> I think the Wiki has a much lower barrier of entry to fix docs,
>>>> especially for external people. The docs, with the Jekyll setup, are
>>>> rather tricky.
>>>> I would very much like all kinds of people to contribute to the docs
>>>> about the internals, not just the usual three suspects that have done
>>>> this so far.
>>>>
>>>> Having a good landing page in the regular docs is exactly to not lose
>>>> all the people that do not look into a wiki. The overview pages for the
>>>> internals need to be good and accessible and nicely link to the wiki to
>>>> "forward" people there.
>>>>
>>>> The overhead of deciding what goes where should not be terribly large,
>>>> in my opinion, since there is no really "wrong" place to put it.
>>>>
>>>> On Mon, Mar 16, 2015 at 9:58 PM, Aljoscha Krettek <aljos...@apache.org>
>>>> wrote:
>>>>
>>>>> Why do you want to split stuff between the docs in the repository and
>>>>> the wiki? I for one would always be too lazy to check stuff in a wiki
>>>>> when there is also a documentation. Plus, this would lead to
>>>>> additional overhead in deciding what goes where and syncing between
>>>>> the two places for documentation.
>>>>>
>>>>> On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>
>>>>>> Ah, I totally forgot to add to the internals:
>>>>>>
>>>>>> - Fault tolerance in Batch mode
>>>>>>
>>>>>> - Fault Tolerance in Streaming Mode, with state handling
>>>>>>
>>>>>> On Mon, Mar 16, 2015 at 7:51 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>>
>>>>>>> Hi all!
>>>>>>>
>>>>>>> I would like to kick off an effort to improve the documentation of
>>>>>>> the Flink Architecture and internals. This also means making the
>>>>>>> streaming architecture more prominent in the docs.
>>>>>>>
>>>>>>> Since Flink is quite a sophisticated stack, we need to improve the
>>>>>>> presentation of how Flink works - to the extent necessary to use
>>>>>>> Flink (and to appreciate all the cool stuff that is happening). This
>>>>>>> should also come in handy with new contributors.
>>>>>>>
>>>>>>> As a general umbrella, we need to first decide where and how to
>>>>>>> organize the documentation.
>>>>>>>
>>>>>>> I would propose to put the bulk of the documentation into the Wiki.
>>>>>>> Create a dedicated section on Flink Internals and sub-pages for each
>>>>>>> component / topic. To the docs, we add a general overview from which
>>>>>>> we link into the Wiki.
>>>>>>>
>>>>>>>
>>>>>>> == These sections would go into the DOCS in the git repository ==
>>>>>>>
>>>>>>> - Overview of Program, pre-flight phase (type extraction, optimizer),
>>>>>>>   JobManager, TaskManager. Differences between streaming and batch.
>>>>>>>   We can realize this through one very nice picture with a few lines
>>>>>>>   of text.
>>>>>>>
>>>>>>> - High level architecture stack, different program representations
>>>>>>>   (API operators, common API DAG, optimizer DAG, parallel data flow
>>>>>>>   (JobGraph / Execution Graph))
>>>>>>>
>>>>>>> - (maybe) Parallelism and scheduling. This seems to be paramount to
>>>>>>>   understand for users.
>>>>>>>
>>>>>>> - Processes (JobManager, TaskManager, Webserver, WebClient, CLI
>>>>>>>   client)
>>>>>>>
>>>>>>>
>>>>>>> == These sections would go into the WIKI ==
>>>>>>>
>>>>>>> - Project structure (maven projects, what is where, dependencies
>>>>>>>   between projects)
>>>>>>>
>>>>>>> - Component overview
>>>>>>>
>>>>>>>   -> JobManager (InstanceManager, Scheduler, BLOB server, Library
>>>>>>>      Cache, Archiving)
>>>>>>>
>>>>>>>   -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library
>>>>>>>      Cache)
>>>>>>>
>>>>>>>   -> Involved Actor Systems / Actors / Messages
>>>>>>>
>>>>>>> - Details about submitting a job (library upload, job graph
>>>>>>>   submission, execution graph setup, scheduling trigger)
>>>>>>>
>>>>>>> - Memory Management
>>>>>>>
>>>>>>> - Optimizer internals
>>>>>>>
>>>>>>> - Akka Setup specifics
>>>>>>>
>>>>>>> - Netty and pluggable data exchange strategies
>>>>>>>
>>>>>>> - Testing: Flink test clusters and unit test utilities
>>>>>>>
>>>>>>> - Developer How-To: Setting up Eclipse, IntelliJ, Travis
>>>>>>>
>>>>>>> - Step-by-step guide to add a new operator
>>>>>>>
>>>>>>>
>>>>>>> I will go ahead and stub some sections in the Wiki.
>>>>>>>
>>>>>>> As we discuss and agree/disagree with the outline, we can evolve the
>>>>>>> Wiki.
>>>>>>>
>>>>>>> Greetings,
>>>>>>> Stephan