Re: Improve the documentation of the Flink Architecture and internals

Kostas Tzoumas Fri, 20 Mar 2015 04:47:42 -0700

I added a document for data exchange between tasks:
https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks


Feel free to edit. I plan to link the class names to the class files on
GitHub.

On Tue, Mar 17, 2015 at 11:17 AM, Kostas Tzoumas <ktzou...@apache.org>
wrote:

> +1 for the Wiki.
>
> When these have been stabilized we can move them to the docs if we decide
> to do so.
>
> On Mon, Mar 16, 2015 at 10:07 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> I have put my suggested version of an outline for the docs into the wiki.
>> Regardless of where the docs end up (wiki or repository), we can use the
>> wiki to outline the docs.
>>
>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
>>
>> Some pages contain some stub or outline, others are completely blank.
>>
>> Not a complete list. Additions are welcome.
>>
>> On Mon, Mar 16, 2015 at 10:04 PM, Stephan Ewen <se...@apache.org> wrote:
>>
>> > I think the Wiki has a much lower barrier of entry to fix docs,
>> > especially for external people. The docs, with the Jekyll setup, are
>> > rather tricky. I would very much like all kinds of people to contribute
>> > to the docs about the internals, not just the usual three suspects that
>> > have done this so far.
>> >
>> > Having a good landing page in the regular docs is exactly to not lose
>> > all the people that do not look into a wiki. The overview pages for the
>> > internals need to be good and accessible and nicely link to the wiki to
>> > "forward" people there.
>> >
>> > The overhead of deciding what goes where should not be terribly large,
>> > in my opinion, since there is no really "wrong" place to put it.
>> >
>> >
>> >
>> > On Mon, Mar 16, 2015 at 9:58 PM, Aljoscha Krettek <aljos...@apache.org>
>> > wrote:
>> >
>> >> Why do you want to split stuff between the docs in the repository and
>> >> the wiki? I for one would always be too lazy to check stuff in a wiki
>> >> when there is also documentation. Plus, this would lead to
>> >> additional overhead in deciding what goes where and syncing between
>> >> the two places for documentation.
>> >>
>> >> On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen <se...@apache.org> wrote:
>> >> > Ah, I totally forgot to add to the internals:
>> >> >
>> >> >   - Fault tolerance in Batch mode
>> >> >
>> >> >   - Fault Tolerance in Streaming Mode, with state handling
>> >> >
>> >> > On Mon, Mar 16, 2015 at 7:51 PM, Stephan Ewen <se...@apache.org> wrote:
>> >> >
>> >> >> Hi all!
>> >> >>
>> >> >> I would like to kick off an effort to improve the documentation of the
>> >> >> Flink Architecture and internals. This also means making the streaming
>> >> >> architecture more prominent in the docs.
>> >> >>
>> >> >> Since Flink is quite a sophisticated stack, we need to improve the
>> >> >> presentation of how Flink works - to an extent necessary to use Flink
>> >> >> (and to appreciate all the cool stuff that is happening). This should
>> >> >> also come in handy with new contributors.
>> >> >>
>> >> >> As a general umbrella, we need to first decide where and how to
>> >> >> organize the documentation.
>> >> >>
>> >> >> I would propose to put the bulk of the documentation into the Wiki.
>> >> >> Create a dedicated section on Flink Internals and sub-pages for each
>> >> >> component / topic. To the docs, we add a general overview from which
>> >> >> we link into the Wiki.
>> >> >>
>> >> >>
>> >> >>  == These sections would go into the DOCS in the git repository ==
>> >> >>
>> >> >>   - Overview of Program, pre-flight phase (type extraction, optimizer),
>> >> >> JobManager, TaskManager. Differences between streaming and batch. We
>> >> >> can realize this through one very nice picture with a few lines of text.
>> >> >>
>> >> >>   - High level architecture stack, different program representations
>> >> >> (API operators, common API DAG, optimizer DAG, parallel data flow
>> >> >> (JobGraph / Execution Graph))
>> >> >>
>> >> >>   - (maybe) Parallelism and scheduling. This seems to be paramount to
>> >> >> understand for users.
>> >> >>
>> >> >>   - Processes (JobManager, TaskManager, Webserver, WebClient, CLI
>> >> >> client)
>> >> >>
>> >> >>
>> >> >>
>> >> >>  == These sections would go into the WIKI ==
>> >> >>
>> >> >>   - Project structure (Maven projects, what is where, dependencies
>> >> >> between projects)
>> >> >>
>> >> >>   - Component overview
>> >> >>
>> >> >>     -> JobManager (InstanceManager, Scheduler, BLOB server, Library
>> >> >> Cache, Archiving)
>> >> >>
>> >> >>     -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library
>> >> >> Cache)
>> >> >>
>> >> >>     -> Involved Actor Systems / Actors / Messages
>> >> >>
>> >> >>   - Details about submitting a job (library upload, job graph
>> >> >> submission, execution graph setup, scheduling trigger)
>> >> >>
>> >> >>   - Memory Management
>> >> >>
>> >> >>   - Optimizer internals
>> >> >>
>> >> >>   - Akka Setup specifics
>> >> >>
>> >> >>   - Netty and pluggable data exchange strategies
>> >> >>
>> >> >>   - Testing: Flink test clusters and unit test utilities
>> >> >>
>> >> >>   - Developer How-To: Setting up Eclipse, IntelliJ, Travis
>> >> >>
>> >> >>   - Step-by-step guide to add a new operator
>> >> >>
>> >> >>
>> >> >> I will go ahead and stub some sections in the Wiki.
>> >> >>
>> >> >> As we discuss and agree/disagree with the outline, we can evolve the
>> >> >> Wiki.
>> >> >>
>> >> >> Greetings,
>> >> >> Stephan
>> >> >>
>> >> >>
>> >>
>> >
>> >
>>
>
>
