Re: Improve the documentation of the Flink Architecture and internals

Ufuk Celebi Fri, 20 Mar 2015 04:49:44 -0700

Thanks. I will have a look later :-)

+1 for the Wiki. I think the low overhead makes it easier to contribute not
only for newcomers, but for committers as well. :-)


On 20 Mar 2015, at 12:46, Kostas Tzoumas <ktzou...@apache.org> wrote:

> I added a document for data exchange between tasks:
> https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
> 
> Feel free to edit. I plan to link the class names to the class files on
> GitHub.
> 
> On Tue, Mar 17, 2015 at 11:17 AM, Kostas Tzoumas <ktzou...@apache.org>
> wrote:
> 
>> +1 for the Wiki.
>> 
>> When these have been stabilized we can move them to the docs if we decide
>> to do so.
>> 
>> On Mon, Mar 16, 2015 at 10:07 PM, Stephan Ewen <se...@apache.org> wrote:
>> 
>>> I have put my suggested version of an outline for the docs into the wiki.
>>> Regardless where the docs end up (wiki or repository), we can use the wiki
>>> to outline the docs.
>>> 
>>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
>>> 
>>> Some pages contain a stub or outline, others are completely blank.
>>>
>>> This is not a complete list. Additions are welcome.
>>> 
>>> On Mon, Mar 16, 2015 at 10:04 PM, Stephan Ewen <se...@apache.org> wrote:
>>> 
>>>> I think the Wiki has a much lower barrier of entry to fix docs,
>>>> especially for external people. The docs, with the Jekyll setup, are
>>>> rather tricky. I would very much like all kinds of people to contribute
>>>> to the docs about the internals, not just the usual three suspects who
>>>> have done this so far.
>>>> 
>>>> The point of having a good landing page in the regular docs is exactly
>>>> to not lose all the people who do not look into a wiki. The overview
>>>> pages for the internals need to be good and accessible, and link nicely
>>>> to the wiki to "forward" people there.
>>>> 
>>>> The overhead of deciding what goes where should not be terribly large,
>>>> in my opinion, since there is no really "wrong" place to put it.
>>>> 
>>>> 
>>>> 
>>>> On Mon, Mar 16, 2015 at 9:58 PM, Aljoscha Krettek <aljos...@apache.org>
>>>> wrote:
>>>> 
>>>>> Why do you want to split stuff between the docs in the repository and
>>>>> the wiki? I for one would always be too lazy to check stuff in a wiki
>>>>> when there is also documentation. Plus, this would lead to
>>>>> additional overhead in deciding what goes where and syncing between
>>>>> the two places for documentation.
>>>>> 
>>>>> On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen <se...@apache.org>
>>>>> wrote:
>>>>>> Ah, I totally forgot to add to the internals:
>>>>>> 
>>>>>>  - Fault tolerance in batch mode
>>>>>>
>>>>>>  - Fault tolerance in streaming mode, with state handling
>>>>>> 
>>>>>> On Mon, Mar 16, 2015 at 7:51 PM, Stephan Ewen <se...@apache.org>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi all!
>>>>>>> 
>>>>>>> I would like to kick off an effort to improve the documentation of
>>>>>>> the Flink Architecture and internals. This also means making the
>>>>>>> streaming architecture more prominent in the docs.
>>>>>>> 
>>>>>>> Since Flink is quite a sophisticated stack, we need to improve the
>>>>>>> presentation of how it works, to the extent necessary to use Flink
>>>>>>> (and to appreciate all the cool stuff that is happening). This should
>>>>>>> also come in handy for new contributors.
>>>>>>> 
>>>>>>> As a general first step, we need to decide where and how to organize
>>>>>>> the documentation.
>>>>>>> 
>>>>>>> I would propose putting the bulk of the documentation into the Wiki:
>>>>>>> create a dedicated section on Flink Internals with sub-pages for each
>>>>>>> component / topic. In the docs, we add a general overview from which
>>>>>>> we link into the Wiki.
>>>>>>> 
>>>>>>> 
>>>>>>> == These sections would go into the DOCS in the git repository ==
>>>>>>> 
>>>>>>>  - Overview of Program, pre-flight phase (type extraction, optimizer),
>>>>>>> JobManager, TaskManager. Differences between streaming and batch. We
>>>>>>> can realize this through one very nice picture with a few lines of
>>>>>>> text.
>>>>>>> 
>>>>>>>  - High-level architecture stack, different program representations
>>>>>>> (API operators, common API DAG, optimizer DAG, parallel data flow
>>>>>>> (JobGraph / Execution Graph))
>>>>>>> 
>>>>>>>  - (maybe) Parallelism and scheduling. This seems to be paramount for
>>>>>>> users to understand.
>>>>>>> 
>>>>>>>  - Processes (JobManager, TaskManager, Webserver, WebClient, CLI client)
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> == These sections would go into the WIKI ==
>>>>>>> 
>>>>>>>  - Project structure (Maven projects, what is where, dependencies
>>>>>>> between projects)
>>>>>>> 
>>>>>>>  - Component overview
>>>>>>> 
>>>>>>>    -> JobManager (InstanceManager, Scheduler, BLOB server, Library
>>>>>>> Cache, Archiving)
>>>>>>>
>>>>>>>    -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library Cache)
>>>>>>> 
>>>>>>>    -> Involved Actor Systems / Actors / Messages
>>>>>>> 
>>>>>>>  - Details about submitting a job (library upload, job graph
>>>>>>> submission, execution graph setup, scheduling trigger)
>>>>>>> 
>>>>>>>  - Memory Management
>>>>>>> 
>>>>>>>  - Optimizer internals
>>>>>>> 
>>>>>>>  - Akka Setup specifics
>>>>>>> 
>>>>>>>  - Netty and pluggable data exchange strategies
>>>>>>> 
>>>>>>>  - Testing: Flink test clusters and unit test utilities
>>>>>>> 
>>>>>>>  - Developer How-To: Setting up Eclipse, IntelliJ, Travis
>>>>>>> 
>>>>>>>  - Step-by-step guide to add a new operator
>>>>>>> 
>>>>>>> 
>>>>>>> I will go ahead and stub some sections in the Wiki.
>>>>>>> 
>>>>>>> As we discuss and agree/disagree with the outline, we can evolve the
>>>>>>> Wiki.
>>>>>>> 
>>>>>>> Greetings,
>>>>>>> Stephan
>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>>
