Hi Konstantin,

Thanks a lot for bringing up this discussion.

Currently, the Python documentation is more like a mixture of Option 1 and
Option 2. It contains two parts:
1) The first part is the independent page [1], which can be seen as the
main entry point for Python users.
2) The second part is the Python tabs within the DataStream API /
Table API pages.

The motivation to provide an independent page for Python documentation is
as follows:
1) We are trying to create Pythonic documentation for Python users (we are
still far from that, and I have received a lot of feedback saying that the
Python documentation and API are too Java-like). However, to avoid
duplication, it links to the DataStream API / Table API pages where
necessary instead of copying content. There are indeed exceptions, e.g. the
window example given by Jark. This is because the Python DataStream API
currently provides only very limited window support, so we added a dedicated
page to give Python users a complete picture of what they can do with it. We
are trying to finalize the window support in 1.16 [2] and then remove the
duplicate documentation.
2) Some kinds of documentation are only applicable to the Python language,
e.g. dependency management [3], conversion between Table and Pandas
DataFrame [4], etc. An independent page provides a place to keep all of this
documentation together.

Regarding Option 1, "Language Tabs": it makes it hard to create Pythonic
documentation for Python users.
Regarding Option 2, "Language First": it may lead to a lot of duplication.
Currently, many descriptions in the DataStream API / Table API pages are
shared between Java, Scala and Python.

> In the rest of the documentation, Python is sometimes
> included like in this Table API page [2] and sometimes ignored like on the
> project setup pages [3].
I agree that this is something we need to improve.

Regards,
Dian

[1] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/overview/
[2] https://issues.apache.org/jira/browse/FLINK-26477
[3] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/dependency_management/
[4] https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/table/conversion_of_pandas/

On Wed, Mar 23, 2022 at 4:17 PM Jark Wu <imj...@gmail.com> wrote:

> Hi Konstantin,
>
> Thanks for starting this discussion.
>
> From my perspective, I prefer the "Language Tabs" approach.
> But maybe we can improve the tabs by moving them to the sidebar or top menu,
> which would allow users to first decide on their language and then the API.
> IMO, programming languages are just like spoken languages, which can be
> picked in the sidebar.
> What I want to avoid is duplicated docs and incomplete features in a
> specific language.
> "Language First" may confuse users about what the complete set of features
> provided by Flink is and where to find it.
>
> For example, there is a lot of duplication between the "Window" pages [1]
> and the "Python Window" pages [2].
> Also, users can't get a complete overview of Flink's window mechanism from
> the Python API part alone.
> Users have to go through the Java/Scala DataStream API first to build the
> overall knowledge, and then read the Python API part.
>
> > * Second, most of the Flink Documentation currently is using a "Language
> > Tabs" approach, but this might become obsolete in the long-term anyway as
> > we move more and more in a Scala-free direction.
>
> The Scala-free direction means that users can pick an arbitrary Scala
> version; it does not mean dropping the Scala API.
> So "Language Tabs" are still necessary and helpful for switching between
> languages.
>
> Best,
> Jark
>
> [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/
> [2]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/datastream/operators/windows/
>
> On Tue, 22 Mar 2022 at 21:40, Konstantin Knauf <kna...@apache.org> wrote:
>
> > Hi everyone,
> >
> > I would like to discuss a particular aspect of our documentation: the
> > top-level structure with respect to languages and APIs. The current
> > structure is inconsistent and the direction is unclear to me, which makes
> > it hard for me to contribute gradual improvements.
> >
> > Currently, the Python documentation has its own independent branch in the
> > documentation [1]. In the rest of the documentation, Python is sometimes
> > included, as on this Table API page [2], and sometimes ignored, as on the
> > project setup pages [3]. Scala and Java, on the other hand, are always
> > documented side by side in tabs.
> >
> > The way I see it, most parts of our documentation (application
> > development, connectors, getting started, project setup) have two primary
> > dimensions: API (DataStream API, Table API) and language (Python, Java,
> > Scala).
> >
> > In addition, there is SQL, for which the language is only a minor factor
> > (UDFs) but which generally requires a different structure (different
> > audience, different tools). SQL and the Table API do have some conceptual
> > overlap, but I doubt these concepts are of much interest to SQL users.
> > So, to me, SQL should be treated separately in any case, with links to the
> > Table API documentation for some concepts.
> >
> > I think, in general, both approaches can work:
> >
> >
> > *Option 1: "Language Tabs"*
> > Application Development
> > > DataStream API  (Java, Scala, Python)
> > > Table API (Java, Scala, Python)
> > > SQL
> >
> >
> > *Option 2: "Language First"*
> > Java Development Guide
> > > Getting Started
> > > DataStream API
> > > Table API
> > Python Development Guide
> > > Getting Started
> > > DataStream API
> > > Table API
> > SQL Development Guide
> >
> > I don't have a strong opinion on this, but I tend towards "Language First".
> >
> > * First, I assume that users decide on their language/tools of choice
> > first and then move on to the API.
> >
> > * Second, most of the Flink Documentation currently is using a "Language
> > Tabs" approach, but this might become obsolete in the long-term anyway as
> > we move more and more in a Scala-free direction.
> >
> > For the connectors, I think there is a good argument for "Language & API
> > Embedded", because documenting every connector for each API and language
> > separately would result in a lot of duplication. Here, I would go one step
> > further than what we have right now and target:
> >
> > Connectors
> > -> Kafka (All APIs incl. SQL, All Languages)
> > -> Kinesis (same)
> > -> ...
> >
> > This also gives users a quick overview of which connectors exist and
> > plays well with our plan to externalize connectors.
> >
> > For completeness and to scope the discussion: there are two FLIPs on
> > documentation (FLIP-42 and FLIP-60), neither of which has been
> > implemented; they partially contradict each other and are generally out
> > of date. I specifically don't intend to add another FLIP to this
> > graveyard, but I would still like to reach a consensus on the high-level
> > direction.
> >
> > What do you think?
> >
> > Cheers,
> >
> > Konstantin
> >
> > --
> >
> > Konstantin Knauf
> >
> > https://twitter.com/snntrable
> >
> > https://github.com/knaufk
> >
>
