Hi Xingbo,

Thanks for the information.
I think the PySpark documentation redesign deserves our attention. It seems that the Spark community has also begun to take the user experience of its Python documentation more seriously. We can keep following the discussion and progress of the redesign in the Spark community. Their work is so similar to ours that there should be some ideas worth borrowing.

Best,
Wei

> On Aug 5, 2020, at 15:02, Xingbo Huang <hxbks...@gmail.com> wrote:
>
> Hi,
>
> I found that the Spark community has also recently been working on redesigning the PySpark documentation [1]. Maybe we can compare our documentation structure with theirs.
>
> [1] https://issues.apache.org/jira/browse/SPARK-31851
> http://apache-spark-developers-list.1001551.n3.nabble.com/Need-some-help-and-contributions-in-PySpark-API-documentation-td29972.html
>
> Best,
> Xingbo
>
> On Wed, Aug 5, 2020 at 3:17 AM David Anderson <da...@alpinegizmo.com> wrote:
> I'm delighted to see energy going into improving the documentation.
>
> With the current documentation, I get a lot of questions that I believe reflect two fundamental problems with what we currently provide:
>
> (1) We have a lot of contextual information in our heads about how Flink works, and we are able to use that knowledge to make reasonable inferences about how things (probably) work in cases we aren't so familiar with. For example, I get a lot of questions of the form "If I use <this feature>, will I still have exactly-once guarantees?" The answer is always yes, but users continue to have doubts because we have failed to clearly communicate this fundamental, underlying principle.
>
> This specific example about fault tolerance applies across all of the Flink docs, but the general idea can also be applied to the Table/SQL and PyFlink docs. The guiding principles underlying these APIs should be written down in one easy-to-find place.
>
> (2) The other kind of question I get a lot is "Can I do <X> with <Y>?", e.g. "Can I use the JDBC table sink from PyFlink?" These questions can be very difficult to answer because one frequently has to reason about why a given feature doesn't seem to appear in the documentation. It could be that I'm looking in the wrong place, that someone forgot to document something, or that it can in fact be done by applying a general mechanism in a specific way I haven't thought of. The JDBC question is an example of the last case: one can use a JDBC sink from Python if one thinks to use DDL.
>
> So I think it would be helpful to be explicit about both what is, and what is not, supported in PyFlink, and to have some very clear organizing principles in the documentation so that users can quickly learn where to look for specific facts.
>
> Regards,
> David
>
> On Tue, Aug 4, 2020 at 1:01 PM jincheng sun <sunjincheng...@gmail.com> wrote:
> Hi Seth and David,
>
> I'm very happy to have your replies and suggestions. I would like to share my thoughts here:
>
> The main motivation for refactoring the PyFlink docs is to make sure that Python users can find everything they need starting from the PyFlink documentation main page. That is, the PyFlink documentation should have a catalogue covering all the functionality available in PyFlink. However, this doesn't mean that we will copy documentation content that lives elsewhere; where needed, an entry may be just a reference/link to the other documentation. For the documentation added under the PyFlink main page, the principle is that it should only include Python-specific content instead of a copy of the Java content.
>
> >> I'm concerned that this proposal duplicates a lot of content that will quickly get out of sync. It feels like it is documenting PyFlink separately from the rest of the project.
>
> Regarding the concerns about maintainability: as mentioned above, the goal of this FLIP is to provide an intelligible entry point to the Python API, and its content should only contain information that is useful for Python users. Many agenda items in this FLIP do overlap with the Java documents, but that doesn't mean the content will be copied from the Java documentation:
> - If the content of a document is the same as the corresponding Java document, we will add a link to the Java document, e.g. "Built-in functions" and "SQL".
> - If a topic is partly shared with Java, we only create a page for the Python-only content and redirect to the Java document for the shared parts, e.g. "Connectors" and "Catalogs".
> - If the document is Python-only and already exists, we will move it from the old Python documentation to the new Python documentation, e.g. "Configurations".
> - If the document is Python-only and does not exist yet, we will create a new page for it, e.g. "DataTypes".
>
> The main reason we are creating a new page for Python data types is that they correspond one-to-one with Java data types only conceptually; the actual content of the page would be very different from the Java Data Types page. Some of the detailed differences are as follows:
>
> - The text in the Java Data Types document is written for users of JVM-based languages, which makes it incomprehensible to users who only understand Python.
> - Currently the Python data types do not support the "bridgedTo" method, DataTypes.RAW, DataTypes.NULL, or user-defined types.
> - The "Planner Compatibility" and "Data Type Extraction" sections are only useful for Java/Scala users.
> - We want to add sections that may only apply to Python, such as which data types are currently supported in Python, the mapping between DataType and Python object types, etc.
>
> I think the root cause of this difference from the existing documents is that Python is the first non-JVM language we support in Flink. This means our previous approach of sharing documents between Java and Scala may not be suitable for Python, so we will adopt some quite different methods to provide documentation for Python users. Of course, we should keep maintenance costs as low as possible while ensuring a good user experience. Furthermore, Python is the first step of Flink's multi-language support, and there may be R, Go, etc. in the future. It is therefore important for us to have a main page for each language, so that users of each language can focus on the content they care about.
>
> >> Things like the cookbook and tutorial should be under the Try Flink section of the documentation.
>
> Regarding the position of the "Cookbook" section: in my view, "Try Flink" is for new users and the "Cookbook" is for more advanced users. That is, "Try Flink" can hold the simplest end-to-end example, such as "Hello World", while the "Cookbook" can contain use cases closer to production business, such as CDN log analysis or PV/UV for e-commerce. So I prefer to keep the current structure.
>
> >> it's relatively straightforward to compare the Python API with the Java and Scala versions.
>
> Regarding the comparison between the Python API and the Java/Scala API, I think the majority of users, especially beginners, would not need this. From my side, improving the experience of beginners seems to be a higher priority. Could you please give more input on why users want to make this comparison? How much would the comparison be affected if the content were spread across multiple pages? :)
>
> Thanks for all of your feedback and suggestions; any follow-up feedback is welcome.
>
> Best,
> Jincheng
>
> On Mon, Aug 3, 2020 at 10:49 PM David Anderson <da...@alpinegizmo.com> wrote:
> Jincheng,
>
> One thing that I like about the way the documentation is currently organized is that it's relatively straightforward to compare the Python API with the Java and Scala versions. I'm concerned that if the PyFlink docs are more independent, it will be challenging to respond to questions about which features from the other APIs are available from Python.
>
> David
>
> On Mon, Aug 3, 2020 at 8:07 AM jincheng sun <sunjincheng...@gmail.com> wrote:
> It would be great if you could join in contributing to the PyFlink documentation, @Marta!
> Thanks for all of the positive feedback. I will start a formal vote later...
>
> Best,
> Jincheng
>
> On Mon, Aug 3, 2020 at 9:56 AM Shuiqiang Chen <acqua....@gmail.com> wrote:
>
> > Hi jincheng,
> >
> > Thanks for the discussion. +1 for the FLIP.
> >
> > Well-organized documentation will greatly improve developers' efficiency and experience.
> >
> > Best,
> > Shuiqiang
> >
> > On Sat, Aug 1, 2020 at 8:42 AM Hequn Cheng <he...@apache.org> wrote:
> >
> >> Hi Jincheng,
> >>
> >> Thanks a lot for raising the discussion. +1 for the FLIP.
> >>
> >> I think this will bring big benefits to PyFlink users. Currently, the Python Table API documentation is buried deep under the Table API & SQL tab, which makes it hard to read. Also, the PyFlink documentation is mixed with the Java/Scala documentation, so it is hard for users to get an overview of all the PyFlink documents. As more and more functionality is added to PyFlink, I think it's time for us to refactor the documentation.
> >>
> >> Best,
> >> Hequn
> >>
> >> On Fri, Jul 31, 2020 at 3:43 PM Marta Paes Moreira <ma...@ververica.com> wrote:
> >>
> >>> Hi, Jincheng!
> >>>
> >>> Thanks for creating this detailed FLIP; it will make a big difference in the experience of Python developers using Flink. I'm interested in contributing to this work, so I'll reach out to you offline!
> >>>
> >>> Also, thanks for sharing some information on the adoption of PyFlink; it's great to see that there are already production users.
> >>>
> >>> Marta
> >>>
> >>> On Fri, Jul 31, 2020 at 5:35 AM Xingbo Huang <hxbks...@gmail.com> wrote:
> >>>
> >>> > Hi Jincheng,
> >>> >
> >>> > Thanks a lot for bringing up this discussion and the proposal.
> >>> >
> >>> > Big +1 for improving the structure of the PyFlink docs.
> >>> >
> >>> > Giving PyFlink users a unified entry point for learning from the PyFlink documents will make them much more user-friendly.
> >>> >
> >>> > Best,
> >>> > Xingbo
> >>> >
> >>> > On Fri, Jul 31, 2020 at 11:00 AM Dian Fu <dian0511...@gmail.com> wrote:
> >>> >
> >>> >> Hi Jincheng,
> >>> >>
> >>> >> Thanks a lot for bringing up this discussion and the proposal. +1 to improving the Python API docs.
> >>> >>
> >>> >> I have received a lot of feedback from PyFlink beginners about the PyFlink docs, e.g. there is too little material, the Python docs are mixed with the Java docs, and it's not easy to find the docs they are looking for.
> >>> >>
> >>> >> I think it would greatly improve the user experience if we could have one place that includes most of what PyFlink users need to know.
> >>> >>
> >>> >> Regards,
> >>> >> Dian
> >>> >>
> >>> >> On Jul 31, 2020, at 10:14 AM, jincheng sun <sunjincheng...@gmail.com> wrote:
> >>> >>
> >>> >> Hi folks,
> >>> >>
> >>> >> Since the release of Flink 1.11, the number of PyFlink users has continued to grow. As far as I know, many companies have already put PyFlink into production for data analysis and for operations and maintenance monitoring (such as 聚美优品[1] (Jumei), 浙江墨芷[2] (Mozhi), etc.). According to the feedback we have received, the current documentation is not very friendly to PyFlink users. There are two shortcomings:
> >>> >>
> >>> >> - Python-related content is mixed into the Java/Scala documentation, which makes it difficult to read for users who only focus on PyFlink.
> >>> >> - There is already a "Python Table API" section in the Table API documentation to hold the PyFlink documents, but it contains few articles and the content is fragmented, so it is difficult for beginners to learn from.
> >>> >>
> >>> >> In addition, FLIP-130 introduced the Python DataStream API, and many documents will be added for those new APIs. In order to increase the readability and maintainability of the PyFlink documentation, Wei Zhong and I have discussed this offline and would like to rework it via this FLIP.
> >>> >>
> >>> >> We will rework the documentation around the following three objectives:
> >>> >>
> >>> >> - Add a separate section for the Python API under the "Application Development" section.
> >>> >> - Restructure the current Python documentation into a brand-new structure so that the content is complete and friendly to beginners.
> >>> >> - Improve the documents shared by Python/Java/Scala to make them friendlier to Python users without affecting Java/Scala users.
> >>> >>
> >>> >> More details can be found in FLIP-133:
> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation
> >>> >>
> >>> >> Best,
> >>> >> Jincheng
> >>> >>
> >>> >> [1] https://mp.weixin.qq.com/s/zVsBIs1ZEFe4atYUYtZpRg
> >>> >> [2] https://mp.weixin.qq.com/s/R4p_a2TWGpESBWr3pLtM2g