That sounds good to have, especially since it would give users more flexibility. But I think it's slightly orthogonal to this proposal, which is about the default ordering (before users take any action).
On Fri, 24 Feb 2023 at 15:35, Santosh Pingale <santosh.ping...@adyen.com> wrote:

> Very interesting and user-focused discussion, thanks for the proposal.
>
> Would it be better if we instead let users set a preference for the language they want to see first in the code examples? This preference can easily be stored on the browser side and used to decide the ordering. This is in line with the freedom users have with Spark today.
>
> On Fri, Feb 24, 2023, 4:46 AM Allan Folting <afolting...@gmail.com> wrote:
>
>> I think this needs to be done consistently on all relevant pages, and my intent is to do that work in time for when it is first released.
>> I started with the "Spark SQL, DataFrames and Datasets Guide" page to break the work up into multiple, scoped PRs.
>> I should have made that clear before.
>>
>> I think it's a great idea to have an umbrella JIRA for this to outline the full scope and track overall progress, and I'm happy to create it.
>>
>> I can't speak on behalf of all Scala users, of course, but I don't think this change makes Scala appear as a second-class citizen, just as I don't think of Python as a second-class citizen because it is not currently first. It does, however, recognize that Python is more broadly popular today.
>>
>> Thanks,
>> Allan
>>
>> On Thu, Feb 23, 2023 at 6:55 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>
>>> Thank you all.
>>>
>>> Yes, attracting more Python users and being more Python-user-friendly is always good.
>>>
>>> Basically, SPARK-42493 proposes introducing intentional inconsistency into the Apache Spark documentation.
>>>
>>> The inconsistency from SPARK-42493 might first raise the following questions for Python users:
>>>
>>> - Why not the RDD pages, which are the heart of Apache Spark? Is Python not good with RDDs?
>>> - Why not the ML and Structured Streaming pages, when the DATA+AI Summit focuses heavily on ML?
>>>
>>> It also raises more questions for Scala users:
>>> - Is Scala stepping down to a second-class language?
>>> - What about Scala 3?
>>>
>>> Of course, I understand that SPARK-42493 has a specific scope (SQL/Dataset/DataFrame) and didn't mean anything like the above at all. However, if SPARK-42493 is presented as "the first step" toward introducing that inconsistency, I'm wondering:
>>> - What direction are we heading in?
>>> - What is the next target scope?
>>> - When will it be achieved (or completed)?
>>> - Or is the goal for the documentation to be permanently inconsistent?
>>>
>>> This is unclear even within the documentation-only scope. If we expect more and more subtasks during the Apache Spark 3.5 timeframe, shall we have an umbrella JIRA?
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> On Thu, Feb 23, 2023 at 6:15 PM Allan Folting <afolting...@gmail.com> wrote:
>>>
>>>> Thanks a lot for the questions and comments/feedback!
>>>>
>>>> To address your questions, Dongjoon: I do not intend for these documentation updates to be tied to the potential changes/suggestions you ask about.
>>>>
>>>> In other words, this proposal is only about adjusting the documentation to target the majority of people reading it, namely the large and growing number of Python users, and new users in particular, as they are often already familiar with and prefer Python when evaluating or starting to use Spark.
>>>>
>>>> While we may want to strengthen support for Python in other ways, I think such efforts should be tracked separately from this.
>>>>
>>>> Allan
>>>>
>>>> On Thu, Feb 23, 2023 at 1:44 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> If this is not just flip-flopping the document pages and involves other changes, then a proper impact analysis needs to be done to assess the effort involved. Personally, I don't think it really matters.
>>>>>
>>>>> HTH
>>>>>
>>>>> On Thu, 23 Feb 2023 at 01:40, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>>>
>>>>>> > 1. Does this suggestion imply that the Python API implementation will be a new blocker in the future in terms of feature parity among languages? Until now, Python API feature parity was one of the audit items because it's not enforced. In other words, Scala and Java have been fully featured because they are the main underlying developer languages, while the Python/R/SQL environments were nice-to-haves.
>>>>>>
>>>>>> I think it wouldn't be treated as a blocker, but I do believe we have added all new features to the Python side for the last couple of releases. So I wouldn't worry about this at the moment; we have been doing fine in terms of feature parity.
>>>>>>
>>>>>> > 2. Does this suggestion assume that the Python environment is always easier for users than Scala/Java? Given that we support Python 3.8 to 3.11, the support matrix for Python library dependencies is a problem the Apache Spark community must solve before we can claim that. As noted in SPARK-41454, the Python language has also historically introduced breaking changes for us, and we have many `Pinned` Python library issues.
>>>>>>
>>>>>> Yes.
>>>>>> In fact, regardless of this change, I do believe we should test more versions, etc., at least in scheduled jobs, as we already do for JDK and Scala versions.
>>>>>>
>>>>>> FWIW, my take on this change is: people use Python and PySpark more (according to the chart and stats provided), so let's put those examples first :-).
>>>>>>
>>>>>> On Thu, 23 Feb 2023 at 10:27, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>
>>>>>>> I have two questions to clarify the scope and boundaries.
>>>>>>>
>>>>>>> 1. Does this suggestion imply that the Python API implementation will be a new blocker in the future in terms of feature parity among languages? Until now, Python API feature parity was one of the audit items because it's not enforced. In other words, Scala and Java have been fully featured because they are the main underlying developer languages, while the Python/R/SQL environments were nice-to-haves.
>>>>>>>
>>>>>>> 2. Does this suggestion assume that the Python environment is always easier for users than Scala/Java? Given that we support Python 3.8 to 3.11, the support matrix for Python library dependencies is a problem the Apache Spark community must solve before we can claim that. As noted in SPARK-41454, the Python language has also historically introduced breaking changes for us, and we have many `Pinned` Python library issues.
>>>>>>>
>>>>>>> Changing documentation is easy, but I hope we can communicate a clear direction in this effort, because this is one of the most user-facing changes.
>>>>>>>
>>>>>>> Dongjoon.
>>>>>>>
>>>>>>> On Wed, Feb 22, 2023 at 5:26 PM 416161...@qq.com <ruife...@foxmail.com> wrote:
>>>>>>>
>>>>>>>> +1 LGTM
>>>>>>>>
>>>>>>>> ------------------------------
>>>>>>>> Ruifeng Zheng
>>>>>>>> ruife...@foxmail.com
>>>>>>>>
>>>>>>>> ------------------ Original ------------------
>>>>>>>> *From:* "Xinrong Meng" <xinrong.apa...@gmail.com>;
>>>>>>>> *Date:* Thu, Feb 23, 2023 09:17 AM
>>>>>>>> *To:* "Allan Folting" <afolting...@gmail.com>;
>>>>>>>> *Cc:* "dev" <dev@spark.apache.org>;
>>>>>>>> *Subject:* Re: [DISCUSS] Show Python code examples first in Spark documentation
>>>>>>>>
>>>>>>>> +1 Good idea!
>>>>>>>>
>>>>>>>> On Thu, Feb 23, 2023 at 7:41 AM Jack Goodson <jackagood...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Good idea. At the company I work at, we discussed using Scala as our primary language because it is technically slightly stronger than Python, but ultimately chose Python, as it's easier to onboard other devs to our platform, and future hiring for the team would also be easier.
>>>>>>>>>
>>>>>>>>> On Thu, 23 Feb 2023 at 12:20 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> +1 I like this idea too.
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 23, 2023 at 6:00 AM Allan Folting <afolting...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I would like to propose that we show Python code examples first in the Spark documentation where we have examples in multiple programming languages.
>>>>>>>>>>> An example is on the Quick Start page:
>>>>>>>>>>> https://spark.apache.org/docs/latest/quick-start.html
>>>>>>>>>>>
>>>>>>>>>>> I propose this change because Python has become more popular than the other languages supported in Apache Spark. There are a lot more users of Spark in Python than Scala today, and Python attracts a broader set of new users.
>>>>>>>>>>> For Python usage data, see https://www.tiobe.com/tiobe-index/ and
>>>>>>>>>>> https://insights.stackoverflow.com/trends?tags=r%2Cscala%2Cpython%2Cjava
>>>>>>>>>>>
>>>>>>>>>>> Also, this change aligns with Python already being the first tab on our home page:
>>>>>>>>>>> https://spark.apache.org/
>>>>>>>>>>>
>>>>>>>>>>> Anyone who wants to use another language can still just click on the other tabs.
>>>>>>>>>>>
>>>>>>>>>>> I created a draft PR for the Spark SQL, DataFrames and Datasets Guide page as a first step:
>>>>>>>>>>> https://github.com/apache/spark/pull/40087
>>>>>>>>>>>
>>>>>>>>>>> I would appreciate it if you could share your thoughts on this proposal.
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot,
>>>>>>>>>>> Allan Folting