Re: [DISCUSS] Show Python code examples first in Spark documentation

Santosh Pingale Thu, 23 Feb 2023 23:31:58 -0800

Yes, I definitely agree and +1 to the proposal (FWIW).

I was looking at Dongjoon's comments, which made a lot of sense to me, and
trying to come up with an approach that provides a smooth segue to Python as
the first tab later on. But this is mostly guesswork, as I do not personally
know the actual user behaviour on the docs site.


On Fri, Feb 24, 2023, 8:01 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:

> That sounds good to have, especially given that it will allow more
> flexibility to the users.
> But I think that's slightly orthogonal to this proposal since this
> proposal is more about the default (before users take an action).
>
>
> On Fri, 24 Feb 2023 at 15:35, Santosh Pingale <santosh.ping...@adyen.com>
> wrote:
>
>> Very interesting and user-focused discussion, thanks for the proposal.
>>
>> Would it be better to let users set their preference for the
>> language they want to see first in the code examples? This preference can
>> easily be stored on the browser side and used to decide the ordering. This
>> is in line with the freedom users have with Spark today.
>>
>>
>> On Fri, Feb 24, 2023, 4:46 AM Allan Folting <afolting...@gmail.com>
>> wrote:
>>
>>> I think this needs to be consistently done on all relevant pages and my
>>> intent is to do that work in time for when it is first released.
>>> I started with the "Spark SQL, DataFrames and Datasets Guide" page to
>>> break it up into multiple, scoped PRs.
>>> I should have made that clear before.
>>>
>>> I think it's a great idea to have an umbrella JIRA for this to outline
>>> the full scope and track overall progress and I'm happy to create it.
>>>
>>> I can't speak on behalf of all Scala users of course, but I don't think
>>> this change makes Scala appear as a 2nd-class citizen, just as I don't
>>> think of Python as a 2nd-class citizen because it is not currently first.
>>> It does, however, recognize that Python is more broadly popular today.
>>>
>>> Thanks,
>>> Allan
>>>
>>> On Thu, Feb 23, 2023 at 6:55 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> wrote:
>>>
>>>> Thank you all.
>>>>
>>>> Yes, attracting more Python users and being more Python user-friendly
>>>> is always good.
>>>>
>>>> Basically, SPARK-42493 is proposing to introduce intentional
>>>> inconsistency to Apache Spark documentation.
>>>>
>>>> The inconsistency from SPARK-42493 might raise the following
>>>> questions for Python users first.
>>>>
>>>> - Why not RDD pages which are the heart of Apache Spark? Is Python not
>>>> good in RDD?
>>>> - Why not ML and Structured Streaming pages when DATA+AI Summit focuses
>>>> on ML heavily?
>>>>
>>>> Also, more questions for the Scala users.
>>>> - Is the Scala language stepping down to a 2nd-class language?
>>>> - What about Scala 3?
>>>>
>>>> Of course, I understand SPARK-42493 has specific scopes
>>>> (SQL/Dataset/Dataframe) and didn't mean anything like the above at all.
>>>> However, if SPARK-42493 is emphasized as "the first step" to introduce
>>>> that inconsistency, I'm wondering
>>>> - What direction are we heading in?
>>>> - What is the next target scope?
>>>> - When will it be achieved (or completed)?
>>>> - Or, is the goal to be permanently inconsistent in terms of the
>>>> documentation?
>>>>
>>>> It's unclear even in the documentation-only scope. If we are expecting
>>>> more and more subtasks during the Apache Spark 3.5 timeframe, shall we
>>>> have an umbrella JIRA?
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>>
>>>> On Thu, Feb 23, 2023 at 6:15 PM Allan Folting <afolting...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks a lot for the questions and comments/feedback!
>>>>>
>>>>> To address your questions Dongjoon, I do not intend for these updates
>>>>> to the documentation to be tied to the potential changes/suggestions you
>>>>> ask about.
>>>>>
>>>>> In other words, this proposal is only about adjusting the
>>>>> documentation to target the majority of people reading it - namely the
>>>>> large and growing number of Python users - and new users in particular as
>>>>> they are often already familiar with and have a preference for Python when
>>>>> evaluating or starting to use Spark.
>>>>>
>>>>> While we may want to strengthen support for Python in other ways, I
>>>>> think such efforts should be tracked separately from this.
>>>>>
>>>>> Allan
>>>>>
>>>>> On Thu, Feb 23, 2023 at 1:44 AM Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> If this is not just flip-flopping the document pages and involves
>>>>>> other changes, then a proper impact analysis needs to be done to assess
>>>>>> the efforts involved. Personally I don't think it really matters.
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>> On Thu, 23 Feb 2023 at 01:40, Hyukjin Kwon <gurwls...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> > 1. Does this suggestion imply Python API implementation will be
>>>>>>> the new blocker in the future in terms of feature parity among
>>>>>>> languages? Until now, Python API feature parity was one of the audit
>>>>>>> items because it's not enforced. In other words, Scala and Java have
>>>>>>> been the full feature because they are the underlying main developer
>>>>>>> languages while Python/R/SQL environments were the nice-to-have.
>>>>>>>
>>>>>>> I think it wouldn't be treated as a blocker, but I do believe we
>>>>>>> have added all new features into the Python side for the last couple of
>>>>>>> releases. So, I wouldn't worry about this at this moment - we have been
>>>>>>> doing fine in terms of feature parity.
>>>>>>>
>>>>>>> > 2. Does this suggestion assume that the Python environment is
>>>>>>> easier for users than Scala/Java always? Given that we support Python
>>>>>>> 3.8 to 3.11, the support matrix for Python library dependency is a
>>>>>>> problem for the Apache Spark community to solve in order to claim that.
>>>>>>> As we say at SPARK-41454, Python language also introduces breaking
>>>>>>> changes to us historically and we have many `Pinned` python libraries
>>>>>>> issues.
>>>>>>>
>>>>>>> Yes. In fact, regardless of this change, I do believe we should test
>>>>>>> more versions, etc., at least with scheduled jobs like we're doing for
>>>>>>> JDK and Scala versions.
>>>>>>>
>>>>>>>
>>>>>>> FWIW, my take about this change is: people use Python and PySpark
>>>>>>> more (according to the chart and stats provided) so let's put those
>>>>>>> examples first :-).
>>>>>>>
>>>>>>>
>>>>>>> On Thu, 23 Feb 2023 at 10:27, Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I have two questions to clarify the scope and boundaries.
>>>>>>>>
>>>>>>>> 1. Does this suggestion imply Python API implementation will be the
>>>>>>>> new blocker in the future in terms of feature parity among languages?
>>>>>>>> Until now, Python API feature parity was one of the audit items because
>>>>>>>> it's not enforced. In other words, Scala and Java have been the full
>>>>>>>> feature because they are the underlying main developer languages while
>>>>>>>> Python/R/SQL environments were the nice-to-have.
>>>>>>>>
>>>>>>>> 2. Does this suggestion assume that the Python environment is
>>>>>>>> easier for users than Scala/Java always? Given that we support Python
>>>>>>>> 3.8 to 3.11, the support matrix for Python library dependency is a
>>>>>>>> problem for the Apache Spark community to solve in order to claim that.
>>>>>>>> As we say at SPARK-41454, Python language also introduces breaking
>>>>>>>> changes to us historically and we have many `Pinned` python libraries
>>>>>>>> issues.
>>>>>>>>
>>>>>>>> Changing documentation is easy, but I hope we can give clear
>>>>>>>> communication and direction in this effort because this is one of the
>>>>>>>> most user-facing changes.
>>>>>>>>
>>>>>>>> Dongjoon.
>>>>>>>>
>>>>>>>> On Wed, Feb 22, 2023 at 5:26 PM 416161...@qq.com <
>>>>>>>> ruife...@foxmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1 LGTM
>>>>>>>>>
>>>>>>>>> ------------------------------
>>>>>>>>> Ruifeng Zheng
>>>>>>>>> ruife...@foxmail.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ------------------ Original ------------------
>>>>>>>>> *From:* "Xinrong Meng" <xinrong.apa...@gmail.com>;
>>>>>>>>> *Date:* Thu, Feb 23, 2023 09:17 AM
>>>>>>>>> *To:* "Allan Folting"<afolting...@gmail.com>;
>>>>>>>>> *Cc:* "dev"<dev@spark.apache.org>;
>>>>>>>>> *Subject:* Re: [DISCUSS] Show Python code examples first in Spark
>>>>>>>>> documentation
>>>>>>>>>
>>>>>>>>> +1 Good idea!
>>>>>>>>>
>>>>>>>>> On Thu, Feb 23, 2023 at 7:41 AM Jack Goodson <
>>>>>>>>> jackagood...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Good idea. At the company I work at, we discussed using Scala as
>>>>>>>>>> our primary language because technically it is slightly stronger than
>>>>>>>>>> Python, but ultimately chose Python as it's easier for other devs to
>>>>>>>>>> be onboarded to our platform, and future hiring for the team etc.
>>>>>>>>>> would be easier.
>>>>>>>>>>
>>>>>>>>>> On Thu, 23 Feb 2023 at 12:20 PM, Hyukjin Kwon <
>>>>>>>>>> gurwls...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 I like this idea too.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Feb 23, 2023 at 6:00 AM Allan Folting <
>>>>>>>>>>> afolting...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to propose that we show Python code examples first
>>>>>>>>>>>> in the Spark documentation where we have multiple programming
>>>>>>>>>>>> language examples.
>>>>>>>>>>>> An example is on the Quick Start page:
>>>>>>>>>>>> https://spark.apache.org/docs/latest/quick-start.html
>>>>>>>>>>>>
>>>>>>>>>>>> I propose this change because Python has become more popular
>>>>>>>>>>>> than the other languages supported in Apache Spark. There are a
>>>>>>>>>>>> lot more users of Spark in Python than Scala today and Python
>>>>>>>>>>>> attracts a broader set of new users.
>>>>>>>>>>>> For Python usage data, see https://www.tiobe.com/tiobe-index/ and
>>>>>>>>>>>> https://insights.stackoverflow.com/trends?tags=r%2Cscala%2Cpython%2Cjava
>>>>>>>>>>>>
>>>>>>>>>>>> Also, this change aligns with Python already being the first
>>>>>>>>>>>> tab on our home page:
>>>>>>>>>>>> https://spark.apache.org/
>>>>>>>>>>>>
>>>>>>>>>>>> Anyone who wants to use another language can still just click
>>>>>>>>>>>> on the other tabs.
>>>>>>>>>>>>
>>>>>>>>>>>> I created a draft PR for the Spark SQL, DataFrames and Datasets
>>>>>>>>>>>> Guide page as a first step:
>>>>>>>>>>>> https://github.com/apache/spark/pull/40087
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I would appreciate it if you could share your thoughts on this
>>>>>>>>>>>> proposal.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks a lot,
>>>>>>>>>>>> Allan Folting
>>>>>>>>>>>>
>>>>>>>>>>>
