Generally, function and column composition can be quite verbose. Maybe this
is something worth investing some brain power in. Too often I see folks fall
back to expr() or selectExpr().
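
To make that concrete, a contrived sketch (the DataFrame and its column
are made up for illustration):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1.0,), (-2.0,)], ["a"])

    # Composing Column expressions gets verbose quickly:
    df.select(F.when(F.col("a") > 0, F.log(F.col("a"))).alias("log_a"))

    # ...which is why folks reach for the SQL string escape hatch:
    df.selectExpr("CASE WHEN a > 0 THEN log(a) END AS log_a")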

The other idea I stumbled across was dynamic selectors like the ones
Polars has.

https://docs.pola.rs/api/python/stable/reference/selectors.html
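
A rough sketch of what those selectors look like (the frame here is made
up for illustration):

    import polars as pl
    import polars.selectors as cs

    df = pl.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "name": ["x", "y"]})

    df.select(cs.numeric())          # every numeric column
    df.select(cs.starts_with("na"))  # columns chosen by name prefix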


On Sun, Dec 29, 2024 at 15:12 Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> On your point
>
> ...I believe there are better ways to improve the pythonic surface of
> PySpark...
>
> Can you please elaborate?
>
> HTH
>
> Mich Talebzadeh,
>
> Architect | Data Science | Financial Crime | GDPR & Compliance Specialist
> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College
> London <https://en.wikipedia.org/wiki/Imperial_College_London>
> London, United Kingdom
>
>
>    view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice: "one test result is worth one thousand expert
> opinions" (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>
>
> On Sat, 28 Dec 2024 at 13:18, Martin Grund <mar...@databricks.com> wrote:
>
>> I'm not a fan of this approach. Spark configuration keys are defined as
>> string values in Spark and used as strings everywhere.
>>
>> I don't necessarily see the benefit of
>>
>> conf["keyName"] over conf.get("keyName"), or even spark.conf.keyName
>>
>> Trying to wrap this into magic getattr calls is not ideal either. I
>> believe there are better ways to improve the pythonic surface of PySpark.
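>>
>> To illustrate the kind of magic I mean, a hypothetical sketch (this is
>> not the PR's code) might look like:
>>
>>     class AttrConf:
>>         """Hypothetical: exposes config entries via attribute access."""
>>         def __init__(self, conf):
>>             self._conf = conf
>>         def __getattr__(self, name):
>>             # A dotted key like "spark.sql.shuffle.partitions" is not a
>>             # single attribute, so spelling it as
>>             # spark.conf.spark.sql.shuffle.partitions would need chained
>>             # wrapper objects.
>>             return self._conf.get(name)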
>>
>> What I do like is wrapping the return call of conf.get() with another
>> wrapper object to access the doc string. That's very neat.
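>>
>> Something like this, I imagine (a hypothetical sketch, not the PR's
>> actual implementation):
>>
>>     class ConfValue(str):
>>         """Hypothetical: acts as the plain string value but also
>>         carries the config entry's documentation."""
>>         def __new__(cls, value, doc=""):
>>             obj = super().__new__(cls, value)
>>             obj.doc = doc
>>             return obj
>>
>>     v = ConfValue("false", doc="Enables runtime group filtering.")
>>     assert v == "false"   # still behaves like a normal string
>>     print(v.doc)          # Enables runtime group filtering.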
>>
>> On Fri, Dec 27, 2024 at 3:07 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> On the surface it looks like a good idea. In essence, it is about
>>> writing code that is not just functional but also reflects the spirit
>>> and style of the Python language <https://peps.python.org/pep-0020/>;
>>> code that is readable and maintainable.
>>>
>>> The core objective (if I am correct) of this PR is to enhance the Python
>>> user experience when working with Spark configurations by introducing a
>>> more Pythonic, dictionary-like syntax. This approach will improve code
>>> readability and maintainability by providing a more intuitive and
>>> consistent way to set and access Spark configurations, aligning with
>>> Python's emphasis on clarity and expressiveness (per the link above).
>>>
>>> HTH
>>>
>>> Mich Talebzadeh,
>>>
>>> Architect | Data Science | Financial Crime | GDPR & Compliance Specialist
>>> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial
>>> College London <https://en.wikipedia.org/wiki/Imperial_College_London>
>>> London, United Kingdom
>>>
>>>
>>>    view my LinkedIn profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* The information provided is correct to the best of my
>>> knowledge but of course cannot be guaranteed. It is essential to note
>>> that, as with any advice: "one test result is worth one thousand expert
>>> opinions" (Wernher von Braun
>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>>
>>>
>>> On Fri, 27 Dec 2024 at 07:23, Holden Karau <holden.ka...@gmail.com>
>>> wrote:
>>>
>>>> I think having automatic getters/setters on the spark.conf object
>>>> seems reasonable to me.
>>>>
>>>> On Thu, Dec 26, 2024 at 9:32 PM Reynold Xin <r...@databricks.com.invalid>
>>>> wrote:
>>>>
>>>>> I actually think this might be confusing (just in general adding too
>>>>> many different ways to do the same thing is also un-Pythonic).
>>>>>
>>>>> On Thu, Dec 26, 2024 at 4:58 PM Hyukjin Kwon <gurwls...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I hope you guys are enjoying the holiday season. I just wanted to
>>>>>> have some quick feedback about this PR
>>>>>> https://github.com/apache/spark/pull/49297
>>>>>>
>>>>>> This PR allows you to set/unset SQL configurations in a Pythonic way,
>>>>>> e.g.,
>>>>>>
>>>>>> >>> spark.conf["spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled"] = "false"
>>>>>> >>> spark.conf["spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled"]
>>>>>> 'false'
>>>>>>
>>>>>> as pandas supports a similar style of option access (
>>>>>> https://pandas.pydata.org/docs/user_guide/options.html)
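>>>>>>
>>>>>> For reference, the pandas option API on that page looks like:
>>>>>>
>>>>>> >>> import pandas as pd
>>>>>> >>> pd.set_option("display.max_rows", 100)
>>>>>> >>> pd.options.display.max_rows   # attribute-style access
>>>>>> 100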
>>>>>>
>>>>>> Any feedback on this approach would be appreciated.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>> Pronouns: she/her
>>>>
>>>
