Re: [DISCUSS] Pythonic approach of setting Spark SQL configurations

Mich Talebzadeh Sun, 29 Dec 2024 06:12:53 -0800

On your point

...I believe there are better ways to improve the pythonic surface of
PySpark. ..


Can you please elaborate?

HTH

Mich Talebzadeh,

Architect | Data Science | Financial Crime | GDPR & Compliance Specialist
PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College
London <https://en.wikipedia.org/wiki/Imperial_College_London>
London, United Kingdom


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, "one test result is worth one-thousand
expert opinions" (Wernher von Braun
<https://en.wikipedia.org/wiki/Wernher_von_Braun>).


On Sat, 28 Dec 2024 at 13:18, Martin Grund <[email protected]> wrote:

> I'm not a fan of this approach. Spark configuration keys are defined as
> string values in Spark and used as Strings everywhere.
>
> I don't necessarily see the benefit of changing
>
> conf["keyName"] vs conf.get("keyName") or even spark.conf.keyName
>
> Trying to wrap this into magic getattr calls is not ideal either. I
> believe there are better ways to improve the pythonic surface of PySpark.
>
> What I do like is wrapping the return call of conf.get() with another
> wrapper object to access the doc string. That's very neat.
>
> On Fri, Dec 27, 2024 at 3:07 PM Mich Talebzadeh <[email protected]>
> wrote:
>
>> On the surface it looks like a good idea. In essence, it is about writing
>> code that is not just functional but also reflects the spirit and style of
>> the Python language <https://peps.python.org/pep-0020/>: code that is
>> readable and maintainable.
>>
>> The core objective (if I am correct) of this PR is to enhance the Python
>> user experience when working with Spark configurations by introducing a
>> more Pythonic, dictionary-like syntax. This approach will improve code
>> readability and maintainability by providing a more intuitive and
>> consistent way to set and access Spark configurations, aligning with
>> Python's emphasis on clarity and expressiveness (see the link above).
>>
>> HTH
>>
>> Mich Talebzadeh,
>>
>>
>> On Fri, 27 Dec 2024 at 07:23, Holden Karau <[email protected]>
>> wrote:
>>
>>> I think having automatic getattr/setattr on the spark.conf object seems
>>> reasonable to me.
>>>
>>> On Thu, Dec 26, 2024 at 9:32 PM Reynold Xin <[email protected]>
>>> wrote:
>>>
>>>> I actually think this might be confusing (just in general adding too
>>>> many different ways to do the same thing is also un-Pythonic).
>>>>
>>>> On Thu, Dec 26, 2024 at 4:58 PM Hyukjin Kwon <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I hope you guys are enjoying the holiday season. I just wanted to have
>>>>> some quick feedback about this PR
>>>>> https://github.com/apache/spark/pull/49297
>>>>>
>>>>> This PR allows you to set/unset SQL configurations in a Pythonic way,
>>>>> e.g.,
>>>>>
>>>>>  >>> spark.conf["spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled"] = "false"
>>>>>  >>> spark.conf["spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled"]
>>>>>  'false'
>>>>>
>>>>> as pandas also supports a similar way (
>>>>> https://pandas.pydata.org/docs/user_guide/options.html)
>>>>>
>>>>> Any feedback on this approach would be appreciated.
>>>>>
>>>>> Thanks!
>>>>>
>>>>
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>> <https://www.fighthealthinsurance.com/?q=hk_email>
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> Pronouns: she/her
>>>
>>
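
For anyone following the thread, the access styles being compared can be sketched in plain Python. This is only an illustration under stated assumptions: Conf and ConfValue are hypothetical stand-ins, not PySpark's RuntimeConfig and not the code in the PR, and the description string is a placeholder rather than real Spark documentation. It shows dictionary-style access layered on top of get/set, plus a str subclass that carries a doc string on the returned value.

```python
# Hypothetical stand-ins for illustration only; not PySpark classes.
class ConfValue(str):
    """A string value that also carries a description of its key."""

    def __new__(cls, value, doc=""):
        obj = super().__new__(cls, value)
        obj.doc = doc  # extra attribute; the object still behaves as a str
        return obj


class Conf:
    def __init__(self, docs=None):
        self._values = {}
        self._docs = dict(docs or {})

    # Method style: conf.set(key, value), conf.get(key), conf.unset(key)
    def set(self, key, value):
        self._values[key] = str(value)

    def get(self, key):
        # Raises KeyError if the key was never set.
        return ConfValue(self._values[key], self._docs.get(key, ""))

    def unset(self, key):
        self._values.pop(key, None)

    # Dictionary style: conf[key] = value, conf[key], del conf[key]
    def __setitem__(self, key, value):
        self.set(key, value)

    def __getitem__(self, key):
        return self.get(key)

    def __delitem__(self, key):
        self.unset(key)


KEY = "spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled"
conf = Conf(docs={KEY: "placeholder description, not the real Spark doc text"})

conf[KEY] = "false"    # dictionary-style set
value = conf.get(KEY)  # method-style get returns the same value
print(value)           # the configured value, usable as a plain string
print(value.doc)       # the description carried by the wrapper
```

Both styles go through the same set/get methods here, so the sketch also makes Reynold's concern concrete: the dictionary syntax adds a second spelling for the same operation rather than new behaviour.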
