Generally, function and column composition can be quite verbose. Maybe this is something worth investing some brain power in. I too often see folks fall back to expr() or selectExpr().
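For illustration, a minimal sketch of the kind of verbosity meant here (the SparkSession setup and the column names are made up for the example):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(10.0, 3), (4.5, 2)], ["price", "qty"])

# Composing built-in functions and Column expressions gets verbose quickly:
df.select(
    F.round(F.sum(F.col("price") * F.col("qty")), 2).alias("revenue")
).show()

# ...which is why people reach for the SQL-string escape hatch instead:
df.selectExpr("round(sum(price * qty), 2) AS revenue").show()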
The other one I stumbled across was the idea of dynamic selectors like Polars has them: https://docs.pola.rs/api/python/stable/reference/selectors.html

On Sun, Dec 29, 2024 at 15:12 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> On your point
>
> ...I believe there are better ways to improve the pythonic surface of PySpark...
>
> Can you please elaborate?
>
> HTH
>
> Mich Talebzadeh,
> Architect | Data Science | Financial Crime | GDPR & Compliance Specialist
> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy>, Imperial College London <https://en.wikipedia.org/wiki/Imperial_College_London>
> London, United Kingdom
>
> view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, "one test result is worth one-thousand expert opinions" (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>
> On Sat, 28 Dec 2024 at 13:18, Martin Grund <mar...@databricks.com> wrote:
>
>> I'm not a fan of this approach. Spark configuration keys are defined as string values in Spark and used as strings everywhere.
>>
>> I don't necessarily see the benefit of changing
>>
>> conf["keyName"] vs conf.get("keyName") or even spark.conf.keyName
>>
>> Trying to wrap this in magic getattr calls is not ideal either. I believe there are better ways to improve the Pythonic surface of PySpark.
>>
>> What I do like is wrapping the return value of conf.get() in another wrapper object to access the doc string. That's very neat.
>>
>> On Fri, Dec 27, 2024 at 3:07 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> On the surface it looks like a good idea. In essence, it is about writing code that is not just functional but also reflects the spirit and style of the Python language <https://peps.python.org/pep-0020/>; code that is readable and maintainable.
>>>
>>> The core objective (if I am correct) of this PR is to enhance the Python user experience when working with Spark configurations by introducing a more Pythonic, dictionary-like syntax. This approach will improve code readability and maintainability by providing a more intuitive and consistent way to set and access Spark configurations, aligning with Python's emphasis on clarity and expressiveness (per the link above).
>>>
>>> HTH
>>>
>>> Mich Talebzadeh,
>>> Architect | Data Science | Financial Crime | GDPR & Compliance Specialist
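Martin's point above about wrapping the return value of conf.get() could look roughly like this; a minimal sketch in which the ConfValue name, the .doc attribute, and the doc text are all made up for illustration, not the PR's actual API:

# Hypothetical sketch of a value wrapper that carries the config docs.
class ConfValue(str):
    """A str that behaves like the plain config value but carries its docs."""

    def __new__(cls, value: str, doc: str = ""):
        obj = super().__new__(cls, value)
        obj.doc = doc  # made-up attribute name for the documentation
        return obj

v = ConfValue("false", doc="(illustrative doc text for the config entry)")
print(v == "false")  # True: usable anywhere a plain string is expected
print(v.doc)         # the documentation travels with the value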
>>> On Fri, 27 Dec 2024 at 07:23, Holden Karau <holden.ka...@gmail.com> wrote:
>>>
>>>> I think having automatic getter/setter support on the spark.conf object seems reasonable to me.
>>>>
>>>> On Thu, Dec 26, 2024 at 9:32 PM Reynold Xin <r...@databricks.com.invalid> wrote:
>>>>
>>>>> I actually think this might be confusing (just in general, adding too many different ways to do the same thing is also un-Pythonic).
>>>>>
>>>>> On Thu, Dec 26, 2024 at 4:58 PM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I hope you guys are enjoying the holiday season. I just wanted to get some quick feedback about this PR: https://github.com/apache/spark/pull/49297
>>>>>>
>>>>>> This PR allows you to set/unset SQL configurations in a Pythonic way, e.g.,
>>>>>>
>>>>>> >>> spark.conf["spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled"] = "false"
>>>>>> >>> spark.conf["spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled"]
>>>>>> 'false'
>>>>>>
>>>>>> as pandas also supports a similar way (https://pandas.pydata.org/docs/user_guide/options.html).
>>>>>>
>>>>>> Any feedback on this approach would be appreciated.
>>>>>>
>>>>>> Thanks!
>>>>
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>> Pronouns: she/her
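For context on the proposal being discussed: the dict-style syntax maps naturally onto the existing RuntimeConfig methods (get, set, unset). A minimal sketch of the idea, with a hypothetical wrapper class rather than the PR's actual implementation:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical sketch: dict-style access layered over spark.conf.
class DictConf:
    def __init__(self, conf):
        self._conf = conf  # the underlying spark.conf (RuntimeConfig)

    def __getitem__(self, key: str) -> str:
        return self._conf.get(key)

    def __setitem__(self, key: str, value: str) -> None:
        self._conf.set(key, value)

    def __delitem__(self, key: str) -> None:
        self._conf.unset(key)  # del conf[key] maps naturally onto unset

conf = DictConf(spark.conf)
conf["spark.sql.shuffle.partitions"] = "64"
print(conf["spark.sql.shuffle.partitions"])  # '64'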