On top of that, proper deep integration with pytest would be awesome.

On Sun, Dec 29, 2024 at 16:40 Martin Grund <mar...@databricks.com> wrote:
> Generally, function and column composition can be quite verbose. Maybe this
> is something to invest some brain power in. I too often see folks fall back to
> expr() or selectExpr().
>
> The other one I stumbled across was the idea of dynamic selectors like
> Polars has them.
>
> https://docs.pola.rs/api/python/stable/reference/selectors.html
>
> On Sun, Dec 29, 2024 at 15:12 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> On your point:
>>
>> "...I believe there are better ways to improve the pythonic surface of PySpark..."
>>
>> Can you please elaborate?
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Architect | Data Science | Financial Crime | GDPR & Compliance Specialist
>> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College London <https://en.wikipedia.org/wiki/Imperial_College_London>
>> London, United Kingdom
>>
>> view my LinkedIn profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> *Disclaimer:* The information provided is correct to the best of my
>> knowledge but of course cannot be guaranteed. It is essential to note
>> that, as with any advice: "one test result is worth one-thousand
>> expert opinions" (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>
>> On Sat, 28 Dec 2024 at 13:18, Martin Grund <mar...@databricks.com> wrote:
>>
>>> I'm not a fan of this approach. Spark configuration keys are defined as
>>> string values in Spark and used as strings everywhere.
>>>
>>> I don't necessarily see the benefit of changing
>>>
>>> conf["keyName"] vs conf.get("keyName") or even spark.conf.keyName
>>>
>>> Trying to wrap this into magic getattr calls is not ideal either. I
>>> believe there are better ways to improve the pythonic surface of PySpark.
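To make the dynamic-selectors idea mentioned above concrete, here is a minimal pure-Python sketch of what Polars-style selectors do: match columns by predicate over a schema instead of listing them by name. Every name here (`numeric`, `starts_with`, `select`, the sample schema) is hypothetical, illustrating the concept only, not an existing PySpark or Polars API.

```python
# Hypothetical sketch of Polars-style dynamic selectors: a selector is a
# predicate over (column name, dtype), and select() applies any number of
# them to a schema mapping. Illustrative only; not a real PySpark interface.

def numeric():
    """Selector matching integer/float columns."""
    return lambda name, dtype: dtype in {"int", "bigint", "float", "double"}

def starts_with(prefix):
    """Selector matching columns whose name starts with `prefix`."""
    return lambda name, dtype: name.startswith(prefix)

def select(schema, *selectors):
    """Return the column names in `schema` matched by any selector."""
    return [name for name, dtype in schema.items()
            if any(sel(name, dtype) for sel in selectors)]

schema = {"id": "bigint", "amount": "double", "label": "string", "amt_tax": "double"}
print(select(schema, numeric()))           # ['id', 'amount', 'amt_tax']
print(select(schema, starts_with("amt")))  # ['amt_tax']
```

The appeal is that a query written against "all numeric columns" keeps working when the schema gains or loses columns, instead of hard-coding names or falling back to string expressions.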
>>> What I do like is wrapping the return value of conf.get() in another
>>> wrapper object to access the doc string. That's very neat.
>>>
>>> On Fri, Dec 27, 2024 at 3:07 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> On the surface it looks like a good idea. In essence, it is about writing
>>>> code that is not just functional but also reflects the spirit and style of
>>>> the Python language <https://peps.python.org/pep-0020/>: code that is
>>>> readable and maintainable.
>>>>
>>>> The core objective (if I understand correctly) of this PR is to enhance the
>>>> Python user experience when working with Spark configurations by
>>>> introducing a more Pythonic, dictionary-like syntax. This approach will
>>>> improve code readability and maintainability by providing a more intuitive
>>>> and consistent way to set and access Spark configurations, aligning with
>>>> Python's emphasis on clarity and expressiveness (per the link above).
>>>>
>>>> HTH
>>>>
>>>> Mich Talebzadeh
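The doc-string wrapper idea mentioned above (a conf.get() return value that also carries the entry's documentation) could look roughly like this. The class name `ConfValue` and the sample doc text are hypothetical, not the actual PySpark implementation:

```python
# Hypothetical sketch: wrap a config value in a str subclass so the config
# entry's documentation rides along with it. Illustrative only.

class ConfValue(str):
    """A string config value that also exposes the config entry's doc."""

    def __new__(cls, value, doc=""):
        obj = super().__new__(cls, value)
        obj.__doc__ = doc  # per-instance doc string (subclass instances allow this)
        return obj

v = ConfValue("false", doc="When true, enables runtime group filtering.")
print(v)          # behaves like the plain string 'false' everywhere
print(v.__doc__)  # ...but the documentation is one attribute away
```

Because it subclasses str, existing code that compares or formats the returned value keeps working unchanged; only callers who want the documentation need to know about it.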
>>>> On Fri, 27 Dec 2024 at 07:23, Holden Karau <holden.ka...@gmail.com> wrote:
>>>>
>>>>> I think having automatic getattr/setattr on the spark.conf object seems
>>>>> reasonable to me.
>>>>>
>>>>> On Thu, Dec 26, 2024 at 9:32 PM Reynold Xin <r...@databricks.com.invalid> wrote:
>>>>>
>>>>>> I actually think this might be confusing (in general, adding too
>>>>>> many different ways to do the same thing is also un-Pythonic).
>>>>>>
>>>>>> On Thu, Dec 26, 2024 at 4:58 PM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I hope you are enjoying the holiday season. I just wanted to get some
>>>>>>> quick feedback about this PR:
>>>>>>> https://github.com/apache/spark/pull/49297
>>>>>>>
>>>>>>> This PR allows you to set/unset SQL configurations in a Pythonic way,
>>>>>>> e.g.,
>>>>>>>
>>>>>>> >>> spark.conf["spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled"] = "false"
>>>>>>> >>> spark.conf["spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled"]
>>>>>>> 'false'
>>>>>>>
>>>>>>> as pandas also supports a similar way
>>>>>>> (https://pandas.pydata.org/docs/user_guide/options.html).
>>>>>>>
>>>>>>> Any feedback on this approach would be appreciated.
>>>>>>>
>>>>>>> Thanks!
>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/ <https://www.fighthealthinsurance.com/?q=hk_email>
>>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>> Pronouns: she/her
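For reference, the dict-style behaviour the PR proposes can be sketched with a small mapping wrapper. This stand-in is hypothetical and backed by a plain dict; the real PR (apache/spark#49297) wires the same dunder methods to Spark's runtime configuration instead:

```python
# Hypothetical, self-contained stand-in for dict-style access on spark.conf.
# A plain dict plays the role of Spark's runtime config store here.

class RuntimeConf:
    def __init__(self):
        self._entries = {}

    def __setitem__(self, key, value):
        # conf[k] = v  -> set; Spark stores config values as strings
        self._entries[key] = str(value)

    def __getitem__(self, key):
        # conf[k]  -> get, raising KeyError for unknown keys
        try:
            return self._entries[key]
        except KeyError:
            raise KeyError(f"No such config: {key}") from None

    def __delitem__(self, key):
        # del conf[k]  -> unset (no error if the key was never set)
        self._entries.pop(key, None)

    def __contains__(self, key):
        return key in self._entries

conf = RuntimeConf()
conf["spark.sql.shuffle.partitions"] = 8
print(conf["spark.sql.shuffle.partitions"])    # prints 8 (stored as the string '8')
del conf["spark.sql.shuffle.partitions"]
print("spark.sql.shuffle.partitions" in conf)  # False
```

Reynold's objection above applies regardless of the implementation: each dunder here duplicates an operation `conf.get()`/`conf.set()`/`conf.unset()` already express, so the question is whether the brackets are worth a second way of doing the same thing.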