I believe explicit is better than implicit; that said, as you mention, the notation is very nice. Therefore I suggest using df.transform(myFunction) instead, as described in https://medium.com/@mrpowers/chaining-custom-dataframe-transformations-in-spark-a39e315f903c
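For completeness, a minimal sketch of what that looks like. The transformation names, column names, and bodies below are placeholders I made up for illustration, not a real library; the pattern is just curried functions of type DataFrame => DataFrame that compose with Dataset.transform:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.lit

    // Transformations as plain curried functions: parameters in the
    // first list, the DataFrame in the second, so a partially applied
    // call yields a DataFrame => DataFrame that transform() accepts.
    def withInjectedColumn(name: String, value: String)(df: DataFrame): DataFrame =
      df.withColumn(name, lit(value))          // placeholder body

    def advancedFilter(minCount: Long)(df: DataFrame): DataFrame =
      df.filter(df("count") >= minCount)       // placeholder body

    // Chaining reads almost as nicely as the implicit-class version:
    // val result = inputDataframe
    //   .transform(advancedFilter(10))
    //   .transform(withInjectedColumn("source", "batch"))

No implicits, no wrapper class, and each transformation is an ordinary function you can unit-test in isolation.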
Valery Khamenya <khame...@gmail.com> wrote on Mon., 18 June 2018 at 21:34:

> Dear Spark gurus,
>
> *Question*: what way would you recommend to shape a library of custom
> transformations for DataFrames/Datasets?
>
> *Details*: consider that we need several custom transformations over
> Dataset/DataFrame instances, for example injecting columns, applying
> specially tuned row filtering, lookup-table-based replacements, etc.
>
> I'd consider basically two options:
>
> 1) implicits! Create a class that looks as if derived from
> Dataset/DataFrame and implement the transformations as its methods,
>
> or
>
> 2) implement the transformations as stand-alone functions.
>
> The first approach leads to beautiful code like this:
>
> val result = inputDataframe
>   .myAdvancedFilter(params)
>   .myAdvancedReplacement(params)
>   .myColumnInjection(params)
>   .mySomethingElseTransformation(params)
>   .andTheFinalGoodies(params)
>
> nice!
>
> whereas the second option leads to this:
>
> val result = andTheFinalGoodies(
>   mySomethingElseTransformation(
>     myColumnInjection(
>       myAdvancedReplacement(
>         myAdvancedFilter(inputDataframe, params),
>         params),
>       params),
>     params),
>   params)
>
> terrible! ;)
>
> So, ideally it would be nice to learn how to implement Option 1. Luckily
> there are different approaches for this:
> https://stackoverflow.com/questions/32585670/what-is-the-best-way-to-define-custom-methods-on-a-dataframe
>
> However, in reality such transformations rely on
>
> import spark.implicits._
>
> and I have never seen a solution showing how to pass the SparkContext to
> such library classes and safely use it there. This article shows that it
> is not that straightforward a thing:
>
> https://docs.azuredatabricks.net/spark/latest/rdd-streaming/tips-for-running-streaming-apps-in-databricks.html
>
> That said, I still need the wisdom of the Spark community to get over this.
>
> P.S. A good Spark application "boilerplate" with a separately implemented
> library of DataFrame/Dataset transformations relying on "import
> spark.implicits._" is still wanted badly!
>
> best regards
> --
> Valery
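Regarding the spark.implicits._ question: as far as I know you do not need to pass a SparkContext or SparkSession into the library at all, because every Dataset already carries its SparkSession, whose implicits can be imported locally inside the function. A minimal sketch (parseIds is a hypothetical example function, not from any library):

    import org.apache.spark.sql.{Dataset, SparkSession}

    def parseIds(ds: Dataset[String]): Dataset[Long] = {
      // Bind the session to a stable identifier so its implicits
      // (here, the Encoder[Long]) can be imported in this scope.
      val spark: SparkSession = ds.sparkSession
      import spark.implicits._
      ds.map(_.trim.toLong)
    }

This keeps each library function self-contained: it gets everything it needs from the Dataset it is handed, so there is no session to wire through constructors.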