Dear Spark gurus, *Question*: which approach would you recommend for structuring a library of custom transformations for DataFrames/Datasets?
*Details*: Suppose we need several custom transformations over Dataset/DataFrame instances, for example injecting columns, applying specially tuned row filtering, lookup-table based replacements, etc. I see basically two options:

1) implicits: create a class that effectively extends Dataset/DataFrame and implement the transformations as its methods, or
2) implement the transformations as stand-alone functions.

The first approach leads to beautiful code like this:

    val result = inputDataframe
      .myAdvancedFilter(params)
      .myAdvancedReplacement(params)
      .myColumnInjection(params)
      .mySomethingElseTransformation(params)
      .andTheFinalGoodies(params)

Nice! Whereas the second option leads to this:

    val result = andTheFinalGoodies(
      mySomethingElseTransformation(
        myColumnInjection(
          myAdvancedReplacement(
            myAdvancedFilter(inputDataframe, params),
            params),
          params),
        params),
      params)

Terrible! ;)

So, ideally, I would like to learn how to implement option 1. Luckily there are different approaches for this:

https://stackoverflow.com/questions/32585670/what-is-the-best-way-to-define-custom-methods-on-a-dataframe

However, in practice such transformations rely on "import spark.implicits._", and I have never seen a solution showing how to pass the SparkSession to such library classes and use it there safely. This article shows that it is not such a straightforward thing:

https://docs.azuredatabricks.net/spark/latest/rdd-streaming/tips-for-running-streaming-apps-in-databricks.html

That said, I still need the wisdom of the Spark community to get past this.

P.S. A good Spark application "boilerplate" with a separately implemented library of DataFrame/Dataset transformations relying on "import spark.implicits._" is still badly wanted!

best regards
-- Valery
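P.P.S. To make the question concrete, here is a minimal sketch of what I mean by option 1, based on the implicit-class idea from the Stack Overflow link above. The method names and the columns ("amount", "code") are hypothetical placeholders, and the SparkSession is taken as an implicit parameter, which is one way I can imagine passing it into a library class; whether that is actually a safe and recommended pattern is exactly what I am asking about.

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, lit}

    // Hypothetical transformation library; names and parameters are placeholders.
    object MyTransformations {

      implicit class RichDataFrame(df: DataFrame) {

        // Keep only rows where the given numeric column exceeds a threshold.
        def myAdvancedFilter(column: String, threshold: Int): DataFrame =
          df.filter(col(column) > threshold)

        // Inject a constant column.
        def myColumnInjection(name: String, value: String): DataFrame =
          df.withColumn(name, lit(value))

        // A transformation that actually needs the SparkSession, e.g. to turn
        // a lookup map into a DataFrame; it is taken as an implicit parameter
        // so that call sites with a SparkSession in scope stay clean.
        def myAdvancedReplacement(lookup: Map[String, String])
                                 (implicit spark: SparkSession): DataFrame = {
          import spark.implicits._
          val lookupDf = lookup.toSeq.toDF("key", "replacement")
          // Assumes the input DataFrame has a "code" column to be looked up.
          df.join(lookupDf, df("code") === lookupDf("key"), "left").drop("key")
        }
      }
    }

    // Call site, given an implicit SparkSession and an input DataFrame
    // with the assumed columns "amount" and "code":
    //
    //   import MyTransformations._
    //   val result = inputDataframe
    //     .myAdvancedFilter("amount", 100)
    //     .myAdvancedReplacement(Map("A" -> "Alpha"))
    //     .myColumnInjection("source", "batch")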