You can always define an arbitrary RDD-to-RDD function and use it from both
Spark and Spark Streaming. For example,
def myTransformation(rdd: RDD[X]): RDD[Y] = { .... }
In Spark you can obviously apply it to an RDD. In Spark Streaming, you can
apply it to the RDDs of a DStream with
myDStream.transform(rdd => myTransformation(rdd))
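To make that concrete, here is a minimal sketch using your encrypt/decrypt example. The names Transforms, encode, decode, encodeAll and encodeStream are made up for illustration, and Base64 is only a stand-in for a real cipher (it is an encoding, not encryption).

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

object Transforms {
  // Placeholder for a real cipher: Base64 is reversible but NOT encryption.
  def encode(s: String): String =
    Base64.getEncoder.encodeToString(s.getBytes(UTF_8))

  def decode(s: String): String =
    new String(Base64.getDecoder.decode(s), UTF_8)

  // The reusable RDD-to-RDD function; call it directly from plain Spark.
  def encodeAll(rdd: RDD[String]): RDD[String] = rdd.map(encode)

  // The same function applied to every batch RDD of a DStream.
  def encodeStream(stream: DStream[String]): DStream[String] =
    stream.transform(rdd => encodeAll(rdd))
}
```

The point is that the logic lives in one place (encodeAll), and the streaming version is just a thin wrapper around it.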
I am not sure what you mean by reusing that transformation through Spark SQL.
Do you mean from a SQL query? In Spark SQL you can register a function
that operates on each record (so a map-like function only), but not an
arbitrary transformation on tables. But then it's easy to mix
Spark and Spark SQL together, as you can call sqlContext.sql("sql query"),
get back the result RDD, and then apply myTransformation to that RDD.
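A rough sketch of both options follows. It assumes a Spark version where SQLContext exposes udf.register and query results expose .rdd (older releases name these differently), plus a hypothetical registered table people with a non-null string column name. SqlReuse, mask and run are illustrative names, and Base64 again stands in for a real cipher.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}

object SqlReuse {
  // Hypothetical record-level function; Base64 is NOT encryption.
  def mask(s: String): String =
    Base64.getEncoder.encodeToString(s.getBytes(UTF_8))

  // Hypothetical RDD-to-RDD transformation over query results.
  // Assumes column 0 is a non-null string.
  def myTransformation(rdd: RDD[Row]): RDD[String] =
    rdd.map(row => mask(row.getString(0)))

  def run(sqlContext: SQLContext): RDD[String] = {
    // Option 1: per-record use inside a query, via a registered function.
    sqlContext.udf.register("mask", (s: String) => mask(s))
    val maskedInSql = sqlContext.sql("SELECT mask(name) FROM people")
    maskedInSql.show()

    // Option 2: run the query, take the result RDD, apply the transformation.
    val rows: RDD[Row] = sqlContext.sql("SELECT name FROM people").rdd
    myTransformation(rows)
  }
}
```

Option 1 is limited to map-like logic; option 2 lets you run any RDD-level transformation on the query output.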
Hope this clarifies things.
TD
On Fri, Aug 8, 2014 at 11:10 AM, Jeevak Kasarkod <[email protected]> wrote:
> Is it possible to create custom transformations in Spark? For example data
> security transforms such as encrypt and decrypt. Ideally it's something one
> would like to reuse across Spark Streaming, Spark SQL and Spark.
>
>