Hi,

We are in the process of adding examples for feature transformations (https://issues.apache.org/jira/browse/SPARK-7546), and they should be available shortly on Spark master. In the meantime, the best place to start is to look at how the Tokenizer works:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
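
To make the shape of such a class concrete, here is a minimal sketch of a custom transformer modeled on Tokenizer. The class name UpperCaser and its helper toUpper are hypothetical, made up for illustration only. The sketch assumes a Spark version in which UnaryTransformer takes a uid constructor argument (as Tokenizer does); the exact members to override may differ between Spark versions, so check it against the Tokenizer source for the version you use.

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

// Hypothetical example: upper-cases the values of one string column
// and writes the result to another column.
class UpperCaser(override val uid: String)
  extends UnaryTransformer[String, String, UpperCaser] {

  // No-arg constructor that generates a unique id, as Tokenizer does.
  def this() = this(Identifiable.randomUID("upperCaser"))

  // The per-row function applied to the input column.
  // Note: null inputs are not handled in this sketch.
  override protected def createTransformFunc: String => String =
    (s: String) => UpperCaser.toUpper(s)

  // Reject input columns that are not strings.
  override protected def validateInputType(inputType: DataType): Unit =
    require(inputType == StringType,
      s"Input type must be StringType but got $inputType.")

  // Type of the output column.
  override protected def outputDataType: DataType = StringType
}

object UpperCaser {
  // Pure helper kept separate so it can be checked without a Spark context.
  def toUpper(s: String): String = s.toUpperCase
}
```

You would then configure it with new UpperCaser().setInputCol("text").setOutputCol("upper") (the column names are placeholders) and pass it to Pipeline.setStages alongside your other stages.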

You need to implement the Transformer interface, as Tokenizer does. In this case that means a UnaryTransformer, since the feature transformer acts on one column, transforms it, and outputs another column.

Here is an example of how to build a pipeline that includes a feature transformer (HashingTF is the feature transformer analogous to what you would build):
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/SimpleTextClassificationPipeline.scala

But stay tuned, we should have examples in Python, Scala and Java soon.

Ram

On Tue, Jun 2, 2015 at 10:19 AM, dimple <dimp201...@gmail.com> wrote:
> Hi,
> I would like to embed my own transformer in the Spark.ml Pipeline but do
> not see an example of it. Can someone share an example of which
> classes/interfaces I need to extend/implement in order to do so? Thanks.
>
> Dimple