Hi,

We are in the process of adding examples for feature transformations (
https://issues.apache.org/jira/browse/SPARK-7546), and these should be
available shortly on Spark master.
In the meantime, the best place to start is to look at how the
Tokenizer works:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala

You need to implement the Transformer interface, as the Tokenizer does. In
this case, extend UnaryTransformer, since the feature transformer acts on
one column, transforms it and outputs another column.
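
As a rough illustration of the UnaryTransformer pattern, here is a minimal
sketch modelled on the Tokenizer. The class name LowerCaser and the
"lowerCaser" uid prefix are hypothetical, and the code assumes a Spark
release where Identifiable.randomUID and defaultCopy are available (roughly
1.5 or later); the exact members to override may differ in other releases.

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

// Hypothetical transformer: lower-cases one string column into another.
class LowerCaser(override val uid: String)
  extends UnaryTransformer[String, String, LowerCaser] {

  // Convenience constructor that generates a unique id for this stage.
  def this() = this(Identifiable.randomUID("lowerCaser"))

  // The function applied to every value of the input column.
  override protected def createTransformFunc: String => String =
    _.toLowerCase

  // Reject input columns that are not strings.
  override protected def validateInputType(inputType: DataType): Unit =
    require(inputType == StringType,
      s"Input type must be StringType but got $inputType.")

  // The type of the column this transformer appends.
  override protected def outputDataType: DataType = StringType

  // Assumption: needed on releases where copy is abstract.
  override def copy(extra: ParamMap): LowerCaser = defaultCopy(extra)
}
```

It would be configured like the built-in transformers, for example
new LowerCaser().setInputCol("text").setOutputCol("lower").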

Here is an example of how to build a pipeline that includes a feature
transformer (HashingTF is the feature transformer analogous to what you
would build):
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/SimpleTextClassificationPipeline.scala
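
For orientation, a condensed sketch of that kind of pipeline follows. It is
not a copy of the linked example: the object name PipelineSketch, the toy
training rows and the parameter values are assumptions, and it uses the
SparkSession entry point from Spark 2.x and later rather than the SQLContext
of earlier releases. A custom transformer would be added to the stages array
the same way the Tokenizer and HashingTF are.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

object PipelineSketch {

  // Assemble the stages: text -> words -> hashed features -> classifier.
  def buildPipeline(): Pipeline = {
    val tokenizer = new Tokenizer()
      .setInputCol("text")
      .setOutputCol("words")
    val hashingTF = new HashingTF()
      .setNumFeatures(1000) // illustrative value
      .setInputCol(tokenizer.getOutputCol)
      .setOutputCol("features")
    val lr = new LogisticRegression()
      .setMaxIter(10)       // illustrative value
      .setRegParam(0.001)   // illustrative value
    new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PipelineSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy labelled documents, made up for this sketch.
    val training = Seq(
      (0L, "a b c d e spark", 1.0),
      (1L, "b d", 0.0),
      (2L, "spark f g h", 1.0),
      (3L, "hadoop mapreduce", 0.0)
    ).toDF("id", "text", "label")

    // Fit all stages in order, then apply the fitted pipeline.
    val model = buildPipeline().fit(training)
    model.transform(training).select("id", "text", "prediction").show()

    spark.stop()
  }
}
```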

Stay tuned, though: we should have examples in Python, Scala and Java soon.

Ram

On Tue, Jun 2, 2015 at 10:19 AM, dimple <dimp201...@gmail.com> wrote:

> Hi,
> I would like to embed my own transformer in the Spark.ml Pipeline but do
> not see an example of it. Can someone share an example of which
> classes/interfaces I need to extend/implement in order to do so? Thanks.
>
> Dimple
