Re: Embedding your own transformer in Spark.ml Pipleline

Ram Sriharsha Tue, 02 Jun 2015 10:44:37 -0700

Hi,

We are in the process of adding examples for feature transformations
(https://issues.apache.org/jira/browse/SPARK-7546), and these should be
available shortly on Spark master.
In the meantime, the best place to start is to look at how the
Tokenizer works:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala


You need to implement the Transformer interface, as Tokenizer does. In this
case, extend UnaryTransformer, since the feature transformer acts on one
column, transforms it, and outputs another column.
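
As a rough sketch of what that could look like, modelled on the Tokenizer
source linked above: the class name UpperCaser and its behaviour are
hypothetical, and the code assumes the Spark 1.4-style API (a uid
constructor argument and a no-argument createTransformFunc). Older
releases have a different signature, so check it against the
UnaryTransformer source for the version you are running.

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

// Hypothetical transformer (not part of Spark): upper-cases a string column.
class UpperCaser(override val uid: String)
  extends UnaryTransformer[String, String, UpperCaser] {

  // Convenience constructor that generates a unique id, as Tokenizer does.
  def this() = this(Identifiable.randomUID("upperCaser"))

  // The function applied to every value of the input column.
  override protected def createTransformFunc: String => String = _.toUpperCase

  // Reject input columns that are not strings.
  override protected def validateInputType(inputType: DataType): Unit = {
    require(inputType == StringType,
      s"Input type must be StringType but got $inputType.")
  }

  // The data type of the column this transformer appends.
  override protected def outputDataType: DataType = StringType
}
```

It would then be configured like any built-in transformer, for example
new UpperCaser().setInputCol("text").setOutputCol("upper"), where the
column names are placeholders.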

Here is an example of how to build a pipeline that includes a feature
transformer (HashingTF is the feature transformer analogous to what you
would build):
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/SimpleTextClassificationPipeline.scala
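
In outline, assembling such a pipeline might look like the sketch below.
The helper name buildPipeline, the column names, and the parameter values
are illustrative assumptions; see the linked example for the full program,
including how the training data is created.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// Hypothetical helper: chains two feature transformers and an estimator.
def buildPipeline(): Pipeline = {
  // Splits the "text" column into a "words" column.
  val tokenizer = new Tokenizer()
    .setInputCol("text")
    .setOutputCol("words")

  // Feature transformer: maps the words to a fixed-size feature vector.
  // A custom transformer would be placed in the stage list the same way.
  val hashingTF = new HashingTF()
    .setNumFeatures(1000)
    .setInputCol(tokenizer.getOutputCol)
    .setOutputCol("features")

  // Estimator trained on the "features" column produced above.
  val lr = new LogisticRegression()
    .setMaxIter(10)
    .setRegParam(0.01)

  new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
}
```

Calling fit on the returned pipeline with a DataFrame that has "text" and
"label" columns would run the stages in order and return a fitted model.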

Stay tuned, though: we should have examples in Python, Scala, and Java soon.

Ram

On Tue, Jun 2, 2015 at 10:19 AM, dimple <[email protected]> wrote:

> Hi,
> I would like to embed my own transformer in the Spark.ml Pipeline but do
> not see an example of it. Can someone share an example of which
> classes/interfaces I need to extend/implement in order to do so? Thanks.
>
> Dimple
