Hi Minglei,

Spark ML provides a transformer named "OneHotEncoder" that maps a column of
category indices to a column of binary vectors. It is similar to
pandas.get_dummies and sklearn's OneHotEncoder, but the output is a single
column of vector type rather than multiple columns.
You can refer to the official example:
<https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/OneHotEncoderExample.scala>
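
For illustration, here is a minimal sketch of that flow, assuming the Spark 1.x
API (SQLContext, with OneHotEncoder used as a plain transformer). The object
name, column names and sample rows are made up for the example. The category
strings first go through StringIndexer, because OneHotEncoder expects a column
of category indices. In Spark 3.0 and later OneHotEncoder is an estimator, so
there you would call encoder.fit(indexed).transform(indexed) instead.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.SQLContext

// Hypothetical object name; a sketch assuming Spark 1.x, not a definitive implementation.
object OneHotSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("OneHotSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Made-up sample data: an id and a string category.
    val df = sqlContext.createDataFrame(Seq(
      (0, "a"), (1, "b"), (2, "c"), (3, "a")
    )).toDF("id", "category")

    // Step 1: map each category string to a numeric index.
    val indexed = new StringIndexer()
      .setInputCol("category")
      .setOutputCol("categoryIndex")
      .fit(df)
      .transform(df)

    // Step 2: map each index to a sparse binary vector in a single column.
    val encoder = new OneHotEncoder()
      .setInputCol("categoryIndex")
      .setOutputCol("categoryVec")

    // Spark 1.x: transformer. In Spark 3.0+: encoder.fit(indexed).transform(indexed)
    val encoded = encoder.transform(indexed)

    encoded.select("id", "category", "categoryVec").show()
    sc.stop()
  }
}
```

By default the encoder drops the last category (dropLast = true), so three
categories give a vector of size 2. Call setDropLast(false) on the encoder if
you want one slot per category, as pandas.get_dummies produces.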

Yanbo

2015-12-17 16:00 GMT+08:00 zml张明磊 <mingleizh...@ctrip.com>:

> Hi,
>
>
>
>          I am new to Scala and Spark. Recently, I needed to write a tool
> that transforms categorical variables into dummy/indicator variables. Are
> there any tools in Scala and Spark that support this transformation, like
> *pandas.get_dummies* in Python? Are there any examples or learning
> materials I could study?
>
>
>
> Thanks,
>
> Minglei.
>
