Hi Minglei,

Spark ML provides a transformer named "OneHotEncoder" that maps a column of category indices to a column of binary vectors. It is similar to pandas.get_dummies and scikit-learn's OneHotEncoder, but the output is a single column of vector type rather than multiple columns. You can refer to the official example <https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/OneHotEncoderExample.scala>.
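For a quick idea of how this looks in practice, here is a minimal sketch along the lines of that example. It assumes the Spark 1.x API (SQLContext, and OneHotEncoder used as a plain transformer). The column names, the sample data, and the object name OneHotEncoderSketch are made up for illustration. Because OneHotEncoder expects category indices, a string column is indexed with StringIndexer first.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.SQLContext

// Hypothetical object name; assumes Spark 1.x with spark-mllib on the classpath.
object OneHotEncoderSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("OneHotEncoderSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Illustrative data: an id and a string-valued category.
    val df = sqlContext.createDataFrame(Seq(
      (0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")
    )).toDF("id", "category")

    // Step 1: map each distinct string to a numeric category index.
    val indexer = new StringIndexer()
      .setInputCol("category")
      .setOutputCol("categoryIndex")
      .fit(df)
    val indexed = indexer.transform(df)

    // Step 2: map each category index to a sparse binary vector.
    val encoder = new OneHotEncoder()
      .setInputCol("categoryIndex")
      .setOutputCol("categoryVec")
    val encoded = encoder.transform(indexed)

    // The result is one vector column, not one column per category.
    encoded.select("id", "category", "categoryVec").show()

    sc.stop()
  }
}
```

Note that by default the encoder drops the last category (dropLast is true), so three categories produce vectors of size two; call setDropLast(false) if you want one slot per category, as pandas.get_dummies gives you. In Spark 3.0 and later, OneHotEncoder is an estimator, so there you would call fit on the indexed data before transform.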
Yanbo

2015-12-17 16:00 GMT+08:00 zml张明磊 <mingleizh...@ctrip.com>:

> Hi,
>
> I am new to Scala and Spark. Recently, I have needed to write a tool
> that transforms categorical variables into dummy/indicator variables. I
> would like to know whether there are tools in Scala and Spark that support
> this transformation, like *pandas.get_dummies* in Python. Are there any
> examples or learning materials I could use?
>
> Thanks,
> Minglei.