Hi All,

There are several categorical columns in my dataset as follows:

[image: Inline images 1]
How can I transform the values in each categorical column into numeric values using StringIndexer, so that the resulting DataFrame can be fed into VectorAssembler to generate a feature vector?

A naive approach would be to apply a separate StringIndexer to each categorical column by hand. But that sounds laughable, I know.

A possible workaround in PySpark <https://stackoverflow.com/questions/36942233/apply-stringindexer-to-several-columns-in-a-pyspark-dataframe> is to collect several StringIndexers in a list and use a Pipeline to execute them all, as follows:

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StringIndexer

    indexers = [StringIndexer(inputCol=column, outputCol=column+"_index").fit(df)
                for column in list(set(df.columns)-set(['date']))]

    pipeline = Pipeline(stages=indexers)
    df_r = pipeline.fit(df).transform(df)
    df_r.show()

How can I do the same in Scala? I tried the following:

    val featureCol = trainingDF.columns
    var indexers: Array[StringIndexer] = null

    for (colName <- featureCol) {
      val index = new StringIndexer()
        .setInputCol(colName)
        .setOutputCol(colName + "_indexed")
        //.fit(trainDF)
      indexers = indexers :+ index
    }

    val pipeline = new Pipeline()
      .setStages(indexers)
    val newDF = pipeline.fit(trainingDF).transform(trainingDF)
    newDF.show()

However, I get a NullPointerException at

    for (colName <- featureCol)

I am sure I am doing something wrong. Any suggestions?

Regards,
_________________________________
*Md. Rezaul Karim*, BSc, MSc
Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business Park, Dangan, Galway, Ireland
Web: http://www.reza-analytics.eu/index.html <http://139.59.184.114/index.html>
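
A note on the Scala attempt above: the NullPointerException most likely comes from `indexers` being initialised to `null`, so the first `indexers :+ index` inside the loop is invoked on a null array. A minimal sketch of one way around this is to build the stages with `map` over the column names instead of appending to a `var`. The object name `IndexAllColumns`, the helper `indexAll`, and the toy data are hypothetical and only there to make the example self-contained. It assumes Spark 2.x or later with spark-sql and spark-mllib on the classpath.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.{DataFrame, SparkSession}

object IndexAllColumns {

  // Hypothetical helper: adds a "<name>_indexed" column for every column of df.
  def indexAll(df: DataFrame): DataFrame = {
    // Build one StringIndexer per column with map, so no null-initialised var is needed.
    val indexers: Array[PipelineStage] = df.columns.map { colName =>
      val stage: PipelineStage = new StringIndexer()
        .setInputCol(colName)
        .setOutputCol(colName + "_indexed")
      stage
    }
    // Pipeline.fit fits each indexer in turn; transform appends the indexed columns.
    new Pipeline().setStages(indexers).fit(df).transform(df)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("IndexAllColumns")
      .getOrCreate()
    import spark.implicits._

    // Toy data standing in for trainingDF (an assumption, not the original dataset).
    val trainingDF = Seq(("a", "x"), ("b", "y"), ("a", "y")).toDF("col1", "col2")
    indexAll(trainingDF).show()
    spark.stop()
  }
}
```

To skip a column, as the PySpark version does with 'date', the column list could be filtered first, for example `df.columns.filter(_ != "date")`. Starting from `Array.empty[StringIndexer]` instead of `null` would also avoid the exception while keeping the original loop.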