You can have a list of all the columns and pass it to a recursive function
that fits and applies the transformation to each column in turn.
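Here is a minimal plain-Python sketch of that recursive idea. It assumes each row is a dict and that "fit" means learning a category-to-index mapping for one column, roughly what Spark's StringIndexer does. The `index_columns` name, the `_idx` suffix, and the sorted ordering are hypothetical choices for illustration. In Spark, the per-column fit/transform would be done by ML stages on a DataFrame instead.

```python
from typing import Dict, List

Row = Dict[str, object]


def index_columns(rows: List[Row], columns: List[str]) -> List[Row]:
    """Recursively fit and apply a category-to-index mapping, one column per call."""
    if not columns:
        # Base case: every column has been transformed.
        return rows
    col, rest = columns[0], columns[1:]
    # "Fit": assign indices to the distinct values in sorted order
    # (a simplification; Spark's StringIndexer orders by frequency by default).
    mapping = {value: i for i, value in enumerate(sorted({str(r[col]) for r in rows}))}
    # "Transform": add an indexed copy of the column to every row.
    transformed = [{**r, col + "_idx": mapping[str(r[col])]} for r in rows]
    # Recurse on the remaining columns.
    return index_columns(transformed, rest)
```

With around 1000 columns, this recursion would run into Python's default recursion limit of 1000. A plain `for` loop over the column list does the same work without that risk.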
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Finding-a-Spark-Equivalent-for-Pandas-get-dummies-tp28064p28079.html
Sent from the Apac
I did get *some* help from Databricks with programmatically grabbing the
categorical variables, but I can't figure out where to go from here:
# Get all string cols/categorical cols
stringColList = [i[0] for i in df.dtypes if i[1] == 'string']
# generate OHEs for every col in stringColL
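The list comprehension above relies on `df.dtypes` returning `(column name, type string)` pairs. Below is a minimal sketch of the same selection run against a hard-coded stand-in for `df.dtypes`, with hypothetical column names. It also derives one output column name per categorical column, which a loop that generates the encoders would need. The `_vec` suffix is an arbitrary, hypothetical choice.

```python
from typing import List, Tuple

# Hypothetical stand-in for df.dtypes: (column name, Spark type string) pairs.
dtypes: List[Tuple[str, str]] = [
    ("age", "int"),
    ("color", "string"),
    ("income", "double"),
    ("city", "string"),
]

# Same selection as in the message: keep only the string (categorical) columns.
stringColList = [name for name, dtype in dtypes if dtype == "string"]

# One output column name per categorical column (hypothetical naming scheme).
outputCols = [c + "_vec" for c in stringColList]
```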
For now, OHE (OneHotEncoder) supports only a single column, so you would
need 1000 OHE stages in the pipeline. However, you can add them
programmatically, so it is not too bad. If the cardinality of each feature is
quite low, it should be workable.
After that, use VectorAssembler to stitch the vectors together (which
accepts mul
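The following plain-Python sketch makes the "one encoder per column, then stitch" idea concrete without a Spark cluster. Each column is one-hot encoded separately, and the per-column vectors are then concatenated row by row in column order. That concatenation is the role VectorAssembler plays in a pipeline. The function names and data are hypothetical. Spark's OneHotEncoder also differs from this sketch in two ways: it drops the last category by default (`dropLast=True`) and it outputs sparse vectors.

```python
from typing import Dict, List


def one_hot(values: List[str]) -> List[List[int]]:
    """One-hot encode a single column; categories are ordered alphabetically."""
    categories = sorted(set(values))
    position = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[position[v]] = 1
        vectors.append(vec)
    return vectors


def encode_and_assemble(table: Dict[str, List[str]], columns: List[str]) -> List[List[int]]:
    """Encode each column separately, then concatenate the vectors row by row.

    Assumes at least one column and that all columns have the same length.
    """
    per_column = [one_hot(table[c]) for c in columns]
    n_rows = len(table[columns[0]])
    # Concatenate the i-th vector of every encoded column, in column order.
    return [sum((enc[i] for enc in per_column), []) for i in range(n_rows)]
```

Each assembled vector's length is the sum of the per-column cardinalities. This is why low cardinality keeps a pipeline with many encoders workable.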