Re: Finding a Spark Equivalent for Pandas' get_dummies

2016-11-15 Thread neil90
You can build a list of all the columns and pass it to a recursive function that fits and applies the transformation for each one. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Finding-a-Spark-Equivalent-for-Pandas-get-dummies-tp28064p28079.html Sent from the Apac

Re: Finding a Spark Equivalent for Pandas' get_dummies

2016-11-11 Thread Nicholas Sharkey
I did get *some* help from Databricks in terms of programmatically grabbing the categorical variables, but I can't figure out where to go from here:
# Get all string cols/categorical cols
stringColList = [i[0] for i in df.dtypes if i[1] == 'string']
# generate OHEs for every col in stringColL

Re: Finding a Spark Equivalent for Pandas' get_dummies

2016-11-11 Thread Nick Pentreath
For now OHE supports a single column, so you have to have 1000 OHEs in a pipeline. However, you can add them programmatically, so it is not too bad. If the cardinality of each feature is quite low, it should be workable. After that, use VectorAssembler to stitch the vectors together (which accepts mul
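
A rough sketch of the approach described above, in PySpark: one StringIndexer plus one OneHotEncoder per categorical column, added to a Pipeline in a loop, then a VectorAssembler to combine the encoded vectors. The helper names (`stage_column_names`, `build_dummies_pipeline`), the `_idx`/`_vec` suffixes and the `features` output column are made up for illustration and do not come from the thread. Putting a StringIndexer before each encoder is also an assumption, on the basis that the columns are strings and the encoder expects numeric category indices.

```python
def stage_column_names(categorical_cols):
    """Return (input, index, vector) column-name triples, one per column.

    The "_idx" and "_vec" suffixes are hypothetical naming choices.
    """
    return [(c, c + "_idx", c + "_vec") for c in categorical_cols]


def build_dummies_pipeline(categorical_cols, output_col="features"):
    """Build a Pipeline with one indexer and one encoder per column."""
    # Imported here so the naming helper above stays usable without Spark.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler

    names = stage_column_names(categorical_cols)
    stages = []
    for src, idx, vec in names:
        # Map each string category to a numeric index.
        stages.append(StringIndexer(inputCol=src, outputCol=idx))
        # dropLast=False keeps every category, closer to get_dummies' default.
        stages.append(OneHotEncoder(inputCol=idx, outputCol=vec, dropLast=False))

    # Stitch the per-column vectors into a single feature vector.
    stages.append(
        VectorAssembler(inputCols=[vec for _, _, vec in names], outputCol=output_col)
    )
    return Pipeline(stages=stages)


# Possible usage, assuming `df` is an existing Spark DataFrame:
# stringColList = [name for name, dtype in df.dtypes if dtype == "string"]
# encoded = build_dummies_pipeline(stringColList).fit(df).transform(df)
```

Unlike get_dummies, this produces one sparse vector column instead of one 0/1 column per category, so it is only an approximate equivalent.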