Re: Spark ML : One hot Encoding for multiple columns

janardhan shetty Sun, 13 Nov 2016 16:55:33 -0800

These JIRAs are still unresolved:
https://issues.apache.org/jira/browse/SPARK-11215
https://issues.apache.org/jira/browse/SPARK-8418
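
Until those JIRAs are resolved, the Pipeline workaround mentioned further down the thread (one StringIndexer stage and one OneHotEncoder stage per column, all run by a single Pipeline) can be sketched as follows. This is a minimal sketch assuming Spark 2.x and a local SparkSession. The toy data, the "Index"/"Vec" output-column suffixes, and the helper name `oneHotStages` are hypothetical illustrations, not Michal's original code.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.SparkSession

object MultiColumnOneHot {
  // Hypothetical helper: builds one StringIndexer and one OneHotEncoder stage per input column.
  // Output columns are named "<col>Index" and "<col>Vec" (a naming convention assumed here).
  def oneHotStages(cols: Seq[String]): Array[PipelineStage] =
    cols.flatMap { c =>
      Seq[PipelineStage](
        new StringIndexer().setInputCol(c).setOutputCol(s"${c}Index"),
        new OneHotEncoder().setInputCol(s"${c}Index").setOutputCol(s"${c}Vec")
      )
    }.toArray

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("multi-column-ohe").getOrCreate()
    import spark.implicits._

    // Toy data using the column names from the thread; the values are made up.
    val df1 = Seq(("a", "x"), ("b", "y"), ("a", "z")).toDF("category", "newone")
    val featureCols = Seq("category", "newone")

    // Pipeline.fit runs the stages in order, fitting each estimator on the output of the stages before it.
    val model = new Pipeline().setStages(oneHotStages(featureCols)).fit(df1)
    model.transform(df1).show(truncate = false)

    spark.stop()
  }
}
```

Because every column's stages live in one PipelineModel, the same fitted model can later be applied to new data with `model.transform`.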

On Wed, Aug 17, 2016 at 11:15 AM, Nisha Muktewar <ni...@cloudera.com> wrote:

>
> The OneHotEncoder does *not* accept multiple columns.
>
> You can use Michal's suggestion, where he uses a Pipeline to set up the stages
> and then runs them.
>
> The other option is to write a function that one-hot encodes a single column
> and returns a DataFrame with the encoded column, then call it once for each
> of the remaining columns.
>
> On Wed, Aug 17, 2016 at 10:59 AM, janardhan shetty <janardhan...@gmail.com> wrote:
>
>> I had already tried this:
>>
>> scala> val featureCols = Array("category","newone")
>> featureCols: Array[String] = Array(category, newone)
>>
>> scala> val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
>> <console>:29: error: type mismatch;
>>  found   : Array[String]
>>  required: String
>>         val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
>>
>>
>> On Wed, Aug 17, 2016 at 10:56 AM, Nisha Muktewar <ni...@cloudera.com> wrote:
>>
>>> I don't think it does. From the documentation at
>>> https://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder,
>>> I see that it still accepts one column at a time.
>>>
>>> On Wed, Aug 17, 2016 at 10:18 AM, janardhan shetty <janardhan...@gmail.com> wrote:
>>>
>>>> Spark 2.0:
>>>>
>>>> One-hot encoding currently accepts a single input column. Is there a way
>>>> to include multiple columns?
>>>>
>>>
>>>
>>
>
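
Nisha's second option, a function that encodes one column and is then called for each remaining column, maps naturally onto a `foldLeft` over the column names. A minimal sketch, assuming Spark 2.x; `encodeColumn`, `encodeAll`, the toy data, and the "Index"/"Vec" suffixes are hypothetical. Each call wraps its two stages in a small Pipeline, so the same code compiles whether OneHotEncoder is a Transformer (as in Spark 2.x) or an Estimator (as in Spark 3.x).

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.{DataFrame, SparkSession}

object FoldOneHot {
  // Hypothetical helper: index and one-hot encode a single column, returning a DataFrame
  // that keeps all existing columns and appends "<colName>Index" and "<colName>Vec".
  def encodeColumn(df: DataFrame, colName: String): DataFrame = {
    val stages = Array[PipelineStage](
      new StringIndexer().setInputCol(colName).setOutputCol(s"${colName}Index"),
      new OneHotEncoder().setInputCol(s"${colName}Index").setOutputCol(s"${colName}Vec")
    )
    new Pipeline().setStages(stages).fit(df).transform(df)
  }

  // Call encodeColumn once per column, threading the growing DataFrame through each call.
  def encodeAll(df: DataFrame, cols: Seq[String]): DataFrame =
    cols.foldLeft(df)((acc, c) => encodeColumn(acc, c))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("fold-ohe").getOrCreate()
    import spark.implicits._

    // Toy data using the column names from the thread; the values are made up.
    val df1 = Seq(("a", "x"), ("b", "y"), ("a", "z")).toDF("category", "newone")
    encodeAll(df1, Seq("category", "newone")).show(truncate = false)

    spark.stop()
  }
}
```

One trade-off: this version discards the fitted models, so encoding a separate test set means refitting, which may assign different indices. Michal's Pipeline suggestion instead yields a single PipelineModel that can be reused.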
