Re: Why can't a Transformer have multiple output columns?

Nicholas Chammas Tue, 23 Aug 2016 07:49:14 -0700

Thanks for the pointer! A linked issue from the one you shared also appears
to be relevant.


SPARK-8418 <https://issues.apache.org/jira/browse/SPARK-8418>: "Add single-
and multi-value support to ML Transformers"
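
The ticket title suggests a Transformer that can take either one column or a list of columns. Here is a minimal sketch of how such params might be validated and normalized. It assumes that the single-column form (inputCol/outputCol) and the multi-column form (inputCols/outputCols) are mutually exclusive. `resolve_columns` is a hypothetical helper written for illustration and is not part of pyspark.

```python
from typing import List, Optional, Tuple


def resolve_columns(
    input_col: Optional[str] = None,
    output_col: Optional[str] = None,
    input_cols: Optional[List[str]] = None,
    output_cols: Optional[List[str]] = None,
) -> List[Tuple[str, str]]:
    """Hypothetical helper: normalize single- or multi-column params
    into a list of (input, output) column pairs."""
    single = input_col is not None or output_col is not None
    multi = input_cols is not None or output_cols is not None
    # Assumption: the two forms may not be mixed on one Transformer.
    if single and multi:
        raise ValueError("Set inputCol/outputCol or inputCols/outputCols, not both")
    if single:
        if input_col is None or output_col is None:
            raise ValueError("inputCol and outputCol must both be set")
        return [(input_col, output_col)]
    if multi:
        if input_cols is None or output_cols is None:
            raise ValueError("inputCols and outputCols must both be set")
        if len(input_cols) != len(output_cols):
            raise ValueError("inputCols and outputCols must have the same length")
        return list(zip(input_cols, output_cols))
    raise ValueError("No input/output columns were set")


# A single-value call and a multi-value call resolve to the same shape.
print(resolve_columns(input_col="category", output_col="category_idx"))
print(resolve_columns(input_cols=["color", "size"], output_cols=["color_idx", "size_idx"]))
```

Normalizing both forms to a list of pairs means the transform logic only has to handle one case.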

On Tue, Aug 23, 2016 at 10:41 AM Nick Pentreath <nick.pentre...@gmail.com>
wrote:

> It's not impossible for a Transformer to output multiple columns; it's
> simply that none of the current ones do. It's true that it might be a
> relatively less common use case in general.
>
> But take StringIndexer for example. It turns strings (categorical
> features) into ints (0-based indexes). It could (should) accept multiple
> input columns for efficiency (see
> https://issues.apache.org/jira/browse/SPARK-11215). This is a case where
> multiple output columns would be required.
>
> N
>
>
> On Tue, 23 Aug 2016 at 16:15 Nicholas Chammas <nicholas.cham...@gmail.com>
> wrote:
>
>> If you create your own Spark 2.x ML Transformer, there are multiple
>> mix-ins (is that the correct term?) in ml/param/shared.py
>> <https://github.com/apache/spark/blob/master/python/pyspark/ml/param/shared.py>
>> that you can use to define its behavior.
>>
>> Among them are the following mix-ins:
>>
>>    - HasInputCol
>>    - HasInputCols
>>    - HasOutputCol
>>
>> What’s *not* available is a HasOutputCols mix-in, and I assume that is
>> intentional.
>>
>> Is there a design reason why Transformers should not be able to define
>> multiple output columns?
>>
>> I’m guessing that if you are an ML beginner who thinks you need a Transformer
>> with multiple output columns, you’ve misunderstood something. 😅
>>
>> Nick
>> 
>>
>
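
The following dependency-free Python sketch makes the mix-in pattern and the multi-column StringIndexer case concrete. The `Params`, `HasInputCols`, `HasOutputCols`, and `MultiColumnStringIndexer` classes are hypothetical stand-ins written for illustration. They mimic, but are not, the classes in pyspark.ml.param.shared. `HasOutputCols` here is the plural mix-in the original question notes is missing. Rows are plain dicts rather than a Spark DataFrame.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


class Params:
    """Minimal param container (a stand-in, NOT pyspark.ml.param.Params)."""

    def __init__(self) -> None:
        self._params: Dict[str, Any] = {}

    def _set(self, **kwargs: Any) -> "Params":
        self._params.update(kwargs)
        return self  # return self so setters can be chained

    def _get(self, name: str) -> Any:
        if name not in self._params:
            raise KeyError(f"Param {name!r} is not set")
        return self._params[name]


class HasInputCols(Params):
    """Mix-in holding a list of input column names."""

    def setInputCols(self, cols: List[str]) -> "Params":
        return self._set(inputCols=list(cols))

    def getInputCols(self) -> List[str]:
        return self._get("inputCols")


class HasOutputCols(Params):
    """Hypothetical plural counterpart of HasOutputCol."""

    def setOutputCols(self, cols: List[str]) -> "Params":
        return self._set(outputCols=list(cols))

    def getOutputCols(self) -> List[str]:
        return self._get("outputCols")


class MultiColumnStringIndexer(HasInputCols, HasOutputCols):
    """Map each string input column to a 0-based index output column."""

    @staticmethod
    def _label_index(values: List[str]) -> Dict[str, float]:
        # Most frequent label gets 0.0; ties are broken alphabetically.
        counts = Counter(values)
        ordered = sorted(counts, key=lambda label: (-counts[label], label))
        return {label: float(i) for i, label in enumerate(ordered)}

    def fit_transform(self, rows: List[Row]) -> List[Row]:
        in_cols, out_cols = self.getInputCols(), self.getOutputCols()
        if len(in_cols) != len(out_cols):
            raise ValueError("inputCols and outputCols must have the same length")
        # "Fit": build one label -> index mapping per input column.
        mappings = {c: self._label_index([str(r[c]) for r in rows]) for c in in_cols}
        # "Transform": copy each row and add one output column per input column.
        result: List[Row] = []
        for row in rows:
            new_row = dict(row)
            for c_in, c_out in zip(in_cols, out_cols):
                new_row[c_out] = mappings[c_in][str(row[c_in])]
            result.append(new_row)
        return result


rows = [
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "M"},
    {"color": "red", "size": "M"},
]
indexer = (
    MultiColumnStringIndexer()
    .setInputCols(["color", "size"])
    .setOutputCols(["color_idx", "size_idx"])
)
for row in indexer.fit_transform(rows):
    print(row)
```

For brevity, fitting and transforming are collapsed into one method. In Spark, StringIndexer is an Estimator whose fit() returns a model that performs the transform. The frequency-then-alphabetical label ordering is an assumption of this sketch. The point is that one object with list-valued input and output params can index several columns, which is exactly where a HasOutputCols mix-in would be needed.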
