Re: Why can't a Transformer have multiple output columns?

Nick Pentreath Tue, 23 Aug 2016 07:42:01 -0700

Nothing prevents a Transformer from outputting multiple columns. There is no
HasOutputCols simply because none of the current Transformers need one. It's
true that this is probably a less common use case in general.

But take StringIndexer for example. It turns strings (categorical features)
into ints (0-based indexes). It could (should) accept multiple input
columns for efficiency (see
https://issues.apache.org/jira/browse/SPARK-11215). If it did, it would
also need multiple output columns, one per input column.
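
To make the efficiency point concrete, here is a rough plain-Python sketch
(not Spark code, and not how StringIndexer is implemented). The names
fit_indexers and transform are made up for illustration, and the ordering
rule (most frequent label gets index 0, ties broken alphabetically) is an
assumption. The point is only that one pass over the data can serve several
input columns, and that each input column then needs its own output column.

```python
from collections import Counter


def fit_indexers(rows, input_cols):
    """Build one label -> index map per input column in a single pass."""
    counts = {col: Counter() for col in input_cols}
    for row in rows:  # one pass over the data, shared by all columns
        for col in input_cols:
            counts[col][row[col]] += 1
    # Assumption: most frequent label gets index 0, ties broken alphabetically.
    indexers = {}
    for col, counter in counts.items():
        ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        indexers[col] = {label: idx for idx, (label, _) in enumerate(ordered)}
    return indexers


def transform(rows, indexers, input_cols, output_cols):
    """Return new rows with one added output column per input column."""
    result = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for in_col, out_col in zip(input_cols, output_cols):
            new_row[out_col] = indexers[in_col][row[in_col]]
        result.append(new_row)
    return result


rows = [
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "M"},
    {"color": "red", "size": "M"},
]
input_cols = ["color", "size"]
output_cols = ["color_idx", "size_idx"]

indexers = fit_indexers(rows, input_cols)
indexed = transform(rows, indexers, input_cols, output_cols)
```

With a single outputCol there would be nowhere to put size_idx, which is
why multi-column input implies multi-column output here.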

N

On Tue, 23 Aug 2016 at 16:15 Nicholas Chammas wrote:

> If you create your own Spark 2.x ML Transformer, there are multiple
> mix-ins (is that the correct term?) that you can use to define its
> behavior. They live in ml/param/shared.py
> <https://github.com/apache/spark/blob/master/python/pyspark/ml/param/shared.py>.
>
> Among them are the following mix-ins:
>
>    - HasInputCol
>    - HasInputCols
>    - HasOutputCol
>
> What’s *not* available is a HasOutputCols mix-in, and I assume that is
> intentional.
>
> Is there a design reason why Transformers should not be able to define
> multiple output columns?
>
> I’m guessing if you are an ML beginner who thinks they need a Transformer
> with multiple output columns, you’ve misunderstood something. 😅
>
> Nick
> 
>
